‘Space is to place as eternity is to time.’
Greetings, humanists, social and data scientists!
In data science, the ability to visualize geospatial data is essential, particularly in historical data analysis. Maps provide a visual representation of spatial data that allows viewers to discern patterns and relationships that might not be immediately apparent in tabular data. R, with its rich ecosystem of packages, offers versatile tools for geospatial data visualization.
In this lesson, we will continue our journey through 19th-century France. Using the Guerry package, we'll explore how to plot maps in R. Please check out the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’ to learn how to use regression analysis to study the relationship between literacy and suicide in 19th-century France.
After I wrote the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’, Michael Friendly, the author of the HistData package, kindly let me know that the Guerry dataset has its own package, which includes not only the data provided in HistData but also additional historical maps of France. Please check the documentation of the package here.
Coding the past: historical data analysis with maps
1. Getting Started with Maps in R
Before starting our geospatial journey, make sure the Guerry package is installed in your R environment and that you load it.
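A minimal sketch of this setup step, assuming you install Guerry from CRAN. The requireNamespace() check is only a convenience so the installation runs once:

```r
# Install Guerry from CRAN only if it is not already available
if (!requireNamespace("Guerry", quietly = TRUE)) {
  install.packages("Guerry")
}

# Attach the package so that its datasets, including gfrance, are available
library(Guerry)
```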
Once you have loaded the package, the gfrance object will be available in your environment. If you check the class of this object with class(gfrance), you will see that it is a SpatialPolygonsDataFrame.
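The class check is a one-liner. The comment below only restates the class name given in the text; SpatialPolygonsDataFrame is a class defined by the sp package:

```r
library(Guerry)

# Inspect the class of the map object shipped with Guerry
class(gfrance)
# Expected: "SpatialPolygonsDataFrame" (a class from the sp package)
```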
But what is a SpatialPolygonsDataFrame? For a detailed explanation, check the explanation by Michael T. Hallworth. In short, gfrance combines the Guerry data frame that we explored in the last lesson with the spatial information (the polygons) of France and its departments in 1830.
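To see the two halves of this object, you can look at its slots. The sketch below assumes the standard sp slot names: data holds the attribute table and polygons holds one Polygons object per department:

```r
library(Guerry)
library(sp)

# Attribute table: one row per department, with the Guerry variables
head(gfrance@data)

# Geometry: one Polygons object per department
length(gfrance@polygons)

# Each row of the attribute table is matched to one polygon
nrow(gfrance@data) == length(gfrance@polygons)
```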
For a simple introduction to maps in R, trace the contours of France by plotting the gfrance data. This is as simple as running plot(gfrance). The result is a distinct outline of the departments of France as they existed in 1830, a perfect canvas for deeper geospatial data visualization.
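A minimal version of this first map is sketched below. We attach sp explicitly because it supplies the plot() method for Spatial objects. This is a precaution: Guerry may already make the method available.

```r
library(Guerry)
library(sp)  # provides the plot() method for Spatial* objects

# Draw the department boundaries of France in 1830
plot(gfrance)
```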
We could work directly with the gfrance object, but to use ggplot2 we will first convert it to sf, which stands for simple features. Simple features is a standard for representing real-world geographic objects in computers. To learn more about it, check this article about the sf package, written by its author, Edzer Pebesma. To make the conversion, we will use the st_as_sf function from the sf package.
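The conversion itself can be sketched as follows. The name gfrance_sf is our own choice for the result:

```r
library(Guerry)
library(sf)

# Convert the sp object into an sf data frame usable by ggplot2
gfrance_sf <- st_as_sf(gfrance)

# The result is both an sf object and an ordinary data frame
class(gfrance_sf)
```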
Note that the conversion to sf added a column called geometry to the data frame. This column contains the spatial information of each department.
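You can confirm this yourself. In the sketch below, gfrance_sf is our own name. st_geometry() is the sf accessor that returns the geometry column as a simple feature geometry list (class sfc):

```r
library(Guerry)
library(sf)

gfrance_sf <- st_as_sf(gfrance)

# The spatial information now lives in a list-column named "geometry"
"geometry" %in% names(gfrance_sf)

# Extract that column on its own: one geometry per department
department_shapes <- st_geometry(gfrance_sf)
class(department_shapes)
```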
3. geom_sf - give color and meaning to your maps
In the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’, we used regression to study the relationship between literacy and the incidence of suicides. We also included in our model variables for the wealth of each department and its distance to Paris. We found that distance to Paris is negatively associated with the incidence of suicides: the farther from Paris, the lower the incidence of suicides.
Do you think this relationship can be seen in a map?
To find out, we will plot a map of France and color the departments according to their suicide rate. The first step is to create a new variable expressing suicides per 100,000 inhabitants. Remember that, in the Guerry dataset, suicide is expressed as the population divided by the number of suicides. We can calculate the inverse of Suicides and multiply it by 100,000 to obtain suicides per 100,000 inhabitants.
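The arithmetic is easy to check on a single number. The helper suicides_per_100k and the value 20,000 below are hypothetical, chosen only to illustrate the inversion:

```r
# Hypothetical helper: turn "population per suicide" into suicides per 100,000
suicides_per_100k <- function(pop_per_suicide) {
  100000 / pop_per_suicide
}

# A made-up department with one suicide for every 20,000 inhabitants
suicides_per_100k(20000)
# [1] 5
```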
To build the map, we load ggplot2 and create the variable Suicides_Pop, as described above. To plot the map of France colored according to suicides per 100,000 inhabitants, we use a layer called geom_sf. This function takes the data and maps the fill color of each department to the variable Suicides_Pop. It also sets two constant attributes: the color and width of the department border lines. With scale_fill_gradient, we tell ggplot2 that the fill should be a gradient in which high values are shown in darker red and low values in lighter red. Finally, we set some theme options.
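The steps above can be sketched as follows. This is not the original lesson's code. The object names, hex colors, border width and theme_void() are our own assumptions. The linewidth argument requires ggplot2 3.4.0 or later; older versions use size instead.

```r
library(Guerry)
library(sf)
library(ggplot2)

gfrance_sf <- st_as_sf(gfrance)

# Suicides is population per suicide; invert it to get suicides per 100,000
gfrance_sf$Suicides_Pop <- 100000 / gfrance_sf$Suicides

suicide_map <- ggplot(gfrance_sf) +
  # Fill each department by its rate; border color and width are constants
  geom_sf(aes(fill = Suicides_Pop), color = "white", linewidth = 0.2) +
  # Low rates in light red, high rates in dark red
  scale_fill_gradient(low = "#fcbba1", high = "#67000d",
                      name = "Suicides per\n100,000") +
  theme_void()

print(suicide_map)
```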
Indeed, as the regression analysis indicated, the departments closer to Paris have a higher incidence of suicides.
4. Adding a second geom_sf to highlight Paris
In the previous map, we can see that the departments closer to Paris have a higher incidence of suicides. However, it is not clear where Paris is located. To highlight Paris, we will add a second geom_sf layer to the map. This time, we will use the filter function to select only the department of Paris (Seine, code 75) and set its fill color to white. Finally, we will add the following elements to make the plot more informative:
- ggtitle is used to add a title and a subtitle to the plot;
- theme_bw and theme are used to customize the appearance of the plot:
  - text = element_text(color = 'white') sets the color of the text to white;
  - axis.ticks and axis.text remove the axis ticks and text;
  - panel.grid.minor removes the minor grid lines;
  - legend.background sets the legend background to match the color of this blog;
  - legend.text sets the color of the legend text to white.
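Putting these elements together gives a sketch like the one below. Several details are our assumptions and not the original code:

- Paris is selected with dept == 75, the Guerry department number for Seine.
- The dark background color stored in bg stands in for this blog's color scheme.
- The title text and the red palette are our own choices.
- filter comes from dplyr, which works on sf objects.

```r
library(Guerry)
library(sf)
library(dplyr)
library(ggplot2)

gfrance_sf <- st_as_sf(gfrance)
gfrance_sf$Suicides_Pop <- 100000 / gfrance_sf$Suicides

# Keep only Seine (Paris), assuming Guerry's department number 75
paris <- filter(gfrance_sf, dept == 75)

# Assumed dark background, standing in for the blog's color
bg <- "#2b2b2b"

paris_map <- ggplot() +
  geom_sf(data = gfrance_sf, aes(fill = Suicides_Pop),
          color = "white", linewidth = 0.2) +
  # Second geom_sf layer: draw Paris on top, filled in white
  geom_sf(data = paris, fill = "white", color = "white") +
  scale_fill_gradient(low = "#fcbba1", high = "#67000d",
                      name = "Suicides per\n100,000") +
  ggtitle("Suicides in France, 1830s",
          subtitle = "Paris (Seine) highlighted in white") +
  theme_bw() +
  theme(
    text = element_text(color = "white"),
    plot.background = element_rect(fill = bg),
    axis.ticks = element_blank(),
    axis.text = element_blank(),
    panel.grid.minor = element_blank(),
    legend.background = element_rect(fill = bg),
    legend.text = element_text(color = "white")
  )

print(paris_map)
```

Drawing Paris as a separate layer, rather than recoloring it in the data, leaves the suicide variable untouched. The highlight can then be removed by deleting a single line.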
Apart from a few outliers, the departments around Paris do indeed show higher suicide rates than the rest of the country.
Do you think geographical location really has an impact on suicides? Or is a third variable confounding this relationship? In the last lesson, we saw that the association persisted even when we controlled for wealth and literacy rates. Other variables might still play a role in this relationship. Feel free to investigate further and share your thoughts in the comments below.
For more information on maps, please check these materials:
- Eric Weinberg, “Using Geospatial Data to Inform Historical Research in R,” Programming Historian 7 (2018), https://doi.org/10.46430/phen0075.
- Paula Moraga, Spatial Statistics for Data Science: Theory and Practice with R (Chapman & Hall/CRC, 2023).
To sum up:
- st_as_sf is used to convert a SpatialPolygonsDataFrame to sf;
- geom_sf is used to plot sf objects in ggplot2;
- maps can be a powerful tool to visualize relationships that involve space.