
Introduction

‘Space is to place as eternity is to time.’

Joseph Joubert

Greetings, humanists, social and data scientists!

In the realm of data science, the ability to visualize geospatial data is paramount. This is particularly true when working with historical data analysis. Maps provide a visual representation of spatial data that allows viewers to discern patterns and relationships that might not be immediately apparent in tabular data. R, with its rich ecosystem of packages and libraries, offers versatile tools for geospatial data visualization.

In this lesson, we will continue our journey through 19th century France. Using the `Guerry` package, we will explore how to plot maps in R. Please check out the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’ to learn how to use regression analysis to study the relationship between literacy and suicides in 19th century France.

Data source

After I wrote the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’, the author of the `HistData` package, Michael Friendly, kindly let me know that the `Guerry` dataset has its own package that includes not only the data provided in `HistData` but also additional historical maps of France. Please check the package documentation for details.

Coding the past: historical data analysis with maps

1. Getting Started with Maps in R

Before starting our geospatial journey, make sure the `Guerry` package is installed in your R environment and that you load it.
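A minimal sketch of this setup step, assuming the packages are installed from CRAN. The extra packages in the commented `install.packages` call (`sp`, `sf`, `ggplot2`, `dplyr`) are an assumption: they are listed only because the later sketches in this lesson rely on them.

```r
# Install once (uncomment on first use). Everything besides Guerry is an
# assumption about what the later steps of this lesson need.
# install.packages(c("Guerry", "sp", "sf", "ggplot2", "dplyr"))

# Load the package that ships Guerry's data and the historical maps of France
library(Guerry)
```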


Once you have loaded the package, the `gfrance` object will be available in your environment. If you check the class of this object with `class(gfrance)`, you will get `SpatialPolygonsDataFrame`.

But what is a `SpatialPolygonsDataFrame`? For a detailed explanation, see the material by Michael T. Hallworth. Alternatively, a succinct explanation is provided below.

A SpatialPolygonsDataFrame integrates a simple dataframe with spatial data, utilizing a list structure.

Simply put, `gfrance` combines the Guerry dataframe that we explored in the last lesson with spatial information of France and its departments in 1830.

For a simple first map in R, trace the contours of France by plotting the `gfrance` object. This is as simple as calling `plot(gfrance)`. The result is an outline of the departments of France as they existed in 1830. It is a perfect canvas for deeper geospatial data visualization.
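The inspection and the first plot described above could look like the sketch below. Attaching `sp` and calling `data(gfrance)` explicitly are assumptions made for safety, so that the plot method for spatial objects is available and the object is certainly loaded.

```r
library(Guerry)
library(sp)      # assumption: attached so the Spatial* plot method is found

data(gfrance)    # explicit load; harmless if the object is already available

class(gfrance)          # class of the spatial object
names(gfrance@data)     # the attribute table lives in the @data slot

plot(gfrance)           # outline of the French departments in 1830
```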

2. st_as_sf

We could work directly with the `gfrance` object, but in order to use `ggplot2`, we will first convert it to `sf`, which stands for simple features. Simple features is a standard for representing real-world objects in computers. To learn more about it, check the article about the `sf` package written by its author, Edzer Pebesma. To make the conversion, we will use the `st_as_sf` function from the `sf` package.
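A minimal sketch of the conversion, assuming `sf` (and `sp`, which handles the original class) are installed. The object name `gfrance_sf` is only illustrative.

```r
library(Guerry)
library(sf)

data(gfrance)

# Convert the SpatialPolygonsDataFrame into an sf object
gfrance_sf <- st_as_sf(gfrance)

class(gfrance_sf)   # should now include "sf" and "data.frame"
names(gfrance_sf)   # the spatial information sits in the `geometry` column
```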


Note that the conversion to `sf` added a variable called `geometry` to the data frame. This variable contains the spatial information of each department.

3. geom_sf - give color and meaning to your maps

In the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’ we used regression to study the relationship between literacy and the incidence of suicides. Furthermore, we also included in our model variables regarding the wealth of the department and its distance to Paris. We found that the distance to Paris is negatively associated with the incidence of suicides, that is, the farther from Paris, the lower the incidence of suicides.

Do you think this relationship can be seen in a map?

To find out, we will plot a map of France and color the departments according to their suicide rate. The first step is to create a new variable expressing suicides per 100,000 inhabitants. Remember that, in the Guerry dataset, the `Suicides` variable is expressed as the population divided by the number of suicides. We can therefore take the inverse of `Suicides` and multiply it by 100,000 to obtain suicides per 100,000 inhabitants.
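To make the arithmetic concrete: a value of 20,000 means one suicide per 20,000 inhabitants, which corresponds to 5 suicides per 100,000 inhabitants. The short sketch below applies the same transformation to a few made-up values. The numbers are hypothetical and are not taken from the Guerry data.

```r
# Hypothetical values of "population per suicide" for three departments
pop_per_suicide <- c(20000, 50000, 100000)

# Invert and rescale to get suicides per 100,000 inhabitants
suicides_per_100k <- (1 / pop_per_suicide) * 100000

suicides_per_100k
```

Note that the transformation reverses the ordering: the smaller the original value, the higher the resulting rate.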

Next, we load `ggplot2` and create the variable `Suicides_Pop`, as described above. To plot the map of France colored according to suicides per 100,000 inhabitants, we use a layer called `geom_sf`. This function takes the data and maps the fill color of the map to the variable `Suicides_Pop`. It also sets two constant attributes: the color and size of the department border lines. With `scale_fill_gradient` we tell ggplot that we would like the fill to be a gradient in which high numbers are associated with a darker red and low numbers with a lighter red. Finally, we set some theme configurations.
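The steps just described could be sketched as follows. The hex colors, the legend title and the object names `gfrance_sf` and `suicide_map` are illustrative assumptions, not necessarily the values used for the original figure. The border width is set with `linewidth`, which assumes ggplot2 3.4.0 or later; older versions use `size` for the same purpose.

```r
library(Guerry)
library(sf)
library(ggplot2)

data(gfrance)
gfrance_sf <- st_as_sf(gfrance)

# Suicides holds population per suicide, so invert it to get a rate
gfrance_sf$Suicides_Pop <- (1 / gfrance_sf$Suicides) * 100000

suicide_map <- ggplot(data = gfrance_sf) +
  # fill is mapped to the rate; border color and width are constants
  geom_sf(aes(fill = Suicides_Pop), color = "black", linewidth = 0.2) +
  # lighter red for low rates, darker red for high rates (assumed colors)
  scale_fill_gradient(low = "#FFCCCC", high = "#8B0000",
                      name = "Suicides per\n100,000 inhabitants") +
  theme_bw()

suicide_map
```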


Indeed, as the regression analysis indicated, the departments closer to Paris have a higher incidence of suicides.

4. Adding a second geom_sf to highlight Paris

In the previous map, we can see that the departments closer to Paris have a higher incidence of suicides. However, it is not clear where Paris is located. To highlight Paris, we will add a second layer of `geom_sf` to the map. This time, we will use the `filter` function to select only the department of Paris (Seine, code 75). We will also set the fill color to white. Finally, we will add the following items to make the plot more informative:

• `ggtitle` is used to add a title and a subtitle to the plot;
• `theme` and `theme_bw` are used to customize the appearance of the plot:
    • `text = element_text(color = 'white')` sets the color of the text to white;
    • `axis.text` and `axis.ticks` remove axis text and ticks;
    • `panel.grid.major` and `panel.grid.minor` remove grid lines;
    • `panel.background`, `plot.background` and `legend.background` set the background color to match the color of this blog;
    • `legend.text` sets the color of the legend text to white.
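Putting the items above together, a sketch of the highlighted map might look like this. Several choices are assumptions: `dplyr::filter` is used for the selection, Paris is picked by its department name rather than its code, and the background color `bg`, the title text and the gradient colors are hypothetical placeholders. The `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(Guerry)
library(sf)
library(ggplot2)
library(dplyr)

data(gfrance)
gfrance_sf <- st_as_sf(gfrance)
gfrance_sf$Suicides_Pop <- (1 / gfrance_sf$Suicides) * 100000

# Keep only the department of Seine, which contains Paris
paris <- dplyr::filter(gfrance_sf, Department == "Seine")

bg <- "#364558"  # hypothetical background color; replace with your own

paris_map <- ggplot(data = gfrance_sf) +
  geom_sf(aes(fill = Suicides_Pop), color = "black", linewidth = 0.2) +
  # second layer: draw Paris on top with a constant white fill
  geom_sf(data = paris, fill = "white", color = "black", linewidth = 0.2) +
  scale_fill_gradient(low = "#FFCCCC", high = "#8B0000",
                      name = "Suicides per\n100,000 inhabitants") +
  ggtitle("Suicides in 1830s France",
          subtitle = "Paris (Seine) highlighted in white") +
  theme_bw() +
  theme(
    text = element_text(color = "white"),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = bg),
    plot.background = element_rect(fill = bg),
    legend.background = element_rect(fill = bg),
    legend.text = element_text(color = "white")
  )

paris_map
```

Because the second `geom_sf` layer receives its own `data` argument and a constant `fill`, it is drawn on top of the first layer without affecting the gradient legend.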


Apart from some outliers, the departments surrounding Paris do show higher suicide rates than the rest of the country.

Do you think that geographical location really has an impact on suicides? Or might a third variable be confounding this relationship? In the last lesson, we saw that even when we controlled for wealth and literacy rates, the association persisted. There are other variables that might play a role in this relationship. Feel free to investigate further and share your thoughts in the comments below.

Conclusions

• `st_as_sf` is used to convert a `SpatialPolygonsDataFrame` to `sf`;
• `geom_sf` is used to plot `sf` objects in `ggplot2`;
• maps can be a powerful tool to visualize relationships that involve space.