Mapping the Past – Geospatial Visualization in R

[This article was first published on coding-the-past, and kindly contributed to R-bloggers].


Introduction


‘Space is to place as eternity is to time.’

Joseph Joubert


Greetings, humanists, social and data scientists!


In the realm of data science, the ability to visualize geospatial data is paramount. This is particularly true when working with historical data analysis. Maps provide a visual representation of spatial data that allows viewers to discern patterns and relationships that might not be immediately apparent in tabular data. R, with its rich ecosystem of packages and libraries, offers versatile tools for geospatial data visualization.


In this lesson, we will continue our journey through 19th-century France, using the Guerry package to learn how to plot maps in R. To learn how to use regression analysis to study the relationship between literacy and suicides in 19th-century France, check out the lesson Use R to explore the link between literacy and suicide in 1830s France.




Data source

After I wrote the lesson Use R to explore the link between literacy and suicide in 1830s France, Michael Friendly, the author of the HistData package, kindly let me know that the Guerry dataset has its own package, which includes not only the data provided in HistData but also historical maps of France. Please check the package documentation for details.




Coding the past: historical data analysis with maps


1. Getting Started with Maps in R

Before starting our geospatial journey, make sure you have installed the Guerry package and loaded it into your R session.



install.packages("Guerry")
library(Guerry)


Once you have loaded the package, the gfrance object will be available in your environment. If you check the class of this object with class(gfrance), you will get SpatialPolygonsDataFrame.


But what is a SpatialPolygonsDataFrame? For a detailed explanation, see Michael T. Hallworth's material on the topic. A short explanation is given below.


Tip: a SpatialPolygonsDataFrame (a class defined in the sp package) pairs an ordinary data frame, with one row per polygon, with a list of polygon geometries, plus a bounding box and a coordinate reference system.


Simply put, gfrance combines the Guerry data frame that we explored in the last lesson with the spatial information (the boundaries) of France and its departments in 1830.
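To see this combination for yourself, you can inspect the object's S4 slots. This is a minimal sketch assuming the sp package (which defines the class) is installed; slotNames(), is() and the @ operator are standard R, and dept and Suicides are the columns used later in this lesson.

```r
library(sp)                              # defines the Spatial* S4 classes
data(gfrance, package = "Guerry")        # load the 1830 map of France

is(gfrance, "SpatialPolygonsDataFrame")  # TRUE
slotNames(gfrance)                       # includes "data", "polygons", "bbox" and "proj4string"

# The attribute table is an ordinary data frame with one row per department...
head(gfrance@data[, c("dept", "Suicides")])

# ...and the polygons slot is a list holding one Polygons object per department
length(gfrance@polygons) == nrow(gfrance@data)  # TRUE
```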


For a first look at maps in R, trace the contours of France by plotting the gfrance data with plot(gfrance). The result, shown below, is an outline of the departments of France as they existed in 1830, a good canvas for further geospatial visualization.


Map of France in 1830
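If you want to reproduce that outline, the short sketch below calls plot() on gfrance. The border color and the title are illustrative choices, not part of the original lesson, and sp is loaded explicitly so that its plot() method for spatial objects is available.

```r
library(sp)                        # provides the plot() method for Spatial* objects
data(gfrance, package = "Guerry")  # the 1830 departments

# Draw only the department boundaries, with a grey outline
plot(gfrance, border = "grey40")
title("Departments of France, 1830")
```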


2. st_as_sf

We could work directly with the gfrance object, but in order to use ggplot2, we will first convert it to sf, which stands for simple features. Simple features is a standard for representing real-world objects in computers. To learn more about it, check this article about the sf package, written by its author, Edzer Pebesma. To make the conversion, we will use the st_as_sf function from the sf package.
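To make the idea of a simple feature more concrete, here is a minimal sketch that builds a one-row sf object by hand. The square "department" is hypothetical and only illustrates the three layers sf uses: a single geometry (sfg), a geometry column (sfc) and a data frame that carries both attributes and geometry (sf).

```r
library(sf)

# A hypothetical square "department"; the ring is closed by repeating the first vertex
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
square <- st_polygon(list(ring))   # a single simple-feature geometry (sfg)

# Wrap the geometry in a column (sfc) and attach an ordinary attribute
toy <- st_sf(name = "toy department", geometry = st_sfc(square))

class(toy)              # "sf" "data.frame"
st_geometry_type(toy)   # POLYGON
st_area(toy)            # 1 (no CRS is set, so the area is in plain planar units)
```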



library(sf)
gfrance_sf <- st_as_sf(gfrance)


Note that the conversion to sf added a column called geometry to the data frame. This column contains the spatial information (the boundary polygons) of each department.
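You can check this directly. The sketch below assumes sf and sp are installed; attr(x, "sf_column") reports the name of the active geometry column, and st_drop_geometry() returns the attributes as a plain data frame.

```r
library(sp)                        # needed so st_as_sf() can read the Spatial* object
library(sf)
data(gfrance, package = "Guerry")
gfrance_sf <- st_as_sf(gfrance)

attr(gfrance_sf, "sf_column")            # "geometry": name of the geometry column
class(gfrance_sf$geometry)               # an sfc (simple feature column) class
nrow(gfrance_sf) == nrow(gfrance@data)   # TRUE: still one row per department

# Drop the geometry to get back a plain data frame, e.g. for tables or summaries
attributes_only <- st_drop_geometry(gfrance_sf)
"geometry" %in% names(attributes_only)   # FALSE
```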




3. geom_sf - give color and meaning to your maps

In the lesson ‘Use R to explore the link between literacy and suicide in 1830s France’, we used regression to study the relationship between literacy and the incidence of suicides. We also included in our model variables for the wealth of each department and its distance to Paris. We found that distance to Paris is negatively associated with the incidence of suicides, that is, the farther a department is from Paris, the lower its incidence of suicides.
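As a rough numeric companion to that finding, the sketch below fits a linear model on the plain Guerry data frame. The specification (suicides per 100,000 explained by Literacy, Wealth and Distance) is an assumption for illustration and may differ from the model used in the earlier lesson; a negative Distance estimate would be in line with the association described above.

```r
data(Guerry, package = "Guerry")   # the plain data frame version (no geometry)

# Suicides is population per suicide, so invert it to get suicides per 100,000
Guerry$Suicides_Pop <- 100000 / Guerry$Suicides

# Assumed specification: literacy, wealth and distance to Paris as predictors
fit <- lm(Suicides_Pop ~ Literacy + Wealth + Distance, data = Guerry)

# Estimate, standard error, t value and p-value for the Distance term
summary(fit)$coefficients["Distance", ]
```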


Do you think this relationship can be seen in a map?


To find out, we will plot a map of France and color the departments according to their suicide rate. The first step is to create a new variable expressing suicides per 100,000 inhabitants. Remember that, in the Guerry dataset, Suicides is expressed as population per suicide, that is, the population divided by the number of suicides. Taking the inverse of Suicides and multiplying it by 100,000 therefore gives suicides per 100,000 inhabitants.
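The conversion can be checked with a few made-up numbers. The values below are hypothetical, not taken from the dataset: a department with one suicide per 25,000 inhabitants has 100,000 / 25,000 = 4 suicides per 100,000, so a larger Suicides value means fewer suicides.

```r
# Hypothetical values of Suicides (inhabitants per suicide), not taken from the dataset
pop_per_suicide <- c(25000, 50000, 100000)

# Invert (suicides per inhabitant) and rescale to suicides per 100,000 inhabitants
suicides_per_100k <- (1 / pop_per_suicide) * 100000
suicides_per_100k  # [1] 4 2 1
```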


In the code below, we load ggplot2 and create the variable Suicides_Pop, as described above. To color the map of France according to suicides per 100,000 inhabitants, we use a geom_sf layer. It takes the data and maps the fill color of each department to Suicides_Pop. It also sets two constant attributes: the color and the width of the department border lines. With scale_fill_gradient, we tell ggplot that the fill should be a gradient in which high values are shown in darker red and low values in lighter red. Finally, we use theme_bw and theme to remove the axis text and ticks.



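library(ggplot2)

# Suicides is population per suicide; invert it to get suicides per 100,000 inhabitants
gfrance_sf$Suicides_Pop <- (1/gfrance_sf$Suicides)*100000

ggplot()+ 
  # linewidth sets the border width (it replaces size for lines since ggplot2 3.4.0)
  geom_sf(data = gfrance_sf, aes(fill = Suicides_Pop), color = "black", linewidth = 0.3)+
  scale_fill_gradient(name = "", low = '#FF6885', high ='#67001f')+
  theme_bw()+
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank()
  )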


Map plotted with geom_sf and colored with scale_fill_gradient


Indeed, as the regression analysis indicated, the departments closer to Paris have a higher incidence of suicides.




4. Adding a second geom_sf to highlight Paris

In the previous map, we can see that the departments closer to Paris have a higher incidence of suicides. However, it is not clear where Paris is located. To highlight Paris, we will add a second geom_sf layer to the map. This time, we will use the filter function from dplyr to select only the department of Paris (Seine, code 75) and set its fill color to white. Finally, we will add the following items to make the plot more informative:

  • ggtitle adds a title and a subtitle to the plot;
  • theme_bw and theme customize the appearance of the plot:
    • text = element_text(color = 'white') sets the color of the text to white;
    • axis.text and axis.ticks remove the axis text and ticks;
    • panel.grid.major and panel.grid.minor remove the grid lines;
    • panel.background, plot.background and legend.background set the background color to match the color of this blog;
    • legend.text sets the color of the legend text to white.
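Before drawing the map, you may want to confirm that code 75 selects exactly one department. The sketch below swaps in base R indexing for dplyr's filter() purely for illustration; both approaches should return the same single Seine row as an sf object.

```r
library(sp)                        # needed so st_as_sf() can read the Spatial* object
library(sf)
data(gfrance, package = "Guerry")
gfrance_sf <- st_as_sf(gfrance)

# Base R equivalent of filter(gfrance_sf, dept == 75)
seine_check <- gfrance_sf[gfrance_sf$dept == 75, ]

nrow(seine_check)   # expected: 1, the Seine department
class(seine_check)  # still an sf object, so geom_sf() can draw it
```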



library(dplyr)

# Keep only the Seine department (code 75), which contains Paris
seine <- filter(gfrance_sf, dept == 75)

ggplot()+ 
  # Base layer: all departments, filled by suicides per 100,000
  geom_sf(data = gfrance_sf, aes(fill = Suicides_Pop), color = "black", linewidth = 0.3)+
  # Second layer drawn on top: Seine in white
  geom_sf(data = seine, fill = "white", color = "black")+
  scale_fill_gradient(name = "", low = '#FF6885', high ='#67001f')+
  ggtitle("Suicide incidence per 100,000 people", subtitle = "Seine (Paris) highlighted in white")+
  theme_bw()+
  theme(text = element_text(color = 'white'),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = '#2E3031'),
        plot.background = element_rect(fill = '#2E3031'),
        legend.background = element_rect(fill = "#2E3031"),
        legend.text = element_text(colour = "white"))


map with a second geom_sf layer


Apart from a few outliers, the departments around Paris do show higher suicide rates than the rest of the country.


Do you think geographical location really affects suicide rates? Or is a third variable confounding this relationship? In the last lesson, we saw that the association persisted even when we controlled for wealth and literacy rates, but other variables might still play a role. Feel free to investigate further and share your thoughts in the comments below.


For more information on maps, please check these materials:





Conclusions


  • st_as_sf is used to convert a SpatialPolygonsDataFrame to sf;
  • geom_sf is used to plot sf objects in ggplot2;
  • maps can be a powerful tool to visualize relationships that involve space.


