Maps in R: choropleth maps

January 23, 2013

(This article was first published on Milano R net, and kindly contributed to R-bloggers)

This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, in this installment the creation of a choropleth map will be presented.

A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.

Packages used

We make use of three packages (plus their dependencies) to produce the maps in this post.

  • maptools: a set of tools for reading and handling spatial objects. In particular, it will be used to read .shp files, a common format used by geographic information systems software.
  • ggplot2: one of the powerful graphics engines available to R users
  • ggmap: a package for spatial visualization using popular on-line mapping systems, such as GoogleMaps or OpenStreetMap.

So, let’s load them into R.

?View Code RSPLUS
> library(maptools)
> library(ggplot2)
> library(ggmap)

Data used

In order to produce the maps displayed later in this post, we used data publicly available from Eurostat.

The polygons for drawing the administrative boundaries were obtained from this link. In particular, the NUTS 2010 shapefile in the 1:60 million scale was downloaded and used. The other available scales would allow the drawing of better defined maps, but at a computational cost. The zipped file has to be extracted in a folder of choice for using it later.

Note: for an introduction on NUTS (Nomenclature of territorial units for statistics) classification, look at this Introduction on Eurostat website.

The statistic that will be drawn in the map is the the public expenditure on education as a percentage of the Gross Domestic Product (year 2009), and is one of the many indicators available for download at Eurostat’s statistics portal.

In order to make the following code fully reproducible, I’m uploading below the .csv file I obtained after performing the various selections on the Eurostat portal.

csv file: educ_thexp_1_Data

With the following commands we read both the shapefile and the csv in R.

?View Code RSPLUS
# read administrative boundaries (change folder appropriately)
> eurMap eurEdueurEdu$Value

Data preparation

The eurMap object could potentially be used as is to produce a map. In fact, a command as simple as plot(eurMap) would draw the map of Europe at the NUTS-3 level.

However, since we are going to make use of the great versatility of the ggplot2 graphics facility, we need to transform eurMap to a data frame, as that is the only type of object accepted by the ggplot() function. The ggplot2 package has the fortify() function that takes care of said conversion.

?View Code RSPLUS
# convert shp data to data frame for ggplot
> eurMapDf

After that, we can add the data on education expenditure to the newly created data frame. Note that, since administrative borders will be plotted as paths, it’s important to have them ordered as they have to be drawn.

?View Code RSPLUS
# merge map and data
> eurEduMapDfeurEduMapDf

Finally, to avoid the plotting of European territories far away from mainland Europe, we put some coordinates limits to the data, using the geocode() function we introduced in previous posts.

?View Code RSPLUS
# limit data to main Europe
> europe.limits  eurEduMapDf  min(europe.limits$lon) & long < max(europe.limits$lon) & lat > min(europe.limits$lat) & lat < max(europe.limits$lat))

Mapping with ggplot2

While not specifically designed for the purpose, the ggplot2 package is well suited for the display of maps. Let’s first create an empty map (just showing borders,) by adding a geom_path layer to a ggplot() call.

?View Code RSPLUS
# ggplot mapping
# data layer
> m0m1

The map in the following image is obtained by typing m1 in the R console.

Then, by adding a geom_polygon layer, we color the country area according to the education expenditure value (stored in the Value column.)

?View Code RSPLUS
# fill with education expenditure data
> m2

And here is the result of typing m2 in the R console.

The country borders are not visible anymore because the filled polygons have been stacked on top of them. Changing the order of layers will produce a more readable map.

?View Code RSPLUS
# inverse order (to have visible borders)
> m0  m1  m2  m2

Overlay on GoogleMaps

Polygon and path layers can be overlaid on a map obtained from GoogleMaps (or OpenStreetMaps) using the ggmap package introduced in the previous posts.

It must be noted that the overlaying will work only if the boundary files have the same coordinate system and the same projection of the underlying map. In this case, the combination of Eurostats shapefiles and GoogleMaps works fine.

?View Code RSPLUS
# over a GoogleMap (not working if not correctly projected)
> map m0 m1m2

As a final touch, let’s put text on the map. Using the summaryBy() funcion of the doBy package, we calculate average values for multiple variables. The solution of using the average coordinates of all boundary points as the position where to place the text is a lazy one, as appropriate ways to calculate the center of mass of a polygon should be used instead.

?View Code RSPLUS
# add text
> library(doBy)
> txtValm3

Where to find Shapefiles

If you are looking for shapefiles, your best choice is probably googling for them. However, here are a few places where you can find them.

  • The Eurostat page on Administrative and statistical units mentioned at the beginning of this post.
  • ISTAT’s cartography page for Italian administrative units. Also contains geocoded data points of ports, police stations and many other things.
  • The Global Administrative Areas website has worldwide boundaries at various levels in several formats.
  • CShapes features historical boundaries going back to 1946, thus useful for displaying older data. They can be downloaded as shapefiles, but chsapes also comes as an R package.

View (and download) the full code:

?Download maps3.R
# read administrative boundaries (change folder appropriately)
eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010")
# read downloaded data (change folder appropriately)
eurEdu <- read.csv("educ_thexp_1_Data.csv", stringsAsFactors = F)
eurEdu$Value <- as.double(eurEdu$Value) #format as numeric
# merge map and data
eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO")
eurEduMapDf <- eurEduMapDf[order(eurEduMapDf$order),]
#limit data to main Europe
europe.limits <- geocode(c("Cape Fligely, Rudolf Island, Franz Josef Land, Russia", "Gavdos, Greece", "Faja Grande, Azores", "Severny Island, Novaya Zemlya, Russia"))
eurEduMapDf <- subset(eurEduMapDf, long > min(europe.limits$lon) & long < max(europe.limits$lon) & lat > min(europe.limits$lat) & lat < max(europe.limits$lat))
# ggplot mapping
# data layer
m0 <- ggplot(data=eurEduMapDf)
# empty map (only borders)
m1 <- m0 + geom_path(aes(x=long, y=lat, group=group), color='gray') + coord_equal()
# fill with education expenditure data
m2 <- m1 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value))
# inverse order (to have visible borders)
m0 <- ggplot(data=eurEduMapDf)
m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value)) + coord_equal()
m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), color='black')
# over a GoogleMap (not working if not correctly projected)
map <- get_map(location = 'Europe', zoom=4)
m0 <- ggmap(map)
m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value), data=eurEduMapDf, alpha=.9)
m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), data=eurEduMapDf, color='black')
# add text
txtVal <- summaryBy(long + lat + Value ~ id, data=eurEduMapDf, FUN=mean, keep.names=T)
m3 <- m2 + geom_text(aes(x=long, y=lat, label=Value), data=txtVal, col="yellow", cex=3)

To leave a comment for the author, please follow the link and comment on their blog: Milano R net. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)