This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, in this installment the creation of a choropleth map will be presented.
A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.
We make use of three packages (plus their dependencies) to produce the maps in this post.
maptools: a set of tools for reading and handling spatial objects. In particular, it will be used to read .shp files, a common format used by geographic information systems software.
ggplot2: one of the powerful graphics engines available to R users
ggmap: a package for spatial visualization using popular on-line mapping systems, such as GoogleMaps or OpenStreetMap.
So, let's load them into R.
In order to produce the maps displayed later in this post, we used data publicly available from Eurostat.
The polygons for drawing the administrative boundaries were obtained from this link. In particular, the NUTS 2010 shapefile in the 1:60 million scale was downloaded and used. The other available scales would allow the drawing of better defined maps, but at a computational cost. The zipped file has to be extracted in a folder of choice for using it later.
Note: for an introduction on NUTS (Nomenclature of territorial units for statistics) classification, look at this Introduction on Eurostat website.
The statistic that will be drawn in the map is the the public expenditure on education as a percentage of the Gross Domestic Product (year 2009), and is one of the many indicators available for download at Eurostat's statistics portal.
In order to make the following code fully reproducible, I'm uploading below the .csv file I obtained after performing the various selections on the Eurostat portal.
csv file: educ_thexp_1_Data
With the following commands we read both the shapefile and the csv in R.
# read administrative boundaries (change folder appropriately) > eurMap eurEdueurEdu$Value
eurMap object could potentially be used as is to produce a map. In fact, a command as simple as
plot(eurMap) would draw the map of Europe at the NUTS-3 level.
However, since we are going to make use of the great versatility of the
ggplot2 graphics facility, we need to transform
eurMap to a data frame, as that is the only type of object accepted by the
ggplot() function. The
ggplot2 package has the
fortify() function that takes care of said conversion.
# convert shp data to data frame for ggplot > eurMapDf
After that, we can add the data on education expenditure to the newly created data frame. Note that, since administrative borders will be plotted as paths, it's important to have them ordered as they have to be drawn.
# merge map and data > eurEduMapDfeurEduMapDf
Finally, to avoid the plotting of European territories far away from mainland Europe, we put some coordinates limits to the data, using the
geocode() function we introduced in previous posts.
Mapping with ggplot2
While not specifically designed for the purpose, the
ggplot2 package is well suited for the display of maps. Let's first create an empty map (just showing borders,) by adding a
geom_path layer to a
# ggplot mapping # data layer > m0m1
The map in the following image is obtained by typing
m1 in the R console.
Then, by adding a
geom_polygon layer, we color the country area according to the education expenditure value (stored in the
# fill with education expenditure data > m2
And here is the result of typing
m2 in the R console.
The country borders are not visible anymore because the filled polygons have been stacked on top of them. Changing the order of layers will produce a more readable map.
# inverse order (to have visible borders) > m0 m1 m2 m2
Overlay on GoogleMaps
Polygon and path layers can be overlaid on a map obtained from GoogleMaps (or OpenStreetMaps) using the
ggmap package introduced in the previous posts.
It must be noted that the overlaying will work only if the boundary files have the same coordinate system and the same projection of the underlying map. In this case, the combination of Eurostats shapefiles and GoogleMaps works fine.
# over a GoogleMap (not working if not correctly projected) > map m0 m1m2
As a final touch, let's put text on the map. Using the
summaryBy() funcion of the
doBy package, we calculate average values for multiple variables. The solution of using the average coordinates of all boundary points as the position where to place the text is a lazy one, as appropriate ways to calculate the center of mass of a polygon should be used instead.
# add text > library(doBy) > txtValm3
Where to find Shapefiles
If you are looking for shapefiles, your best choice is probably googling for them. However, here are a few places where you can find them.
- The Eurostat page on Administrative and statistical units mentioned at the beginning of this post.
- ISTAT's cartography page for Italian administrative units. Also contains geocoded data points of ports, police stations and many other things.
- The Global Administrative Areas website has worldwide boundaries at various levels in several formats.
- CShapes features historical boundaries going back to 1946, thus useful for displaying older data. They can be downloaded as shapefiles, but
chsapesalso comes as an R package.
View (and download) the full code:
library(maptools) library(ggplot2) library(ggmap) # read administrative boundaries (change folder appropriately) eurMap <- readShapePoly(fn="NUTS_2010_60M_SH/Shape/data/NUTS_RG_60M_2010") # read downloaded data (change folder appropriately) eurEdu <- read.csv("educ_thexp_1_Data.csv", stringsAsFactors = F) eurEdu$Value <- as.double(eurEdu$Value) #format as numeric # merge map and data eurEduMapDf <- merge(eurMapDf, eurEdu, by.x="id", by.y="GEO") eurEduMapDf <- eurEduMapDf[order(eurEduMapDf$order),] #limit data to main Europe europe.limits <- geocode(c("Cape Fligely, Rudolf Island, Franz Josef Land, Russia", "Gavdos, Greece", "Faja Grande, Azores", "Severny Island, Novaya Zemlya, Russia")) eurEduMapDf <- subset(eurEduMapDf, long > min(europe.limits$lon) & long < max(europe.limits$lon) & lat > min(europe.limits$lat) & lat < max(europe.limits$lat)) # ggplot mapping # data layer m0 <- ggplot(data=eurEduMapDf) # empty map (only borders) m1 <- m0 + geom_path(aes(x=long, y=lat, group=group), color='gray') + coord_equal() # fill with education expenditure data m2 <- m1 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value)) # inverse order (to have visible borders) m0 <- ggplot(data=eurEduMapDf) m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value)) + coord_equal() m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), color='black') m2 # over a GoogleMap (not working if not correctly projected) map <- get_map(location = 'Europe', zoom=4) m0 <- ggmap(map) m1 <- m0 + geom_polygon(aes(x=long, y=lat, group=group, fill=Value), data=eurEduMapDf, alpha=.9) m2 <- m1 + geom_path(aes(x=long, y=lat, group=group), data=eurEduMapDf, color='black') # add text library(doBy) txtVal <- summaryBy(long + lat + Value ~ id, data=eurEduMapDf, FUN=mean, keep.names=T) m3 <- m2 + geom_text(aes(x=long, y=lat, label=Value), data=txtVal, col="yellow", cex=3)