By Rich Pugh, Commercial Director
I was recently at LondonR, where I saw James Cheshire discussing the ways in which he created stunning maps for his book, London: The Information Capital, using R. If you want a glimpse of James’ presentation, have a look at his website: spatial.ly.
I’ve not drawn maps for a while in R, so the code on James’ website gave me a great start, but I needed something to plot.
At the same time as exploring this, I was busy writing my presentation on “Defining and Creating a Data Scientist with R” for the forthcoming EARL (Effective Applications of the R Language) conference in London in September (www.earl-conference.com).
So, I thought I’d have a go at plotting the geographic locations of last year’s EARL attendees (then at some point over the coming months I could maybe redo this work with this year’s attendees and look for any overall geographical changes).
The first thing I did was grab the data: in this case, the list of EARL 2014 attendees. Sadly, all I had was the city/town each attendee was from, like this:
> getLocations <- read.csv("locations.csv")
> getLocations$Where[1:10]
 [1] London       Copenhagen   London       Ludwigshafen
 [5] London       Munich       London       Boston
 [9] London       Warwick
53 Levels: Aabenraa Amsterdam Barcelona Basel ... Zurich
For a second, I thought I would have to hand-code all the geographical locations, but then I found a brilliant function called “geocode” in the “ggmap” package.
It uses the Google Maps API to look up the geographical location (longitude, latitude) of any place in the world, like this:
> library(ggmap)  # provides geocode(); also attaches ggplot2
> geocode(c("London", "Chippenham", "Chicago", "Shanghai",
+           "Tower Hotel London"))
          lon      lat
1  -0.1277583 51.50735
2  -2.1195157 51.46151
3 -87.6297982 41.87811
4 121.4737010 31.23042
5  -0.0738890 51.50667
Using geocode meant I could quickly get the position (longitude, latitude) of each city, and aggregate the data to find the total number of attendees from each location:
> # Aggregate the data
> aggLocations <- aggregate(list(Num = getLocations$Where),
+                           list(Location = getLocations$Where), length)
> # Merge on geographic locations
> geoLocs <- geocode(as.character(aggLocations$Location))
> geoDf <- cbind(aggLocations, geoLocs)
> head(geoDf)
   Location Num       lon      lat
1  Aabenraa   1  9.415159 55.04085
2 Amsterdam   1  4.895168 52.37022
3 Barcelona   2  2.173404 41.38506
4     Basel   5  7.597551 47.56744
5   Belfast   1 -5.930120 54.59729
6    Berlin   5 13.404954 52.52001
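The same aggregate-then-attach-coordinates step can be tried offline. The sketch below is only an illustration: the toy attendee list and the object names (toyLocations, toyAgg, toyCoords, toyGeo) are made up for the example, and the coordinates are typed in by hand in place of a live geocode() call.

```r
# Toy stand-in for locations.csv (hypothetical values, not the real attendee list)
toyLocations <- data.frame(
  Where = c("London", "Munich", "London", "Basel", "London", "Munich"),
  stringsAsFactors = TRUE
)

# Count attendees per city, using the same aggregate() call as in the post
toyAgg <- aggregate(list(Num = toyLocations$Where),
                    list(Location = toyLocations$Where), length)

# Hand-entered approximate coordinates standing in for the geocode() result
toyCoords <- data.frame(
  Location = c("Basel", "London", "Munich"),
  lon = c(7.5976, -0.1278, 11.5820),
  lat = c(47.5674, 51.5074, 48.1351)
)

# merge() matches rows by city name instead of relying on row order
toyGeo <- merge(toyAgg, toyCoords, by = "Location")
print(toyGeo)
```

Here merge() is used instead of cbind() purely as a defensive choice: cbind() assumes the coordinates come back in the same order as the cities that were sent, whereas merge() joins on the city name.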
Now I can create my first plot:
> theWorld <- borders("world", colour="grey80", fill="grey80")
> ggplot(data = geoDf, aes(x = lon, y = lat)) +
+   theWorld + theme_minimal() +
+   geom_point(aes(size = Num), color="red", alpha = .7) +
+   theme(axis.ticks = element_blank(), axis.text = element_blank(),
+         axis.title = element_blank()) +
+   coord_cartesian(ylim=c(30, 72), xlim = c(-130, 30)) +
+   scale_size_continuous(range = c(3, 6)) +
+   guides(size=FALSE)
Not bad as a first attempt, and really easy with ggplot – note the use of coord_cartesian to nicely zoom in on a particular window of the plot.
Straight away, we can see that most attendees were from Europe, with 12 from North America and 1 from Israel (hi Tal!).
Let’s exclude the North American points and re-plot for Europe only, this time using text instead of markers:
> ggplot(data = subset(geoDf, lon > -50), aes(x = lon, y = lat)) +
+   theWorld + theme_minimal() +
+   geom_text(aes(label = Location, size = Num)) +
+   theme(axis.ticks = element_blank(), axis.text = element_blank(),
+         axis.title = element_blank()) +
+   coord_cartesian(ylim=c(37, 72), xlim = c(-12, 25)) +
+   scale_size_continuous(range = c(3, 6))
Using geom_text to add these labels is such an easy change, and it quickly gives a nice overview of the places from which attendees came.
London is, of course, the most popular place from which attendees came to the conference. This is unsurprising given that:
- The conference is in London
- It was organised by the people involved in LondonR
In fact, almost half the attendees were from the London area. However, it is nice to see the range of places in mainland Europe from which people attended.
My last plot was inspired by the “Journeys to Work” plot from James’ book (http://spatial.ly/2015/03/mapping-flows/). I wanted to look at the journeys taken by EARL 2014 attendees.
Using geom_segment (as suggested by James’ code) I created the following:
> tH <- geocode("Tower Hotel London")
> ggplot(data = geoDf, aes(x = lon, y = lat)) +
+   theWorld + theme_minimal() +
+   geom_segment(aes(xend = tH$lon, yend = tH$lat, alpha = Num)) +
+   theme(axis.ticks = element_blank(), axis.text = element_blank(),
+         axis.title = element_blank()) +
+   coord_cartesian(ylim=c(30, 72), xlim = c(-130, 40)) +
+   scale_alpha_continuous(range = c(.2, .9))
Finally, I’ll calculate the total number of miles travelled by EARL 2014 attendees using the sp package:
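> library(sp)
> # Distance in km from each city to the venue (tH, geocoded above)
> kms <- spDistsN1(as.matrix(geoDf[3:4]),
+                  unlist(tH), longlat=TRUE)
> # Weight by attendee count; 0.62 converts km to miles
> sum(geoDf$Num * kms * 0.62)
[1] 94587.85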
Almost 100,000 miles travelled in total!
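To make the distance step less of a black box, here is a minimal base-R sketch of the same idea using the haversine formula. It is not what spDistsN1 does internally, so its results may differ slightly. The helper haversineKm, the 6371 km Earth radius and the two-city sample are assumptions for illustration. The coordinates are copied from the geocode output shown earlier.

```r
# Great-circle distance in km via the haversine formula (spherical Earth assumed)
haversineKm <- function(lon1, lat1, lon2, lat2, radiusKm = 6371) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * radiusKm * asin(pmin(1, sqrt(a)))
}

# Venue coordinates as returned for "Tower Hotel London" earlier in the post
venue <- c(lon = -0.0738890, lat = 51.50667)

# Two rows copied from head(geoDf) above
cities <- data.frame(
  Location = c("Amsterdam", "Berlin"),
  Num = c(1, 5),
  lon = c(4.895168, 13.404954),
  lat = c(52.37022, 52.52001)
)

# One-way distance from each city to the venue, in km
kmToVenue <- haversineKm(cities$lon, cities$lat,
                         venue[["lon"]], venue[["lat"]])

# Weight by the number of attendees and convert km to miles
milesPerKm <- 0.621371
totalMiles <- sum(cities$Num * kmToVenue * milesPerKm)
print(round(kmToVenue))
print(totalMiles)
```

As in the calculation above, this counts one-way, straight-line distances only, so it understates the real journeys.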
Well, that was my quick foray into the world of ggmap. I was particularly impressed with the geocode function in ggmap, which made grabbing geographic locations so easy, and with how simple it was to create smart map plots using ggplot2.
This year the EARL conference will be in London in September and Boston in November – I hope to see as many of you there as possible, and not just because it’ll mean a bigger sample size for me next time I analyse this data!