All Paths Lead to the EARL Conference with ggplot2 & ggmap

May 11, 2015

(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

By Rich Pugh, Commercial Director

I was recently at LondonR, where I saw James Cheshire discussing the ways in which he created stunning maps for his book, London:  The Information Capital, using R.  If you want a glimpse of James’ presentation, have a look at his website: spatial.ly.

I’ve not drawn maps for a while in R, so the code on James’ website gave me a great start, but I needed something to plot.

At the same time as exploring this, I was busy writing my presentation on “Defining and Creating a Data Scientist with R” for the forthcoming EARL (Effective Applications of the R Language) conference in London in September (www.earl-conference.com).

So, I thought I’d have a go at plotting the geographic locations of last year’s EARL attendees. (At some point over the coming months I could redo this with this year’s attendees and look for any geographical changes.)

The first thing I did was grab the data, in this case the list of EARL 2014 attendees. Sadly, all I had was the city or town each attendee came from, like this:

 

> getLocations <- read.csv("locations.csv")
> getLocations$Where[1:10]
[1] London       Copenhagen   London       Ludwigshafen 
[5] London       Munich       London       Boston       
[9] London       Warwick
53 Levels: Aabenraa Amsterdam Barcelona Basel ... Zurich
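A note for anyone re-running this today: since R 4.0.0, read.csv() no longer converts text columns to factors by default, so getLocations$Where comes back as a character vector without the "Levels" line. The sketch below rebuilds the ten values printed above (the original locations.csv isn't available) and shows that stringsAsFactors = TRUE restores the factor; with the real file the equivalent call would be read.csv("locations.csv", stringsAsFactors = TRUE).

```r
# The first ten attendee locations printed above (the original CSV isn't available)
where <- c("London", "Copenhagen", "London", "Ludwigshafen", "London",
           "Munich", "London", "Boston", "London", "Warwick")

# Since R 4.0.0, read.csv() and data.frame() keep text as character by default;
# stringsAsFactors = TRUE restores the factor and its "Levels" line
getLocations <- data.frame(Where = where, stringsAsFactors = TRUE)
getLocations$Where
nlevels(getLocations$Where)  # 6 distinct places in this sample
```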

 

For a second, I thought I would have to hand-code all the geographical locations, but then I found a brilliant function called “geocode” in the “ggmap” package.

It uses the Google Maps API to look up the geographical location (longitude, latitude) of any place in the world, like this:

 

> library(ggmap)   # also attaches ggplot2, used for the plots below
> geocode(c("London", "Chippenham", "Chicago", "Shanghai", 
+   "Tower Hotel London"))
          lon      lat
1  -0.1277583 51.50735
2  -2.1195157 51.46151
3 -87.6297982 41.87811
4 121.4737010 31.23042
5  -0.0738890 51.50667

 

Using geocode means I could quickly get the positions (longitude, latitude) for each city, and aggregate to find the total number of attendees from those locations:

 
> # Aggregate the data
> aggLocations <- aggregate(list(Num = getLocations$Where), 
+   list(Location = getLocations$Where), length) 

> # Merge on geographic locations
> geoLocs <- geocode(as.character(aggLocations$Location))   
> geoDf <- cbind(aggLocations, geoLocs)
> head(geoDf)    
   Location Num       lon      lat 
1  Aabenraa   1  9.415159 55.04085 
2 Amsterdam   1  4.895168 52.37022 
3 Barcelona   2  2.173404 41.38506 
4     Basel   5  7.597551 47.56744 
5   Belfast   1 -5.930120 54.59729 
6    Berlin   5 13.404954 52.52001
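Aggregating before geocoding is economical: only the distinct place names are sent to the geocoding service (current ggmap releases also require registering a Google API key with register_google() before geocode() will run). The base-R sketch below does the same counting with table(), using an attendee vector rebuilt from the head(geoDf) rows above, and joins coordinates by name with merge() instead of cbind(). cbind() works in the post because geocode() returns one row per input, in order; a keyed merge() is the safer choice if the coordinates come from a cache or hand-maintained table, which the hypothetical coords data frame here stands in for.

```r
# Attendee-level vector rebuilt from the head(geoDf) rows shown above
where <- rep(c("Aabenraa", "Amsterdam", "Barcelona", "Basel", "Belfast", "Berlin"),
             times = c(1, 1, 2, 5, 1, 5))

# One row per place with its attendee count, like aggregate(..., length)
aggLocations <- as.data.frame(table(Location = where), responseName = "Num",
                              stringsAsFactors = FALSE)

# Hypothetical stand-in for cached geocode() output: same coordinates as
# head(geoDf), deliberately listed in a different order
coords <- data.frame(
  Location = c("Berlin", "Belfast", "Basel", "Barcelona", "Amsterdam", "Aabenraa"),
  lon = c(13.404954, -5.930120, 7.597551, 2.173404, 4.895168, 9.415159),
  lat = c(52.52001, 54.59729, 47.56744, 41.38506, 52.37022, 55.04085),
  stringsAsFactors = FALSE
)

# merge() pairs rows by Location; cbind() would pair them by position
geoDf <- merge(aggLocations, coords, by = "Location")
geoDf
```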

Now I can create my first plot:

> theWorld <- borders("world", colour="grey80", fill="grey80")   # needs the maps package installed
> ggplot(data = geoDf, aes(x = lon, y = lat)) +
+      theWorld + theme_minimal() +
+      geom_point(aes(size = Num), color="red", alpha = .7) + 
+      theme(axis.ticks = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
+      coord_cartesian(ylim=c(30, 72), xlim = c(-130, 30)) +
+      scale_size_continuous(range = c(3, 6)) + 
+      guides(size = "none")

 

[Figure: AllAttendees, map of EARL 2014 attendee locations]

 

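Not bad as a first attempt, and really easy with ggplot2 – note the use of coord_cartesian to zoom in on a particular window of the plot. Unlike setting limits with xlim() or ylim(), which drops any data outside the range, coord_cartesian zooms without discarding points.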

Straight away, we can see that most attendees were from Europe, with 12 from North America and 1 from Israel (hi Tal!).

Let’s exclude the North American points and re-plot for Europe only, this time using text labels instead of markers:

> ggplot(data = subset(geoDf, lon > -50), aes(x = lon, y = lat)) +
+   theWorld + theme_minimal() +
+   geom_text(aes(label = Location, size = Num)) + 
+   theme(axis.ticks = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
+   coord_cartesian(ylim=c(37, 72), xlim = c(-12, 25)) +
+   scale_size_continuous(range = c(3, 6))

 

[Figure: justEurope, European attendee locations labelled by town]

 

Using geom_text to add these labels is such an easy change, and it quickly gives a nice overview of the places from which attendees came.
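subset(geoDf, lon > -50) is a one-sided cut: it drops the North American points, but the Israeli attendee stays in the data and simply falls outside the window that coord_cartesian draws. If you want the data and the view to agree, a small hypothetical helper, in_window() (not part of ggplot2 or ggmap), can filter on the same box. Here it is applied to the coordinates geocode() returned earlier in the post.

```r
# Hypothetical helper: keep only rows whose lon/lat fall inside a box
in_window <- function(df, xlim, ylim) {
  keep <- df$lon >= xlim[1] & df$lon <= xlim[2] &
          df$lat >= ylim[1] & df$lat <= ylim[2]
  df[keep, , drop = FALSE]
}

# Coordinates returned by geocode() earlier in the post
places <- data.frame(
  place = c("London", "Chippenham", "Chicago", "Shanghai", "Tower Hotel London"),
  lon = c(-0.1277583, -2.1195157, -87.6297982, 121.4737010, -0.0738890),
  lat = c(51.50735, 51.46151, 41.87811, 31.23042, 51.50667),
  stringsAsFactors = FALSE
)

# Same window as the coord_cartesian() call in the Europe plot
europe <- in_window(places, xlim = c(-12, 25), ylim = c(37, 72))
europe$place
```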

London is, of course, the most popular place from which attendees came to the conference.  This is unsurprising given that:

  1. The conference is in London
  2. It was organised by the people involved in LondonR

In fact, almost half the attendees were from the London area.  However, it is nice to see the range of places in mainland Europe from which people attended.

My last plot was inspired by the “Journeys to Work” plot from James’ book (http://spatial.ly/2015/03/mapping-flows/).  I wanted to look at the journeys taken by EARL 2014 attendees.

Using geom_segment (as suggested by James’ code) I created the following:

> tH <- geocode("Tower Hotel London")
> ggplot(data = geoDf, aes(x = lon, y = lat)) + 
+   theWorld + theme_minimal() +
+   geom_segment(aes(xend = tH$lon, yend = tH$lat, alpha = Num)) + 
+   theme(axis.ticks = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
+   coord_cartesian(ylim=c(30, 72), xlim = c(-130, 40)) +
+   scale_alpha_continuous(range = c(.2, .9))

 

[Figure: distances, lines from each attendee location to the Tower Hotel]

 

Finally, I’ll calculate the total number of miles travelled by EARL 2014 attendees (one-way, as the crow flies, to the Tower Hotel) using the sp package:

> library(sp)
> kms <- spDistsN1(as.matrix(geoDf[3:4]),   # columns 3:4 are lon, lat
+   unlist(tH), longlat=TRUE)                # great-circle distances in km
> sum(geoDf$Num * kms * 0.62)               # roughly 0.62 miles per km
[1] 94587.85

Almost 100,000 miles travelled in total!
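spDistsN1(..., longlat = TRUE) returns great-circle distances in kilometres, and the 0.62 factor is a rounded km-to-miles conversion (a mile is exactly 1.609344 km, about 0.6214 miles per km). To make the calculation explicit, here is a sketch of the spherical haversine formula in base R. It assumes a mean Earth radius of 6371 km, whereas sp's own implementation uses an ellipsoidal (WGS84) approximation, so expect small differences from spDistsN1(). It only uses the six head(geoDf) rows, so its total is not comparable with the figure above.

```r
# Great-circle distance via the haversine formula on a spherical Earth.
# Assumption: mean radius 6371 km (sp uses an ellipsoidal approximation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

km_per_mile <- 1.609344  # international mile, exact

# The head(geoDf) rows printed above, and the Tower Hotel from geocode()
geoDf <- data.frame(
  Location = c("Aabenraa", "Amsterdam", "Barcelona", "Basel", "Belfast", "Berlin"),
  Num = c(1, 1, 2, 5, 1, 5),
  lon = c(9.415159, 4.895168, 2.173404, 7.597551, -5.930120, 13.404954),
  lat = c(55.04085, 52.37022, 41.38506, 47.56744, 54.59729, 52.52001)
)
tH <- c(lon = -0.0738890, lat = 51.50667)

kms <- haversine_km(geoDf$lon, geoDf$lat, tH[["lon"]], tH[["lat"]])
# One-way, as-the-crow-flies miles, weighted by attendees per place
total_miles <- sum(geoDf$Num * kms / km_per_mile)
round(kms)
total_miles
```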

Well, that was my quick foray into the world of ggmap. I was particularly impressed with the geocode function in ggmap, which made grabbing geographic locations so easy, and with how simple it was to create smart map plots using ggplot2.

This year the EARL conference will be in London in September and Boston in November – I hope to see as many of you there as possible, and not just because it’ll mean a bigger sample size for me next time I analyse this data!
