by Aimée Gott, Senior Consultant
If you follow us on Twitter, you might have noticed that Mango has been doing a bit of travelling this summer. He’s been to the EARL conference in San Francisco, to useR! in Brussels and to a few other places to teach training courses. He even managed to sneak in a little holiday in LA, where he was disappointed to find that no cats are honoured on the Hollywood Walk of Fame.
It was when we landed in LA that we got talking about the largest airports in the world (our marketing manager Karis and I, that is; don’t worry, I am not talking to stuffed cats just yet!). After a little guessing, Google came to the rescue, and I was quite surprised by which airports topped the list. That conversation stayed with me as I passed through many more airports, and it came back to me again recently when I saw James Cheshire give another inspiring talk at LondonR.
James gave a great talk full of striking visualisations, many of them map-based, that made me think it was about time to dust off my notes on mapping tools in R and have a play. Along the way I got slightly distracted by the data itself, but it was a great chance to remind myself of the pros and cons of two of my favourite mapping packages: ggmap and leaflet.
I didn’t want to spend long collecting and cleaning data, so I used a Wikipedia table listing the top 50 airports by passenger numbers from 2000 to 2016. Frustratingly, for 2000 to 2004 only 30 airports were listed, so I used the top 30 for every year.
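Keeping the same number of airports for every year is a short filter once the table is in long format. This is a minimal base R sketch. It assumes a hypothetical data frame with year, airport and passengers columns, since the post doesn’t show how its data were laid out. The values below are toy numbers, not the Wikipedia figures.

```r
# Keep the n busiest airports for each year.
# Assumes a long-format data frame with columns year, airport and passengers
# (hypothetical column names; the original post does not show its data).
top_n_per_year <- function(df, n = 30) {
  per_year <- lapply(split(df, df$year), function(d) {
    # Sort this year's rows by passengers, busiest first, then keep n of them
    head(d[order(d$passengers, decreasing = TRUE), ], n)
  })
  result <- do.call(rbind, per_year)
  rownames(result) <- NULL
  result
}

# Toy data, not real passenger figures
airports <- data.frame(
  year       = c(2000, 2000, 2000, 2016, 2016, 2016),
  airport    = c("Airport A", "Airport B", "Airport C",
                 "Airport A", "Airport B", "Airport C"),
  passengers = c(80, 60, 40, 90, 120, 70)
)

top2 <- top_n_per_year(airports, n = 2)
```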
I started out looking at ggmap. In general, my mapping needs involve local data (country or region level) rather than worldwide data, so this is a great tool for me. Because it’s built on top of ggplot2, I didn’t have to learn another tool; I simply needed to extend the skills I already have. Unfortunately, it’s not the best tool for worldwide data, as you can’t zoom out far enough to show the whole world map. Because of this, I decided to focus on Europe.
With very little effort I was able to use ggmap to obtain the latitude and longitude of the airports in the data, generate a Google map of Europe and overlay the airports, sized by the number of passengers that passed through them in a given year. The geocode function made the whole task much simpler, as it finds the airports’ locations automatically, so I didn’t have to search for additional data.
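Below is a rough sketch of that ggmap workflow. It assumes ggmap 3.0 or later with ggplot2 installed and a Google Maps Platform API key, because current ggmap needs one for both geocode() and Google basemaps. It also assumes a hypothetical data frame with an airport column holding searchable names and a passengers column. The map centre and zoom level are illustrative guesses, not the settings used for the original plot.

```r
# Sketch: geocode airports and overlay them on a Google basemap of Europe.
# Assumptions: ggmap >= 3.0 and ggplot2 are installed, you have a Google Maps
# Platform API key, and the data frame has hypothetical columns
# airport (a searchable name) and passengers.
plot_europe_airports <- function(airports, api_key) {
  ggmap::register_google(key = api_key)

  # geocode() looks each name up and returns lon and lat columns, one row per input
  coords <- ggmap::geocode(airports$airport)
  located <- cbind(airports, coords)

  # Centre roughly on Europe; Google basemaps cannot zoom out below level 3
  # (about continent scale), which is why a whole-world view is not possible
  europe <- ggmap::get_googlemap(center = c(lon = 15, lat = 50), zoom = 4)

  # Points sized by passenger numbers; airports off the map are not shown
  ggmap::ggmap(europe) +
    ggplot2::geom_point(
      data = located,
      mapping = ggplot2::aes(x = lon, y = lat, size = passengers),
      colour = "red", alpha = 0.6
    )
}
```

Calling plot_europe_airports(airports_2016, api_key = "YOUR_KEY"), where airports_2016 is a hypothetical data frame holding one year’s rows, would return a ggplot object that you can print or save.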
I didn’t like the Google map, though, so I played with some of the other base maps available in ggmap. Some come from Google, such as satellite; others look quite different, like ‘watercolor’. I personally like its look because it strips away a lot of information that we don’t really need in this particular plot, such as borders, although I would have liked to see some of the major cities.
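Changing the basemap only changes how the tiles are fetched; the point layer stays the same. The watercolour style comes from Stamen, and in ggmap 4.0 and later those tiles are served through Stadia Maps. This sketch therefore assumes get_stadiamap(), which typically needs a Stadia Maps API key; older ggmap versions used get_stamenmap() instead. The bounding box and the lon, lat and passengers column names are assumptions for illustration.

```r
# Sketch: the same kind of overlay on a Stamen 'watercolor' basemap.
# Assumes ggmap >= 4.0 (Stamen tiles via Stadia Maps), ggplot2, a Stadia Maps
# API key, and a data frame with hypothetical columns lon, lat and passengers.
plot_watercolor_airports <- function(located, stadia_key) {
  ggmap::register_stadiamaps(key = stadia_key)

  # Bounding box roughly covering Europe, in degrees (illustrative values)
  europe_bbox <- c(left = -12, bottom = 34, right = 35, top = 62)
  basemap <- ggmap::get_stadiamap(europe_bbox, zoom = 4,
                                  maptype = "stamen_watercolor")

  ggmap::ggmap(basemap) +
    ggplot2::geom_point(
      data = located,
      mapping = ggplot2::aes(x = lon, y = lat, size = passengers),
      colour = "darkred", alpha = 0.6
    )
}
```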
Moving on to leaflet, I was reminded that I should use it more often. It’s rich in features (almost too rich), and I had to stop myself from making my dots appear as planes.
In terms of code, leaflet uses the magrittr pipe (%>%) to add layers to a visualisation, and it wasn’t much work to get the data onto a world map with pop-ups identifying the airport, year and passenger numbers.
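Here is a minimal sketch of that kind of leaflet map. It assumes the leaflet package and a hypothetical data frame with airport, year, passengers, lon and lat columns. The post used magrittr’s %>%, which leaflet re-exports; this sketch uses R’s native |> pipe (R 4.1 or later) so that nothing has to be attached, and the two work the same way here.

```r
# Build pop-up text for each airport: name, year and passenger count.
popup_text <- function(airport, year, passengers) {
  sprintf("<b>%s</b><br/>Year: %d<br/>Passengers: %s",
          airport, as.integer(year),
          formatC(passengers, format = "d", big.mark = ","))
}

# Sketch: airports as circle markers on a world map, with pop-ups.
# Assumes the leaflet package and a data frame with hypothetical columns
# airport, year, passengers, lon and lat.
airport_map <- function(airports) {
  leaflet::leaflet(airports) |>
    leaflet::addTiles() |>                 # default OpenStreetMap tiles
    leaflet::addCircleMarkers(
      lng = ~lon, lat = ~lat,
      popup = popup_text(airports$airport, airports$year, airports$passengers)
    )
}
```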
The biggest challenge was setting the relative sizes of the points. Unlike ggmap, which uses ggplot2 and handles the sizing for us, leaflet needs us to be more specific. It took a few runs of the plot to get to sizes I was happy with. Once I had, I could easily repeat the layer to show other years on the same graphic. This is a little clunky, as you have to filter the data and add each layer by hand, but the trade-off is that you can add switches to turn layers on and off.
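The sizing and per-year layers could look something like this. The post only says the sizes were tuned over a few runs. The square-root scaling below (so that circle area, rather than radius, tracks passenger numbers), the max_radius default and the column names are all my assumptions. Each year goes into its own leaflet group so that addLayersControl() can switch it on and off.

```r
# Map passenger counts to circle radii in pixels. Taking the square root makes
# circle area proportional to passengers. Passing the same ref for every year
# keeps sizes comparable across layers; max_radius is a tuning knob.
passenger_radius <- function(passengers, ref = max(passengers), max_radius = 20) {
  max_radius * sqrt(passengers / ref)
}

# Sketch: one circle-marker layer per year, each in its own group, plus a
# control for switching the groups on and off. Assumes the leaflet package
# and hypothetical columns year, lon, lat and passengers.
add_year_layers <- function(map, airports, years) {
  ref <- max(airports$passengers)
  for (yr in years) {
    d <- airports[airports$year == yr, ]     # manual filter for this year
    map <- leaflet::addCircleMarkers(
      map, data = d, lng = ~lon, lat = ~lat,
      radius = passenger_radius(d$passengers, ref = ref),
      group = as.character(yr)
    )
  }
  leaflet::addLayersControl(map, overlayGroups = as.character(years))
}
```

For example, leaflet::leaflet() |> leaflet::addTiles() |> add_year_layers(airports, years = c(2000, 2016)) would draw one switchable layer per year. Scaling against a single reference value across all years means a given circle size stands for the same passenger count in every layer.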
This was where I got distracted by the data. Seeing the global data for multiple years in one graphic made it clear that, over the 16 years covered, the airports carrying the most passengers have shifted from North America to Asia.
So one last graphic took me back to ggplot2. For this one, the data have been scaled to account for the fact that air passenger numbers have kept increasing over the last 20 years. There has been a very clear increase in the number of passengers travelling through Asian airports; in fact, in 2016 half of the top 30 airports were in Asia.
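The post doesn’t spell out how the data were scaled. One plausible approach, assumed here, is to express each airport’s passengers as a share of that year’s top-30 total and then sum the shares by region, so busier recent years don’t dominate the comparison. Column names are hypothetical and the numbers are toy values.

```r
# Express each airport's passengers as a share of its year's total, then sum
# the shares by region. Assumes hypothetical columns year, region, passengers.
share_by_region <- function(df) {
  year_total <- ave(df$passengers, df$year, FUN = sum)
  df$share <- df$passengers / year_total
  aggregate(share ~ year + region, data = df, FUN = sum)
}

# Sketch of the plot: stacked bars of regional share per year (needs ggplot2).
plot_region_share <- function(shares) {
  ggplot2::ggplot(shares, ggplot2::aes(x = factor(year), y = share, fill = region)) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Year", y = "Share of top-30 passengers", fill = "Region")
}

# Toy data, not real figures
toy <- data.frame(
  year       = c(2000, 2000, 2016, 2016),
  region     = c("North America", "Asia", "North America", "Asia"),
  passengers = c(60, 40, 50, 150)
)
shares <- share_by_region(toy)
```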
Is there a better package?
To return to the two packages, they both have strengths and weaknesses, and which one I would choose for a particular visualisation really depends on my needs. For interactivity, I would without a doubt go straight to leaflet; for something more suited to local data or intended for static reporting, ggmap would be my preference.