Maps in R: Plotting data points on a map
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the introductory post of this series I showed how to plot empty maps in R.
Today I’ll begin to show how to add data to R maps. The topic of this post is the visualization of data points on a map.
We will use a couple of datasets from the OpenFlight website for our examples.
After loading the airports.dat
file let’s visualize the first few lines.
> airports <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat", header = FALSE) > colnames(airports) <- c("ID", "name", "city", "country", "IATA_FAA", "ICAO", "lat", "lon", "altitude", "timezone", "DST") > head(airports) ID name city country IATA_FAA ICAO lat lon altitude timezone DST 1 1 Goroka Goroka Papua New Guinea GKA AYGA -6.081689 145.3919 5282 10 U 2 2 Madang Madang Papua New Guinea MAG AYMD -5.207083 145.7887 20 10 U 3 3 Mount Hagen Mount Hagen Papua New Guinea HGU AYMH -5.826789 144.2959 5388 10 U 4 4 Nadzab Nadzab Papua New Guinea LAE AYNZ -6.569828 146.7262 239 10 U 5 5 Port Moresby Jacksons Intl Port Moresby Papua New Guinea POM AYPY -9.443383 147.2200 146 10 U 6 6 Wewak Intl Wewak Papua New Guinea WWK AYWK -3.583828 143.6692 19 10 U |
Latitude and longitude are reported for every airport in the dataset.
Let’s draw the map of Europe with the help of rworldmap
package, as was shown in the previous post on maps:
> library(rworldmap) > newmap <- getMap(resolution = "low") > plot(newmap, xlim = c(-20, 59), ylim = c(35, 71), asp = 1) |
Then we can easily lay the airports over the map:
> points(airports$lon, airports$lat, col = "red", cex = .6) |
Adding dimensions
In the introductory post I mentioned that ggmap
actually builds on the ggplot
graphics engine, thus all the strengths of ggplot
are available when mapping data with ggmap
.
Here I will show a couple of examples on how to take advantage of this.
Let’s load another dataset from OpenFlights in R.
> routes <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/routes.dat", header=F) > colnames(routes) <- c("airline", "airlineID", "sourceAirport", "sourceAirportID", "destinationAirport", "destinationAirportID", "codeshare", "stops", "equipment") > head(routes) airline airlineID sourceAirport sourceAirportID destinationAirport destinationAirportID codeshare stops equipment 1 2B 410 AER 2965 DME 4029 0 CR2 2 2B 410 ASF 2966 LED 2948 0 CR2 3 2B 410 CEK 2968 DME 4029 0 CR2 4 2B 410 CEK 2968 KZN 2990 0 CR2 5 2B 410 CEK 2968 OVB 4078 0 CR2 6 2B 410 DME 4029 AER 2965 0 CR2 |
Starting from the routes
dataset, let’s count the both number of routes departing from and arriving to a particular airport. I’m using another very useful package by Hadley Wickham for this task.
> library(plyr) > departures <- ddply(routes, .(sourceAirportID), "nrow") > names(departures)[2] <- "flights" > arrivals <- ddply(routes, .(destinationAirportID), "nrow") > names(arrivals)[2] <- "flights" |
Then, let's add the info on departing and arriving flights to the airports
dataset (which contains the coordinates data.)
> airportD <- merge(airports, departures, by.x = "ID", by.y = "sourceAirportID") > airportA <- merge(airports, arrivals, by.x = "ID", by.y = "destinationAirportID") |
The goal is now to plot the airports on the map of Europe as circles whose area is proportional to the number of departing flights.
The first step is to get the map from GoogleMaps (or one of the other available services), like was shown last time.
> library(ggmap) > map <- get_map(location = 'Europe', zoom = 4) |
The following lines already get us quite close to producing the desired chart.
> mapPoints <- ggmap(map) + + geom_point(aes(x = lon, y = lat, size = sqrt(flights)), data = airportD, alpha = .5) |
The ggmap
command prepares the drawing of the map. The geom_point
function adds the layer of data points, as would be normally done in a ggplot
. A thorough explanation of ggplot
is well beyond the scope of this post, but here are quick details on what is passed to geom_point
:
- aes
indicates how aesthetics (points in this case) are to be generated; the lon
variable is associated to the x axis, lat
to y, and the size of the points is proportional to the value of the variable flights
(actually to its square root;)
- data
indicates the dataset where the variable passed to aes
are to be found;
- the alpha
parameter controls the transparency of the plotted points (some degree of transparency will make the overlapping circles distinguishable.)
And here's what appears on the R plotting window when one types mapPoints
in the console.
A few tweaks to the legend (so that it does report the actual number of departures rather than the square root,) and the chart is ready for publication.
> mapPointsLegend <- mapPoints + + scale_area(breaks = sqrt(c(1, 5, 10, 50, 100, 500)), labels = c(1, 5, 10, 50, 100, 500), name = "departing routes") > mapPointsLegend |
Once more, the map is a ggplot
(type class(mapPoints)
in your console to check) thus a nearly unlimited set of operations can be performed to improve it. For example, the number of departing flights could be portrayed by the color of the circles rather than their dimension.
As a final example for this post, I'll show the code to perform faceting. In other words we will have a couple of panels, one reporting the departing flights, the other the incoming ones.
# create the data set containing both departures and arrivals > airportD$type <- "departures" > airportA$type <- "arrivals" > airportDA <- rbind(airportD, airportA) # map the data # map + data points > mapPointsDA <- ggmap(map) + + geom_point(aes(x = lon, y = lat, size = sqrt(flights)), data = airportDA, alpha = .5) # adjust the legend > mapPointsLegendDA <- mapPointsDA + + scale_area(breaks = sqrt(c(1, 5, 10, 50, 100, 500)), labels = c(1, 5, 10, 50, 100, 500), name = "routes") # panels according to type (departure/arrival) > mapPointsFacetsDA <- mapPointsLegendDA + + facet_grid(. ~ type) # plot the map > mapPointsFacetsDA |
What's next
Next time we will deal with geographically aggregated data and how to display them in choropleth maps.
View (and download) the full code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | airports <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat", header = FALSE) colnames(airports) <- c("ID", "name", "city", "country", "IATA_FAA", "ICAO", "lat", "lon", "altitude", "timezone", "DST") head(airports) library(rworldmap) newmap <- getMap(resolution = "low") plot(newmap, xlim = c(-20, 59), ylim = c(35, 71), asp = 1) points(airports$lon, airports$lat, col = "red", cex = .6) routes <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/routes.dat", header=F) colnames(routes) <- c("airline", "airlineID", "sourceAirport", "sourceAirportID", "destinationAirport", "destinationAirportID", "codeshare", "stops", "equipment") head(routes) library(plyr) departures <- ddply(routes, .(sourceAirportID), "nrow") names(departures)[2] <- "flights" arrivals <- ddply(routes, .(destinationAirportID), "nrow") names(arrivals)[2] <- "flights" airportD <- merge(airports, departures, by.x = "ID", by.y = "sourceAirportID") airportA <- merge(airports, arrivals, by.x = "ID", by.y = "destinationAirportID") library(ggmap) map <- get_map(location = 'Europe', zoom = 4) mapPoints <- ggmap(map) + geom_point(aes(x = lon, y = lat, size = sqrt(flights)), data = airportD, alpha = .5) mapPointsLegend <- mapPoints + scale_area(breaks = sqrt(c(1, 5, 10, 50, 100, 500)), labels = c(1, 5, 10, 50, 100, 500), name = "departing routes") mapPointsLegend # create the data set containing both departures and arrivals airportD$type <- "departures" airportA$type <- "arrivals" airportDA <- rbind(airportD, airportA) # map the data # map + data points mapPointsDA <- ggmap(map) + geom_point(aes(x = lon, y = lat, size = sqrt(flights)), data = airportDA, alpha = .5) # adjust the legend mapPointsLegendDA <- mapPointsDA + scale_area(breaks = sqrt(c(1, 5, 10, 50, 100, 500)), labels = c(1, 5, 10, 50, 100, 500), name = "routes") # panels according to type (departure/arrival) mapPointsFacetsDA <- mapPointsLegendDA + facet_grid(. ~ type) # plot the map mapPointsFacetsDA |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.