# Speed up your map plotting in R

December 5, 2012
By

(This article was first published on Informed Guess » R, and kindly contributed to R-bloggers)

I have been forever annoyed at how long it takes to plot data on a large shapefile. And this is a domain where doesn’t matter if you’re working with MapInfo or R. Just zooming the figure takes ages. But a couple of days ago I bumped into an excellent blogpost by John Myles White that solved the problem in the most Daoistic way, reminiscent of Steve Jobs: Do away with the shapefile!

To have a quick overview of one’s data, there is no need for a shapefile. Just plot latitude and longitude directly into a scattergraph.

That’s it.

Here’s how it is done. For our example, let’s plot a map showing the ratio of Foreign-born vs. Australian residents, by postcode, in all of Australia. As a first step, I made a search and found a ready-to-use file with all Australian postcodes and their geographical coordinates. Next, I got some Census data from the Australian Bureau of Statistics, specifically counts of Australian-born and foreign-born residents sorted by postcode. Quick munge and plot (code below), and voilà! Admittedly this is a quick fix: One has to guess the shape of Australia and the location of its major cities from their relative place on the map. Even with this limitation, though, I find this solution really appealing. Instead of waiting for many minutes each time I redraw or resize a graph, I can get a complete overview of my dataset instantaneously(!).

```require(ggplot2)

setwd('/Users/myuser/R/map')
poa\$poa <- as.integer(sub('[a-zA-Z]{3}','', x=poa\$region_id))

# Ratio Aus born vs elsewhere born
ratio <- data.frame('poa'=poa\$poa,'Freq'=poa\$Birthplace_Australia_P/poa\$Birthplace_Elsewhere_P)

zipcodes <- zipcodes[!duplicated(zipcodes\$Pcode),]
zipcodes <- zipcodes[!zipcodes\$Lat == 0,]
zipcodes <- zipcodes[!zipcodes\$Long == 0,]
poacoord <- data.frame('poa'=zipcodes\$Pcode, 'lat'=zipcodes\$Lat, 'long'=zipcodes\$Long)

popcount <- na.omit(merge(ratio, poacoord, all=FALSE))
popcount <- popcount[rev(order(popcount\$Freq)),] # Place the small bubbles on top
popcount\$cat <- sapply(popcount\$Freq, function(x) if (x < 6) {'> 1:6'} else {'< 1:6'})

ggplot(popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = 'Population') +
geom_point() +
coord_equal()
```

And once we get the data presentation right, we can add the shapefile and go get some tea.

```# must have gpclib installed
require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
gpclibPermit() # required for fortify method
setwd('/Users/myuser/R/map/data/shp')
[email protected]\$id = rownames([email protected])
amap.points = fortify(amap, region='id')
amap.df = join(amap.points, [email protected], by='id')

ggplot() +
geom_polygon(data=amap.df, aes(long,lat,group=group)) +
geom_path(data=amap.df, aes(long,lat,group=group), color="grey") +
geom_point(data=popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = "Population") +
scale_colour_discrete(name="ratio of Foreign vs\nAustralian-born") +
coord_equal()
```   R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...