Speed up your map plotting in R

December 5, 2012

(This article was first published on Informed Guess » R, and kindly contributed to R-bloggers)

I have been forever annoyed at how long it takes to plot data on a large shapefile. And this is a domain where doesn’t matter if you’re working with MapInfo or R. Just zooming the figure takes ages. But a couple of days ago I bumped into an excellent blogpost by John Myles White that solved the problem in the most Daoistic way, reminiscent of Steve Jobs: Do away with the shapefile!

To have a quick overview of one’s data, there is no need for a shapefile. Just plot latitude and longitude directly into a scattergraph.

That’s it.

Here’s how it is done. For our example, let’s plot a map showing the ratio of Foreign-born vs. Australian residents, by postcode, in all of Australia. As a first step, I made a search and found a ready-to-use file with all Australian postcodes and their geographical coordinates. Next, I got some Census data from the Australian Bureau of Statistics, specifically counts of Australian-born and foreign-born residents sorted by postcode. Quick munge and plot (code below), and voilà!

scatter Admittedly this is a quick fix: One has to guess the shape of Australia and the location of its major cities from their relative place on the map. Even with this limitation, though, I find this solution really appealing. Instead of waiting for many minutes each time I redraw or resize a graph, I can get a complete overview of my dataset instantaneously(!).


poa <- read.csv('data/2011_BCP_ALL_for_AUST_short-header/2011 Census BCP All Geographies for AUST/POA/AUST/2011Census_B01_AUST_POA_short.csv', as.is=TRUE)
poa$poa <- as.integer(sub('[a-zA-Z]{3}','', x=poa$region_id))

# Ratio Aus born vs elsewhere born
ratio <- data.frame('poa'=poa$poa,'Freq'=poa$Birthplace_Australia_P/poa$Birthplace_Elsewhere_P)

zipcodes <- read.csv('data/pc_full_lat_long.csv')
zipcodes <- zipcodes[!duplicated(zipcodes$Pcode),]
zipcodes <- zipcodes[!zipcodes$Lat == 0,]
zipcodes <- zipcodes[!zipcodes$Long == 0,]
poacoord <- data.frame('poa'=zipcodes$Pcode, 'lat'=zipcodes$Lat, 'long'=zipcodes$Long)

popcount <- na.omit(merge(ratio, poacoord, all=FALSE))
popcount <- popcount[rev(order(popcount$Freq)),] # Place the small bubbles on top
popcount$cat <- sapply(popcount$Freq, function(x) if (x < 6) {'> 1:6'} else {'< 1:6'})

ggplot(popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = 'Population') +
geom_point() +

And once we get the data presentation right, we can add the shapefile and go get some tea.

# must have gpclib installed
require("rgdal") # requires sp, will use proj.4 if installed
gpclibPermit() # required for fortify method
amap <- readOGR(dsn=".", layer="aust_cd66states")
[email protected]$id = rownames([email protected])
amap.points = fortify(amap, region='id')
amap.df = join(amap.points, [email protected], by='id')

ggplot() +
geom_polygon(data=amap.df, aes(long,lat,group=group)) +
geom_path(data=amap.df, aes(long,lat,group=group), color="grey") +
geom_point(data=popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = "Population") +
scale_colour_discrete(name="ratio of Foreign vs\nAustralian-born") +


To leave a comment for the author, please follow the link and comment on their blog: Informed Guess » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)