Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have been forever annoyed at how long it takes to plot data on a large shapefile. And this is a domain where doesn’t matter if you’re working with MapInfo or R. Just zooming the figure takes ages. But a couple of days ago I bumped into an excellent blogpost by John Myles White that solved the problem in the most Daoistic way, reminiscent of Steve Jobs: Do away with the shapefile!

To have a quick overview of one’s data, there is no need for a shapefile. Just plot latitude and longitude directly into a scattergraph.

That’s it.

Here’s how it is done. For our example, let’s plot a map showing the ratio of Foreign-born vs. Australian residents, by postcode, in all of Australia. As a first step, I made a search and found a ready-to-use file with all Australian postcodes and their geographical coordinates. Next, I got some Census data from the Australian Bureau of Statistics, specifically counts of Australian-born and foreign-born residents sorted by postcode. Quick munge and plot (code below), and voilà!

Admittedly this is a quick fix: One has to guess the shape of Australia and the location of its major cities from their relative place on the map. Even with this limitation, though, I find this solution really appealing. Instead of waiting for many minutes each time I redraw or resize a graph, I can get a complete overview of my dataset instantaneously(!).

require(ggplot2)

setwd('/Users/myuser/R/map')
poa$poa <- as.integer(sub('[a-zA-Z]{3}','', x=poa$region_id))

# Ratio Aus born vs elsewhere born
ratio <- data.frame('poa'=poa$poa,'Freq'=poa$Birthplace_Australia_P/poa$Birthplace_Elsewhere_P) zipcodes <- read.csv('data/pc_full_lat_long.csv') zipcodes <- zipcodes[!duplicated(zipcodes$Pcode),]
zipcodes <- zipcodes[!zipcodes$Lat == 0,] zipcodes <- zipcodes[!zipcodes$Long == 0,]
poacoord <- data.frame('poa'=zipcodes$Pcode, 'lat'=zipcodes$Lat, 'long'=zipcodes$Long) popcount <- na.omit(merge(ratio, poacoord, all=FALSE)) popcount <- popcount[rev(order(popcount$Freq)),] # Place the small bubbles on top
popcount$cat <- sapply(popcount$Freq, function(x) if (x < 6) {'> 1:6'} else {'< 1:6'})

ggplot(popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = 'Population') +
geom_point() +
coord_equal()


And once we get the data presentation right, we can add the shapefile and go get some tea.

# must have gpclib installed
require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
gpclibPermit() # required for fortify method
setwd('/Users/myuser/R/map/data/shp')
[email protected]\$id = rownames([email protected])
amap.points = fortify(amap, region='id')
amap.df = join(amap.points, [email protected], by='id')

ggplot() +
geom_polygon(data=amap.df, aes(long,lat,group=group)) +
geom_path(data=amap.df, aes(long,lat,group=group), color="grey") +
geom_point(data=popcount, aes(x=long, y=lat, colour=cat)) +
scale_size(range = c(1,20), name = "Population") +
scale_colour_discrete(name="ratio of Foreign vs\nAustralian-born") +
coord_equal()