John Snow, and OpenStreetMap

February 27, 2015

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

While I was working for a training on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in a great package of famous historical datasets.


One can easily visualize the deaths, on a simplified map, with the streets (here simple grey segments, see Vincent Arel-Bundock’s post)

plot(Snow.deaths[,c("x","y")], col="red", pch=19, cex=.7,xlab="", ylab="", xlim=c(3,20), ylim=c(3,20))
slist <- split(Snow.streets[,c("x","y")],as.factor(Snow.streets[,"street"]))
invisible(lapply(slist, lines, col="grey"))

Of course, one might add isodensity curves (estimated using kernels)

kde2d <- bkde2D(Snow.deaths[,2:3], bandwidth=c(0.5,0.5))
contour(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE)

Now, what if we want to visualize that dataset on a nice background, from Google Maps, or OpenStreetMaps? The problem here is that locations are in a weird coordinate representation system. So let us use a different dataset. For instance, on Robin Wilson’s blog, one can get datasets in a more traditional representation (here the epsg 27700). We can extract the dataset from


Then, we need our background,

map = openmap(c(lat= 51.516,   lon= -.141),
              c(lat= 51.511,   lon= -.133))
map=openproj(map, projection = "+init=epsg:27700") 
points(deaths@coords,col="red", pch=19, cex=.7 )

If we zoom in (the code above will be just fine), we get

And then, we can compute the density

kde2d <- bkde2D(X, bandwidth=c(bw.ucv(X[,1]),bw.ucv(X[,2])))

based on the same function as before (here I use marginal cross-validation techniques to get optimal bandwidths). To get a nice gradient, we can use

clrs=colorRampPalette(c(rgb(0,0,1,0), rgb(0,0,1,1)), alpha = TRUE)(20)

and finally, we add it on the map

image(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE,col=clrs)
contour(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE)

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)