John Snow’s Cholera data in more formats

March 13, 2013

(This article was first published on Robin's Blog » R, and kindly contributed to R-bloggers)

In honour of the bicentenary of John Snow’s birth – and because I was asked to by someone via email – I have now released my digitisation of John Snow’s Cholera data in two more formats: as KML files and as Google Fusion Tables.

To save you reading my previous blog posts on the subject, I’ll give a brief overview of my data. John Snow produced a famous map in 1854 showing the deaths caused by a cholera outbreak in Soho, London, and the locations of water pumps in the area. By doing this he found there was a significant clustering of the deaths around a certain pump – and removing the handle of the pump stopped the outbreak. This is a bit of a simplification (see Wikipedia or the John Snow Society for more details), but generally covers what happened.

Anyway, I digitised John Snow’s original data and georeferenced it to the Ordnance Survey co-ordinate system, so that I could overlay it on modern maps of that area, as below (using the OS OpenData StreetView data, containing Ordnance Survey data © Crown copyright and database right 2013):

Locations of deaths from Snow's analysis shown on a modern OS Map

while still being able to overlay it on John Snow’s original map:

Snow's original map with the vector data for pumps and deaths overlain

Anyway, the data that is available is:

  • Cholera Death locations (Vector) with attribute data giving the number of deaths at each point
  • Pump locations (Vector)
  • John Snow’s original map georeferenced to the Ordnance Survey National Grid (Raster)
  • Current Ordnance Survey maps of the area (Raster; from those released under OS OpenData; contains Ordnance Survey data © Crown copyright and database right 2013)

These are available for download/use in a number of formats:

  • A zip file with the Vector data as Shapefiles and the Raster data as TIFF images
    (this is the original data provided for download by me – and is probably what you want for importing into a GIS system)
    Download
  • A zip file with the Vector data as KML files and the Raster data as TIFF images
    (suitable for importing into Google Earth and other products that use KML files)
    Download
  • Links to Google Fusion Tables with the vector data already imported
    Cholera Deaths
    Pumps
    Deaths and Pumps together (this dataset has both pump and death points in the same table: pump points have Count values of -999, death points have Count values > 0 which give the number of Cholera deaths at that location)
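If you export the combined table, the Count convention described above makes it easy to separate the two kinds of point again. Here is a minimal Python sketch of that idea. The sample rows, the coordinates and the `Longitude`/`Latitude` column names are made up for illustration (only the `Count` column and its -999 convention come from the table description), so adjust them to match whatever your export actually contains.

```python
import csv
import io

# Hypothetical rows in the shape of the combined table. The coordinates and
# counts are invented for illustration; they are NOT real values from the data.
SAMPLE_CSV = """Count,Longitude,Latitude
3,-0.1379,51.5134
-999,-0.1367,51.5133
1,-0.1355,51.5129
"""

PUMP_SENTINEL = -999  # pump points carry this Count value in the combined table


def split_pumps_and_deaths(rows):
    """Separate combined rows into pump rows and death rows using Count."""
    pumps, deaths = [], []
    for row in rows:
        count = int(row["Count"])
        if count == PUMP_SENTINEL:
            pumps.append(row)
        elif count > 0:
            deaths.append(row)
    return pumps, deaths


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pumps, deaths = split_pumps_and_deaths(rows)

# Count gives the number of deaths at each death location, so summing it
# gives the total number of deaths rather than the number of locations.
total_deaths = sum(int(row["Count"]) for row in deaths)
print(len(pumps), "pump(s),", len(deaths), "death location(s),", total_deaths, "death(s)")
```

Filtering on the sentinel first matters: if you sum or average Count over the whole combined table without removing the -999 rows, the pump points will badly skew the result.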

The Fusion Tables version is particularly cool, I think, as it allows you to very easily overlay the data on modern Google Maps data, and should allow some interesting ‘mashups’ to be created. All of the tables are set to be shared publicly, so you should be able to copy them (using the Copy Table command in the File menu) and play around with them as much as you want! If you click the Example Map tab then you’ll see a very rudimentary map I’ve created using the data on top of Google Maps (see below) – I’m sure you’ll be able to do far better visualisations than that.

Example map from Google Fusion Tables

The folks at CartoDB have also used this data in one of their tutorials, which shows you how to import the data to CartoDB and create a styled map to show the deaths with different sized markers – yet another way you can use the ‘first real GIS data’ in today’s web-based GIS tools.

So, enjoy – and please let me know (via the comments below) what you create!

Update: There was a problem with the KML files and Google Fusion Tables that I uploaded yesterday, caused by an incorrect co-ordinate transformation between the Ordnance Survey grid references and latitude/longitude. This has now been fixed and the downloads and tables have been updated. Sorry about this.
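For anyone curious about what this kind of transformation involves, here is a rough Python sketch of the first half of it: turning National Grid eastings and northings into latitude/longitude on the OSGB36 datum, using the inverse Transverse Mercator series and constants published by Ordnance Survey. It is an illustration only, not the code used to produce the files, and the function names are my own. Note that KML expects WGS84 coordinates, so a further datum shift (for example a Helmert transformation, or a library such as PROJ) is still needed afterwards; skipping that step displaces points by something on the order of 100 m in Britain.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (Ordnance Survey).
A = 6377563.396              # semi-major axis (m)
B = 6356256.909              # semi-minor axis (m)
F0 = 0.9996012717            # scale factor on the central meridian
PHI0 = math.radians(49.0)    # latitude of the true origin
LAM0 = math.radians(-2.0)    # longitude of the true origin
E0, N0 = 400000.0, -100000.0 # grid coordinates of the true origin (m)
E2 = (A * A - B * B) / (A * A)   # eccentricity squared
N_RATIO = (A - B) / (A + B)


def meridional_arc(phi):
    """Developed arc of the meridian from PHI0 to phi, in metres."""
    n = N_RATIO
    d, s = phi - PHI0, phi + PHI0
    return B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * d
        - (3 * n + 3 * n**2 + 2.625 * n**3) * math.sin(d) * math.cos(s)
        + (1.875 * n**2 + 1.875 * n**3) * math.sin(2 * d) * math.cos(2 * s)
        - (35 / 24) * n**3 * math.sin(3 * d) * math.cos(3 * s)
    )


def grid_to_osgb36(easting, northing):
    """Convert National Grid metres to (lat, lon) in degrees on OSGB36."""
    # Iterate for the "footpoint" latitude whose meridional arc matches northing.
    phi = PHI0 + (northing - N0) / (A * F0)
    m = meridional_arc(phi)
    while abs(northing - N0 - m) >= 1e-5:
        phi += (northing - N0 - m) / (A * F0)
        m = meridional_arc(phi)

    sin2 = math.sin(phi) ** 2
    nu = A * F0 / math.sqrt(1 - E2 * sin2)            # transverse radius
    rho = A * F0 * (1 - E2) / (1 - E2 * sin2) ** 1.5  # meridional radius
    eta2 = nu / rho - 1
    t2 = math.tan(phi) ** 2
    t = math.tan(phi)
    sec = 1 / math.cos(phi)
    de = easting - E0

    # Series terms, named VII..XIIA as in the Ordnance Survey formulae.
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t2 + 45 * t2**2)
    x = sec / nu
    xi = sec / (6 * nu**3) * (nu / rho + 2 * t2)
    xii = sec / (120 * nu**5) * (5 + 28 * t2 + 24 * t2**2)
    xiia = sec / (5040 * nu**7) * (61 + 662 * t2 + 1320 * t2**2 + 720 * t2**3)

    lat = phi - vii * de**2 + viii * de**4 - ix * de**6
    lon = LAM0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7
    return math.degrees(lat), math.degrees(lon)


# Illustrative grid reference roughly in Soho (an assumption, not a data point).
lat, lon = grid_to_osgb36(529400.0, 181000.0)
print(f"OSGB36 lat/lon: {lat:.5f}, {lon:.5f}")
```

A quick sanity check for any conversion like this is to plot a handful of converted points on a modern web map: if the pumps end up in the middle of buildings or a street away from where they should be, a datum shift has probably been missed or applied incorrectly.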

