Mapping Hotspots with R: The GAM

October 25, 2011

(This article was first published on IDV User Experience, and kindly contributed to R-bloggers)

I’ve been getting a lot of questions about the method used to map the hotspots in the seasonal drunk-driving risk maps.  It uses the GAM (Geographical Analysis Machine), a way of detecting spatial clusters from two data inputs: the data of interest, and a control, or “underlying population at risk” (or at least your best substitute for that).

These four distinct hotspot maps were made in R (using a shorter radial distance than previously posted).  They indicate areas where instances of drunk driving fatalities are much higher than normal in winter, spring, summer, or autumn.

Four individual GAM hotspot maps made in R, each from a baseline mesh of 10,000 points, with radii of 14 miles and 49 miles swiped out from each point.

The Geographical Analysis Machine was whipped up by Stan Openshaw and his team in the late 1980s as a way of finding geographic clusters, or hotspots, relative to an underlying population.  It requires a point dataset of interest (the known events) and a background point dataset representing candidates for those events (some examples are at the bottom of the post).

The mesh backdrop
The study area is covered with a mesh of backdrop points.  A finer mesh results in higher-resolution output, with more precisely bounded cluster zones, but it also takes longer to process.  The mesh points are the seeds from which your hotspot kernels may or may not grow (depending on what you consider significant).

Here’s my study area in R with a mesh backdrop of 10,000 points.  The finer the mesh, the greater the resulting resolution, and the more the circles overlap, depending on what you choose as a meaningful radial distance.
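To make the mesh idea concrete, here is a minimal sketch of laying a regular grid over a study area's bounding box.  It is written in Python with only the standard library, purely for illustration; it is not the R code behind the maps.  The `make_mesh` function and the bounding box values are hypothetical, and a real study area would also need mesh points outside its boundary clipped away.

```python
def make_mesh(xmin, ymin, xmax, ymax, n_per_side=100):
    """Return a regular grid of (x, y) mesh points covering a bounding box."""
    step_x = (xmax - xmin) / (n_per_side - 1)
    step_y = (ymax - ymin) / (n_per_side - 1)
    return [
        (xmin + i * step_x, ymin + j * step_y)
        for j in range(n_per_side)   # rows, bottom to top
        for i in range(n_per_side)   # columns, left to right
    ]

# Hypothetical bounding box in miles; 100 x 100 gives a 10,000-point mesh.
mesh = make_mesh(0.0, 0.0, 500.0, 300.0, n_per_side=100)
print(len(mesh))  # 10000
```

Doubling `n_per_side` quadruples the number of mesh points, which is where the extra processing time for a finer mesh comes from.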

Radial distance
From each point of the mesh backdrop a radial distance is swiped out.  The events and candidates falling inside that circle are counted up, and if the ratio of events to candidates is significantly (how significant is up to you) higher than what you’d expect under a Poisson distribution, that circle is retained; if not, it’s nuked.  The significant circles can be merged together for a discrete vector output of hotspots, or they can feed a kernel heatmap, which gives a bitmap illustration of magnitude varying with distance (like in the above maps).
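The test for a single circle can be sketched roughly as follows, again in standard-library Python rather than R.  This is one plausible reading of the step, not the exact implementation behind the maps: I'm assuming the expected count for a circle is the study-wide event-to-candidate rate times the number of candidates inside it, and that a circle is kept when the Poisson upper-tail probability of the observed count falls below a threshold `alpha`.  The function names, the toy data, and the 0.01 threshold are all made up for illustration.

```python
import math

def count_within(points, center, radius):
    """Count the points that fall inside (or on) the circle."""
    cx, cy = center
    r2 = radius * radius
    return sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r2)

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    log_mean = math.log(expected)
    # Sum P(X = k) for k below the observed count, working in log space.
    below = sum(
        math.exp(k * log_mean - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)

def circle_is_hotspot(center, radius, events, candidates, alpha=0.01):
    """Keep the circle if it holds significantly more events than expected."""
    n_candidates = count_within(candidates, center, radius)
    if n_candidates == 0:
        return False  # no population at risk here, so nothing to test
    n_events = count_within(events, center, radius)
    overall_rate = len(events) / len(candidates)  # assumed baseline rate
    expected = overall_rate * n_candidates
    return poisson_upper_tail(n_events, expected) < alpha

# Toy data: 400 evenly spread candidates, with events bunched near (1.5, 1.5).
candidates = [(i / 2, j / 2) for i in range(20) for j in range(20)]
events = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 1.5), (2, 1.5), (8, 8)]

print(circle_is_hotspot((1.5, 1.5), 1.0, events, candidates))  # True
print(circle_is_hotspot((8.0, 8.0), 1.0, events, candidates))  # False
```

Both circles contain the same 13 candidates, so the lone event near (8, 8) isn't enough to stand out, while six events around (1.5, 1.5) are.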

Events are mapped along with ‘candidates’ in this illustration.

Overlay a mesh to serve as the starting points of your radii.
In real life you’d want a finer mesh than this, given the data density.

Swipe out a radial distance from around the mesh points.

Radii containing a significantly high event-to-candidate ratio are retained.
Wash, rinse, and repeat with varying radius distances, and you’ve got a bubbly indicator of clusters.  Additionally, you can use the clusters as inputs to a kernel density map for a smooth heatmap version.
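For the smooth heatmap version, one simple approach is to drop a kernel on each retained circle and add them up over a raster.  The sketch below (standard-library Python, for illustration only) uses an Epanechnikov-style kernel whose bandwidth equals the circle's radius.  That kernel choice, the `heat_surface` function, and the retained circles listed here are my assumptions, not necessarily what produced the maps above.

```python
def heat_surface(circles, xs, ys):
    """Sum one kernel per retained circle at every raster cell.

    circles: list of (center_x, center_y, radius) for retained circles.
    xs, ys:  raster cell coordinates along each axis.
    Returns a list of rows (one per y), each a list of intensities.
    """
    surface = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for cx, cy, r in circles:
                # Squared distance from the circle center, scaled by radius.
                d2 = ((x - cx) ** 2 + (y - cy) ** 2) / (r * r)
                if d2 < 1.0:
                    total += 1.0 - d2  # peaks at the center, zero at the edge
            row.append(total)
        surface.append(row)
    return surface

# Hypothetical retained circles from two passes with different radii.
retained = [(2.0, 2.0, 1.0), (2.5, 2.0, 1.0), (2.0, 2.5, 2.0)]
xs = [i * 0.5 for i in range(11)]  # raster columns from 0 to 5
ys = [j * 0.5 for j in range(11)]  # raster rows from 0 to 5
surface = heat_surface(retained, xs, ys)

print(surface[4][4])  # intensity at (2.0, 2.0): 2.6875
print(surface[0][0])  # intensity at (0.0, 0.0): 0.0
```

Cells covered by several overlapping circles, across several radii, pile up the most intensity, which is what gives the heatmap its hot cores and fading edges.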

The previous hotspot mapping post went into greater detail on why it’s important to isolate event intensity from its underlying phenomena.  But it is such a cool and useful tool that I can’t help providing examples again…

  • Cancer Hotspots.  Cases of prostate cancer vs. men of a certain age in order to see where case rates are actually elevated, not just where lots of older men live.
  • Anything Deserts.  Public playgrounds vs. block-level child counts to identify play “deserts.”  Actually, anything deserts are a pretty hot topic right now in the social sciences, like food deserts.  In this case you’d be looking for exceptionally low ratios, rather than high.
  • Crime Risk.  Crime tends to happen where there are people, so a flat crime map will look a lot like a population map.  If you account for the underlying population at risk you can get a sense of localized areas of very specific risk.
  • Just about anything having to do with mapping epidemics.
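For the “deserts” case, the same kind of test can be flipped to look at the lower tail instead of the upper one.  Here is a minimal sketch in standard-library Python; the `poisson_lower_tail` function, the expected count of 8, and the 0.01 cutoff are hypothetical values chosen for illustration.

```python
import math

def poisson_lower_tail(observed, expected):
    """P(X <= observed) for X ~ Poisson(expected), with expected > 0."""
    log_mean = math.log(expected)
    return sum(
        math.exp(k * log_mean - expected - math.lgamma(k + 1))
        for k in range(observed + 1)
    )

# Hypothetical circle: the child count suggests about 8 playgrounds.
expected_playgrounds = 8.0

print(poisson_lower_tail(1, expected_playgrounds) < 0.01)  # True
print(poisson_lower_tail(6, expected_playgrounds) < 0.01)  # False
```

Finding one playground where eight would be expected is rare enough to flag as a desert; finding six is not.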

We are really interested in what folks are up to in R and are doing our best to provide inroads to that work so it can be accessed by more folks in your organization.  Let us know if you have any ideas!
