# Incremental improvements to Nightlights mapping thanks to R-Bloggers

October 22, 2010
By

[This article was first published on Things I tend to forget ยป R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My recent post Nightlights: cool data, bad geocoding highlighted some of the geocoding challenges Steve Mosher has been finding as he works with this interesting โlight pollutionโ data set.

It was also my first article reposted on Tal Galiliโs fantastic R-Bloggers site which I have been following for a while. But even better than the surge of new visitors were the great comments and suggestions posted by members of the community. In this post, Iโm going to walk through each suggestion to illustrate just how generous and helpful this community can be.

Our starting point is where we ended up in my first post, using ggplot2 to display the raster nightlights data and map overlay:

```library(RCurl)
library(R.utils)
library(rgdal)
library(raster)

hiResTif = "world_avg.tif"

untar(calibratedLights)
gunzip(paste(hiResTif, '.gz', sep='')

hiResLights = raster( hiResTif )

# Eastern Mass., Cape Cod & Islands:
e = extent(-71.5, -69.5, 41, 43)

r = crop(hiResLights,e)
p = rasterToPoints(r)
df = data.frame(p)
colnames(df) = c("lon", "lat", "dn")

g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) )
g = g + scale_colour_gradient(low="black", high="white", trans="sqrt")
g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43))
g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() )
g = g + borders("state", colour="yellow", alpha=0.5)
```

Note the mismatch between the data and map overlay and the weirdness in the map where points are missing on the North Shore:

original data, positioning, and borders in ggplot2

Ben Bolker suggested a way to eliminate the artifacts which led me to this discussion on R-sig-Geo between Hadley Wickham and Paul Hiemstra which tipped me off to the existence of `geom_path` layer in addition to the `geom_polygon` layer which `borders()` usually produces. Polygons are closed but paths need not be, so that helps. And ggplot2โฒs `map_data()` function seems to grab the same data as `borders()`:

```b = map_data("state")

g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) )
g = g + scale_colour_gradient(low="black", high="white", trans="sqrt")
g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43))
g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() )
g = g + geom_path(data=b, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
```

Bonus: `geom_path()` obeys the โ`alpha=0.5`โ directive to set the transparency:

Worst artifacts solved by switching to map_data() and geom_path()

But Robert Hijmans really hit it out the park with two great suggestions. First, he pointed me towards a much, much better source of coastline data by using rasterโs `getData()` function to grab data from the GADM database of Global Administrative Areas:

```usa = getData('GADM', country="USA", level=0)
```

Level 0 will get you country boundaries, Level 1 for state/province, and so on. So weโll lose state boundaries, but these files are pretty big to start with and can take a lot longer to plot.

Also, be warned: apparently somebody sinned against The Church of GNU, so you may need to run `gpclibPermit()` manually before running `fortify()` on the `SpatialPolygonsDataFrame`:

```> f_usa = fortify(usa)

Note: polygon geometry computations in maptools
depend on the package gpclib, which has a
restricted licence. It is disabled by default;
to enable gpclib, type gpclibPermit()

Checking rgeos availability as gpclib substitute:
FALSE
Error: isTRUE(gpclibPermitStatus()) is not TRUE
> gpclibPermit()
[1] TRUE
```

With that hoop cleared, we can `fortify()` and plot this new layer:

```f_usa = fortify(usa)

g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) )
g = g + scale_colour_gradient(low="black", high="white", trans="sqrt")
g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43))
g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() )
g = g + geom_path(data=f_usa, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
```

The coordinates are still shifted, butโwowโwhat a beautiful coast line. Cape Ann on the North Shore is really there now:

Robert also points out an important mismatch in that GDAL returns the top left corner and resolution, so we could be off by a pixel or so. A quick call to `xmax()` and `ymax()` will fix this in our original raster:

```xmax(hiResLights) = 180
ymin(hiResLights) = -90

r = crop(hiResLights,e)
p = rasterToPoints(r)
df = data.frame(p)
colnames(df) = c("lon", "lat", "dn")

g = ggplot( data=df) + geom_point(aes(x=lon, y=lat, color=dn) )
g = g + scale_colour_gradient(low="black", high="white", trans="sqrt")
g = g + theme_bw() + xlim(c(-71.5,-69.5)) + ylim(c(41,43))
g = g + opts( axis.text.x = theme_blank(), axis.text.y = theme_blank() )
g = g + geom_path(data=f_usa, aes(x=long,y=lat,group=group), colour="yellow", alpha=0.5)
```

Hey, thatโs not bad at all:

final try, after adjusting GDAL pixel shift

Looking at the final version leads me to wonder how much of the geocoding problem is position, and how much is resolution/blurring/smearing. The lights of Provincetown, for instance, look pretty good. Maybe the blob is too north by a few pixels, but at least itโs well contained by land now. On Nantucket, the blur is half in the harbor. Then again, on Nantucket, most of the lights are right on the harbor, from the ferry terminal and running east to main street. So the lights are just about where they should be. Perhaps theyโre just blurred and therefore spill into the harbor?

But the real point of the post is to highlight the generosity of this community. For that, thanks. And again: welcome R-Bloggers readers!

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.