[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers].

In part 1 we discussed UHI in a general way and introduced NASA’s use of nightlights to identify Rural, Periurban and Urban stations. Very simply, the latitude and longitude data in the station inventory is used to look up a radiance value in the nightlights file. If that value is 10 or less, the site is Rural. If it is 11-35, the site is Periurban. If the site has lights greater than 35, it is Urban. These values are then used in NASA’s adjustment process. In that process, Periurban sites and Urban sites are adjusted to comport with their Rural neighbors.
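The thresholds above can be written down as a small function. This is only a sketch of the classification rule as described in the text, written in Python; the function name `classify_station` is made up for illustration and is not NASA's code.

```python
def classify_station(radiance):
    """Map a nightlights radiance value to the class described in the text."""
    if radiance <= 10:
        return "Rural"       # 10 or less
    if radiance <= 35:
        return "Periurban"   # 11 through 35
    return "Urban"           # greater than 35


# Values on either side of each threshold
for value in (0, 10, 11, 35, 36):
    print(value, classify_station(value))
```

Note how much rides on the looked-up value: a station whose radiance comes back as 10 instead of 11 changes class, and with it the role the station plays in the adjustment.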

Logically, then, we will want to know several things. First, how accurate is the station latitude and longitude data? Second, how accurate is the nightlights data? Third, how accurate is the lookup? And finally, do nightlights really “pick out” stations that don’t suffer from UHI? Or rather, can you have UHI in a lightless place?

The accuracy of the latitude and longitude data is hard to determine. Basically, we want to ask: is the station in the real world at the same location that the database says it is? If we knew exactly where the real stations were, independently of the inventory data, then we could just compare the two sources. In some cases this is possible. Let’s start with stations in the US.

In the US, station location is now recorded down to 4 decimal places, e.g. 43.6754. Using a rule of thumb that says a degree is roughly 111 km (at the equator), we can see that 1/10 of a degree is roughly 10 km, 1/100 is roughly 1 km, 1/1000 is roughly 100 meters, and so the US stations, at 1/10000 of a degree, are represented down to roughly 10 meter precision. But is it accurate? That question can be addressed by looking at alternative sources, in this case surfacestations.org. In that project many sites were visited and volunteers took GPS readings at the site. A complete comparison of the field measurements and the data held by NCDC has not been made public, so one can only do manual spot checks.
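The rule of thumb is easy to check with a few lines of arithmetic. The sketch below is in Python and uses the 111 km per degree figure from the text; the function name and the cosine scaling for longitude are my own illustration, not anything taken from the inventory software.

```python
import math

KM_PER_DEGREE = 111.0  # rule-of-thumb value used in the text


def resolution_m(decimal_places, latitude_deg=0.0, axis="lat"):
    """Ground distance in meters covered by one unit in the last decimal place."""
    step_deg = 10.0 ** (-decimal_places)
    km = step_deg * KM_PER_DEGREE
    if axis == "lon":
        # A degree of longitude shrinks with the cosine of the latitude.
        km *= math.cos(math.radians(latitude_deg))
    return km * 1000.0


for places in range(1, 5):
    lat_m = resolution_m(places)
    lon_m = resolution_m(places, latitude_deg=40.0, axis="lon")
    print(f"{places} decimal places: {lat_m:.1f} m in latitude, "
          f"{lon_m:.1f} m in longitude at 40N")
```

The 40N used for the longitude column is just a representative mid-latitude picked for the example.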

In any case, the first thing one can realize about the GISS inventory is that they appear not to use the more precise GHCN data. In their inventory for the US they continue to use data at 1/100 of a degree precision, or roughly 1 km. Given the latitude of the US, however, one can say the precision in longitude is slightly better than this, since a degree of longitude shrinks away from the equator. Still, with more precise data available, one wonders why not use it, particularly because if a station location happens to fall on a grid line, the rounding differences between machines can cause one system to look up one cell and another system to look up the adjoining cell. This is not an issue if the adjoining cells are all roughly the same. However, why introduce this kind of uncertainty when you have better data?

Instead of focusing on the US stations, I will examine the ROW (rest of world), primarily because Hansen2010 has extended the use of nightlights beyond the US and because the accuracy problems are more easily illustrated there. In order to illustrate the accuracy problem in the ROW we will utilize Google Maps. In short, we will look up the station location used by GISS and then see if there is a station there. The mismatch is clearest in places like coastlines and deserts. It is clear on coastlines because you will see the Google push pin in the water. As noted, Peter has done much of the work documenting this and notifying NASA, who apparently think it doesn’t matter. It’s also clear in deserts because you can see a push pin located out in the middle of a pathless field, usually with a small town nearby. At airports it’s clear when we have other supporting evidence, such as ground-based photos of the station, which we can then place using Google Maps.

A few minutes with the inventory will demonstrate that something is wrong with NOAA’s inventory, or wrong with Google Earth. Since coastal locations show the problem most clearly, I took the inventory data, read it into a fusion table, and then filtered the data to look for locations within 1 mile of the coast. Pulling that into Google Earth, one can tour the stations. Using the ruler tool one can see that location errors in the 1/100ths of a degree take stations and put them into the sea. Sometimes the error is much larger. Consider this station, off by 2 degrees.

For download, you can find a Google Earth tour of some of the suspect coastal station locations in my drop down box, or visit the Google fusion table I created for this.

The bottom line is this. The station latitudes and longitudes are precise to 1/100th of a degree but inaccurate by up to tens of kilometers. I find no evidence to support Hansen’s contention that the station data is accurate to 0.01 degrees. That means when we look at the nightlights at, say, 34.55, 45.23 we get the nightlights at that location, BUT the station may actually be at 34.876753423, 45.198763254. And that location may have different nightlights.
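To put a number on the example coordinates, here is a rough great-circle distance between the two points. This is a sketch in Python using the standard haversine formula; the Earth radius of 6371 km is a common mean value that I am assuming, and neither the function nor the constant comes from the GISS code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


recorded = (34.55, 45.23)                # location in the inventory (example from the text)
actual = (34.876753423, 45.198763254)    # where the station might really be
print(f"{haversine_km(*recorded, *actual):.1f} km apart")
```

For this pair the separation works out to a few tens of kilometers, which is many nightlights cells away from the pixel that actually gets looked up.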

The problem actually gets worse than this, because even with the exact same coordinates and the exact same nightlights data, I can get a different value when I do my lookup than when NASA does theirs. One reason for this is rounding. If I look up a site location that falls exactly on a boundary between two cells, I have to choose which cell to pull data from. Even with the same package, raster, on different machines I could get different results. To see how large this problem is, I repeated an experiment that Ron Broberg did a while ago. Using the GISS inventory and the same nightlights that NASA uses, I checked the values they got for nightlights against the values produced by using raster.
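The boundary problem can be illustrated without any real data. The sketch below is Python rather than the raster package, and it assumes a global grid with 120 cells per degree (30 arc-seconds) whose upper-left corner is at 180W, 90N. Those grid parameters and the function name are assumptions made for the illustration, so check them against the actual nightlights file before relying on them.

```python
import math

CELLS_PER_DEGREE = 120        # assumed resolution: 30 arc-second cells
X_MIN, Y_MAX = -180.0, 90.0   # assumed upper-left corner of the grid


def cell_index(lat, lon):
    """Return the (row, col) of the grid cell containing a point."""
    col = math.floor((lon - X_MIN) * CELLS_PER_DEGREE)
    row = math.floor((Y_MAX - lat) * CELLS_PER_DEGREE)
    return row, col


# Longitude 45.0 sits exactly on a cell edge in this assumed grid.
# Nudge it by a billionth of a degree either way, far below any
# rounding anyone would notice in an inventory file.
eps = 1e-9
west = cell_index(34.553, 45.0 - eps)
east = cell_index(34.553, 45.0 + eps)
print("just west of the edge:", west)
print("just east of the edge:", east)
```

The two lookups land in adjoining columns, so whichever way a system rounds decides which pixel's radiance the station gets. With coordinates kept to 1/100 of a degree, a fair number of stations can sit on or very near such edges.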