Boundary Conditions Dominate

October 16, 2010

(This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers)

In part 1 and part 2 we went over the background of nightlights and the fundamental problem: the station inventory data has errors in it. In Hansen 2010, Hansen writes:

“Station location in the meteorological data records is provided with a resolution of 0.01 degrees of latitude and longitude, corresponding to a distance of about 1 km. This resolution is useful for investigating urban effects on regional atmospheric temperature. Much higher resolution would be needed to check for local problems with the placement of thermometers relative to possible building obstructions, for example. In many cases such local problems are handled via site inspections and reported in the “metadata” that accompanies station records, as discussed by Karl and Williams [1987], Karl et al. [1989], and Peterson et al. [1998a].”

As we have shown in part 1 and part 2, this appears to be wrong. Simply put, when Hansen takes the station lat/lon at face value, he is introducing location error. That error is further confounded by the lookup process. When one takes a fractional value such as 73.54, 46.85 and uses it to “look up” the radiance value in a matrix defined at discrete values, a few problems arise. The first is related to rounding; the second is related to stations that fall on lat/lon boundaries. We can assess the latter simply by processing the lat/lon values to see how many fall on boundaries. Let’s review how this works. The Nightlights file is defined thusly:

class       : RasterLayer
filename    : /Users/mosher/Moshtemp5.1/world_avg.tif
nrow        : 21600
ncol        : 43200
ncell       : 933120000
xres        : 0.00833
yres        : 0.00833

That is 30 arc second data: every box is 1/120th of a degree on a side. So when you take a lat/lon of, say, 73.95, 65.36, you will want to calculate which “box” it falls into. That’s merely a matter of finding the multiple of 1/120 that the station maps into. That division will leave some lat/lons sitting exactly on the lines. For example, if you had 0.25 degree boxes with edges at 90, 89.75, etc., then a station at 75.75 would fall on a line, while a station at 86.63 would not. The problem is simple to state: what do you do with boundary cases? Do they go up a box or down a box? To the right or to the left? There is no right answer. But it should be clear that the smaller the boxes, the greater this problem is.
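The boundary test described above can be sketched in a few lines of Python. This is only an illustration, not the lookup that GISS or the raster package performs: the function names are made up for this example, coordinates are passed as decimal strings so the arithmetic is exact, and cell indices are counted from 0 degrees rather than from the raster's actual origin (which does not change whether a point sits on an edge, since the edges fall on multiples of 1/120).

```python
from fractions import Fraction
from itertools import product
from math import floor

CELLS_PER_DEGREE = 120  # 30 arc-second grid: each cell is 1/120 of a degree


def on_grid_line(coord, cells_per_degree=CELLS_PER_DEGREE):
    """True if a coordinate (a decimal string) lies exactly on a cell edge."""
    return (Fraction(coord) * cells_per_degree).denominator == 1


def candidate_indices(coord, cells_per_degree=CELLS_PER_DEGREE):
    """Cell indices along one axis that could reasonably claim this coordinate."""
    scaled = Fraction(coord) * cells_per_degree
    index = floor(scaled)
    # On an edge, the cell on either side is a defensible choice.
    return [index - 1, index] if scaled.denominator == 1 else [index]


def candidate_cells(lat, lon, cells_per_degree=CELLS_PER_DEGREE):
    """All (lat index, lon index) pairs a lookup rule might return."""
    return list(product(candidate_indices(lat, cells_per_degree),
                        candidate_indices(lon, cells_per_degree)))


print(on_grid_line("75.75", 4), on_grid_line("86.63", 4))
print(candidate_cells("73.95", "65.36"))
```

With 0.25 degree boxes (4 cells per degree), 75.75 is on a line and 86.63 is not, matching the example. On the 1/120 degree grid, 73.95 × 120 = 8874 exactly, so that coordinate sits on an edge and has two candidate cells along its axis, while 65.36 × 120 = 7843.2 falls inside a single cell. A station with both coordinates on an edge sits on a corner and gets four candidate boxes.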

How big is this problem with Nightlights? Big.

As you can see, the vast majority of stations fall onto corners, which means there are 4 possible boxes they could be put into, depending upon your rules. Now, to repeat, one can argue that this should not make a huge difference if the nightlight values are uniform. And we expect them to be somewhat uniform, since the 1km data is derived from 2.7km data. But even here there will be points where the exact box matters. To prove that, we compare GISS lights values with values looked up using “raster”: we take the NASA nightlights values and subtract the raster nightlights values. We get the following: the largest difference is 149. That means that when GISS looks up the value, they pick a cell that has a value of, say, “32”, and when raster looks up the same station (and picks the neighboring cell) it finds a value of “181”. It is also obvious that many times they do get the same value.
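To make the mechanism concrete, here is a toy sketch of how two tie-breaking rules can disagree at a cell edge. Everything in it is hypothetical: the two rules are stand-ins, not the actual GISS or raster behavior, and the radiance strip is made up, with the values 32 and 181 borrowed from the example above rather than read from the nightlights file.

```python
from fractions import Fraction
from math import ceil, floor

CELLS_PER_DEGREE = 120  # 30 arc-second grid

# Made-up strip of radiance values keyed by cell index along one axis.
# 32 and 181 echo the example in the text; they are not real file values.
TOY_STRIP = {8873: 32, 8874: 181}


def index_edge_down(coord):
    """Hypothetical rule A: a point on a cell edge goes to the lower cell."""
    return ceil(Fraction(coord) * CELLS_PER_DEGREE) - 1


def index_edge_up(coord):
    """Hypothetical rule B: a point on a cell edge goes to the upper cell."""
    return floor(Fraction(coord) * CELLS_PER_DEGREE)


def lookup_difference(coord, strip=TOY_STRIP):
    """Rule A value minus rule B value for one coordinate (a decimal string)."""
    return strip[index_edge_down(coord)] - strip[index_edge_up(coord)]


print(lookup_difference("73.95"))   # coordinate exactly on an edge
print(lookup_difference("73.951"))  # coordinate inside a cell
```

For 73.95, which sits exactly on an edge, rule A reads 32 and rule B reads 181, a difference of -149. For 73.951, which sits inside a cell, both rules pick the same cell and the difference is 0. That is the pattern in the comparison: many stations agree, and the ones that disagree can disagree by a lot.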

What does this all mean for determining which sites are rural? That is shown here. Again, using the same station data as NASA and the same Nightlights data, but a lookup method that just looks one cell left or right, up or down, you get dramatically different results for categorizing stations. NASA’s rule is that Nightlights less than 11 means rural and greater than 35 means urban. When we apply that rule we don’t come close to the same outcome. Why? Because merely by looking one box to the left/right or up/down we see a different nightlights value, a value that CHANGES our categorization of the station. This method of categorizing is not robust. In the posts that follow I’ll explore other methods (including Hansen’s “pitch black” criteria) and show how different methods lead to different results.
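The threshold rule is easy to write down, which makes its sensitivity easy to see. The sketch below uses only the two cutoffs quoted above; the label “intermediate” for values from 11 through 35 is a placeholder chosen for this example, and the value pairs are invented to illustrate the effect, not taken from the station data.

```python
RURAL_BELOW = 11   # nightlights less than 11 means rural
URBAN_ABOVE = 35   # nightlights greater than 35 means urban


def classify(radiance):
    """Apply the two thresholds; 'intermediate' is a placeholder label."""
    if radiance < RURAL_BELOW:
        return "rural"
    if radiance > URBAN_ABOVE:
        return "urban"
    return "intermediate"


def reclassified(value_pairs):
    """Keep the (value A, value B) pairs whose category differs between lookups."""
    return [(a, b, classify(a), classify(b))
            for a, b in value_pairs
            if classify(a) != classify(b)]


# Invented pairs: the same station looked up in two neighboring cells.
pairs = [(32, 181), (5, 9), (10, 12)]
for a, b, cat_a, cat_b in reclassified(pairs):
    print(f"{a} -> {cat_a}, {b} -> {cat_b}")
```

A shift from 10 to 12 is tiny in radiance terms, yet it moves a station out of the rural class, and a shift from 32 to 181 moves it into the urban class. A hard threshold applied to a value that depends on which neighboring cell you land in will flip categories in exactly this way.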

The issues here, however, also run deep into the nightlights file itself. Upon closer examination there appear to be some irregularities with the file. Stay tuned, it’ll be clear as mud.
