(This article was first published on **Steven Mosher's Blog**, and kindly contributed to R-bloggers)

## Introduction

Zhang and Imhoff (2010) used NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface Temperature (LST) to estimate the magnitude of the urban heat island (UHI) in large cities across the US. Peng employed a similar approach in studying 419 large cities (population greater than 1 million) around the world. Peng’s work suggests a limit or threshold to the Surface Urban Heat Island (SUHI) effect, in that he found that city size and population were not effective predictors of SUHI for very large cities. Imhoff’s study of somewhat smaller cities suggests there is a range over which city size is an effective predictor of SUHI.

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SUHI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island (UHI, as measured by air temperature as opposed to surface temperature), and it shares a weakness with Oke’s analysis: the sample is thin at very small populations and small city sizes. My goal in this study is to enlarge the sample of smaller cities and towns and refine the estimate, in much the same way that Peng focused his study on cities over 1 million in population. My working hypothesis is that just as there is a plateau in SUHI above a certain size, there may also be a plateau at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.
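
To make the logistic hypothesis concrete, here is a minimal sketch of a curve that flattens out at both small and large areas. The function name and every parameter value (`upper`, `midpoint`, `rate`) are illustrative assumptions, not fitted estimates from this study.

```r
# Hypothetical logistic SUHI curve in log10(area): flat at both ends.
# All parameter values are placeholders chosen for illustration only.
suhi_logistic <- function(area_sqkm, upper = 4, midpoint = 1, rate = 3) {
  # midpoint = 1 corresponds to log10(10 sq km)
  upper / (1 + exp(-rate * (log10(area_sqkm) - midpoint)))
}

areas <- c(0.5, 1, 10, 100, 1000)
round(suhi_logistic(areas), 3)
```

Under these made-up parameters the curve sits near zero for very small towns, climbs through the midpoint, and approaches `upper` for large cities, which is the shape the working hypothesis describes.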

## Data

Sites were selected for study using MODIS landcover data (MOD12Q1, 2005; 15 arc second). The 17-class landcover product was reprojected from its native format into a WGS84 GeoTIFF using nearest neighbor resampling and a 15 arc second (~500 meter) output resolution. The landcover data were reclassified into urban and non-urban pixels, and a cluster analysis was performed to create patches of contiguous urban pixels. Urban patches of various sizes, from 10 to 100 sq km, were then selected for further processing. In addition, Census gazetteer information was used to select locations of even smaller size, from 0.5 sq km to 10 sq km. A 1 km digital elevation model derived from 90 m SRTM data was used to compute elevations, slopes, and aspects. Finally, 8-day 1 km Day LST and Night LST from MODIS (MOD11A2 v5) were used. For this pilot study, one 8-day period from July 4th 2006 was used. The LST product was reprojected from its native format into a WGS84 GeoTIFF using nearest neighbor resampling and a 30 arc second (~1 km) output resolution.
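
The reclassification step can be sketched in base R on a toy grid. The grid values are made up, and the 500 m x 500 m pixel size is a nominal assumption; only the use of IGBP class 13 for urban comes from the landcover product itself.

```r
# Toy 4 x 5 grid of IGBP landcover codes (invented values for illustration).
landcover <- matrix(c(12, 12, 10, 10,  7,
                      12, 13, 13, 10,  7,
                      12, 13, 13, 10, 10,
                      14, 14, 10, 10, 11),
                    nrow = 4, byrow = TRUE)

urban_class <- 13                        # IGBP "urban and built-up"
urban_mask  <- landcover == urban_class  # TRUE = urban, FALSE = non-urban

# Approximate urban area, assuming a nominal 500 m x 500 m pixel.
pixel_area_sqkm <- 0.5 * 0.5
urban_area_sqkm <- sum(urban_mask) * pixel_area_sqkm
urban_area_sqkm
```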

## Methods

Census gazetteer information was used to select a sample of small towns and CDPs that had areas of less than 10 sq km and no substantial water area within their boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights-derived ISA to ensure that the small towns were at least 10 km from larger urban areas. Urban areas larger than 10 sq km were selected by creating an urban/rural mask from MODIS landcover. The mask was processed by finding patches of urban areas using a queen’s case clustering approach. The urban areas were sorted according to size and then classified according to their isolation from other urban areas. The goal of this processing was to identify urban areas that were surrounded predominantly by rural pixels. A 20 km boundary zone was established around every urban patch, and to qualify for selection 99% of the pixels outside the urban zone had to be classified as rural by MODIS landcover. This selection process resulted in a database of 99 sites. Given what I’ve learned in this process, I will probably loosen the restriction regarding “satellite” urban pixels surrounding the urban core and include more sites in the final analysis.
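
For readers unfamiliar with queen’s case clustering, the idea is that two urban pixels belong to the same patch if they touch along an edge or at a corner. The following is a minimal base R sketch of that labeling; `label_patches` is a hypothetical helper written for this illustration, not the routine used to build the site database.

```r
# Label contiguous TRUE cells of a logical matrix using queen's case
# (8-neighbour) connectivity. Returns an integer matrix:
# 0 = non-urban, 1..k = patch id.
label_patches <- function(mask) {
  nr <- nrow(mask); nc <- ncol(mask)
  labels <- matrix(0L, nr, nc)
  current <- 0L
  for (start in which(mask)) {
    if (labels[start] != 0L) next        # already assigned to a patch
    current <- current + 1L
    labels[start] <- current
    stack <- start                       # flood fill from this seed cell
    while (length(stack) > 0) {
      idx <- stack[length(stack)]
      stack <- stack[-length(stack)]
      ri <- (idx - 1L) %% nr + 1L        # row of the linear index
      ci <- (idx - 1L) %/% nr + 1L       # column of the linear index
      for (dr in -1:1) for (dc in -1:1) {
        rr <- ri + dr; cc <- ci + dc
        if (rr < 1 || rr > nr || cc < 1 || cc > nc) next
        if (mask[rr, cc] && labels[rr, cc] == 0L) {
          labels[rr, cc] <- current
          stack <- c(stack, (cc - 1L) * nr + rr)
        }
      }
    }
  }
  labels
}

mask <- matrix(c(TRUE,  FALSE, FALSE, FALSE,
                 FALSE, TRUE,  FALSE, FALSE,
                 FALSE, FALSE, FALSE, TRUE,
                 FALSE, FALSE, FALSE, TRUE),
               nrow = 4, byrow = TRUE)
patches <- label_patches(mask)
patch_sizes <- table(patches[patches > 0])   # pixels per patch
patch_sizes
```

In the toy mask the two diagonal pixels in the upper left end up in one patch, which would not happen under rook’s case (edge-only) connectivity. Sorting `patch_sizes` then gives the size ranking described above.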

The selected sites were then processed as follows. For every location a distance map was created for a 100 km radius around the site center. The distance map gives the distance from every rural pixel to the closest urban pixel. In the end, analysis showed that the distance from an urban pixel was not a significant driver of the land surface temperature of the rural pixel. With regard to Surface Air Temperature (SAT) and UHI measures, it is likely that distance from the nearest urban pixel would be significant due to horizontal advection, but with Land Surface Temperature and SUHI, the physical ability of heated land surface in the urban area to influence the LST in adjacent rural cells is apparently limited. In the final analysis, nevertheless, I will probably add a screen to remove rural pixels that are within 2 km of an urban pixel.
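
A distance map of this kind can be sketched with a brute-force search on a small grid. This is an illustration under simplifying assumptions: `distance_to_urban` is a hypothetical helper, distances are planar (Euclidean in grid units) rather than geodesic, and the 1 km cell size is nominal.

```r
# Distance from every cell to the nearest urban cell.
# Brute force: fine for a toy grid, far too slow for a real 100 km window.
distance_to_urban <- function(mask, cell_size_km = 1) {
  urban <- which(mask, arr.ind = TRUE)   # matrix with columns "row", "col"
  stopifnot(nrow(urban) > 0)
  d <- matrix(NA_real_, nrow(mask), ncol(mask))
  for (i in seq_len(nrow(mask))) {
    for (j in seq_len(ncol(mask))) {
      d[i, j] <- min(sqrt((urban[, "row"] - i)^2 + (urban[, "col"] - j)^2))
    }
  }
  d * cell_size_km
}

mask <- matrix(FALSE, 5, 5)
mask[3, 3] <- TRUE                        # a single urban pixel in the centre
dist_km <- distance_to_urban(mask)

# Proposed screen: keep only rural pixels more than 2 km from any urban pixel.
keep <- !mask & dist_km > 2
sum(keep)
```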

After distance maps are created out to 100 km, the elevation data is processed within the same radius. In addition to retrieving the elevation for each cell, the slope and aspect of the cell are generated. Zhang and Imhoff (2010) used elevation as a screen for selecting both urban and rural pixels. Their screen required that a pixel be within a ±50 meter window of the mean urban pixel elevation. As the results below show, elevation does have an impact on LST, so before comparing urban to rural pixels some form of normalization must be used. In addition to elevation, however, the slope and aspect of a grid cell can influence the LST. In examining differences between the ASTER LST product and the MODIS LST product, Liu (2009) pointed to terrain effects such as slope as part of the reason for differences between these products. The difference in slopes between urban pixels and rural pixels is evident, and generally speaking urban pixels tend to have lower slopes. The difference in slope, of course, will affect the amount of direct solar radiation that a given cell “sees”, and the interaction between the slope and the view angle of the sensor apparently affects the emissions sensed. The slope effect is small but statistically significant. For this analysis no screen was employed; however, a screen of less than 1 degree slope would eliminate only 15% of urban pixels and 20% of rural pixels. A screen at 0.5 degrees eliminates 35% of all urban pixels and 40% of all rural pixels.

After terrain processing, the Day LST and Night LST are retrieved for every pixel within 20 km of the site center. As this radius is less than the 100 km radius used for distance map and terrain processing, we are assured of having good values for both the distance map and the slope and aspect maps at the 20 km boundary. The cell centers of the LST product are then used to retrieve the corresponding distance map values, landcover values, and terrain values.
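
As a rough illustration of the terrain step, slope can be estimated from a DEM with central differences, and both the slope screen and a ±50 meter elevation screen then become simple logical tests. This is a sketch under stated assumptions: `slope_degrees` is a hypothetical helper, the DEM is a synthetic tilted plane, and edge cells are simply left as `NA`.

```r
# Slope in degrees from a DEM matrix using central differences.
# Edge cells are left NA because they lack a neighbour on one side.
slope_degrees <- function(dem, cell_size_m) {
  nr <- nrow(dem); nc <- ncol(dem)
  stopifnot(nr >= 3, nc >= 3)
  slope <- matrix(NA_real_, nr, nc)
  for (i in 2:(nr - 1)) {
    for (j in 2:(nc - 1)) {
      dz_dx <- (dem[i, j + 1] - dem[i, j - 1]) / (2 * cell_size_m)
      dz_dy <- (dem[i + 1, j] - dem[i - 1, j]) / (2 * cell_size_m)
      slope[i, j] <- atan(sqrt(dz_dx^2 + dz_dy^2)) * 180 / pi
    }
  }
  slope
}

# Synthetic DEM: a plane rising 10 m per 1 km cell from west to east.
dem   <- outer(1:5, 1:5, function(i, j) 100 + 10 * j)
slope <- slope_degrees(dem, cell_size_m = 1000)

slope_ok <- !is.na(slope) & slope < 1       # 1 degree slope screen
urban_mean_elev <- dem[3, 3]                # pretend the centre cell is the town
elev_ok  <- abs(dem - urban_mean_elev) <= 50  # +/- 50 m elevation screen
```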
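
The lookup from LST cell centers into the other grids amounts to converting a longitude/latitude pair into a row and column of the target grid. A minimal sketch, assuming a north-up grid defined by its upper-left corner and a square cell size (`cell_index`, the extent, and the 0.5 degree resolution are all toy assumptions):

```r
# Map lon/lat points to row/col indices of a north-up grid whose
# upper-left corner is (xmin, ymax) and whose cells are res degrees wide.
cell_index <- function(lon, lat, xmin, ymax, res) {
  cbind(row = floor((ymax - lat) / res) + 1,
        col = floor((lon - xmin) / res) + 1)
}

# Toy 3 x 4 elevation grid covering lon -100..-98, lat 38.5..40.
elev <- matrix(1:12, nrow = 3)

idx <- cell_index(lon = c(-99.25, -98.1), lat = c(39.75, 38.6),
                  xmin = -100, ymax = 40, res = 0.5)
elev[idx]   # two-column matrix indexing pulls one value per point
```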

## Results

Zhang and Imhoff (2010) approached the problem of estimating SUHI by classifying the urban pixels using ISA and the rural pixels using Olson biomes. The approach I take is slightly different. The temperature product we both use, MODIS LST, estimates the LST based on other MODIS products. Most importantly, the split window algorithm uses estimates of emissivity that are derived from the MODIS landcover product. So if MODIS landcover believes the pixel is urban or cropland, it does not matter what the ISA product thinks the pixel is, and it doesn’t matter how Olson classified the pixel. MODIS landcover is used in the processing chain to determine the LST and is therefore the more appropriate classification tool. What we also see here is that urban/rural differences are to some extent built into the results. If MODIS landcover classifies a pixel as urban, that classification will drive the emissivity estimate, which in turn drives the output of the split window algorithm employed to determine LST.

A box plot of Daytime LST by landcover type shows the range of temperature differences across the various land classes. It should be noted that this sample has not been screened for elevation, slope, or latitude and includes all pixels processed for the 99 sites.

The x axis represents the IGBP code for land cover class, but it’s instructive to ask if the urban class is identifiable on inspection. If you think that class 7 is urban you are wrong. Class 7 is open shrublands. Class 10 is grasslands. Urban is class 13. Below is the nighttime LST.

What both figures illustrate is that the differential between land surface temperature in the urban setting and land surface temperature in the non-urban setting is not a single number. The implication for UHI studies is clear. The difference in temperature between an urban area and a rural area depends as much upon the rural setting as it does upon the urban setting. It is possible to select areas, such as grasslands and open shrublands, that are both rural and warmer than the urban setting. It is also possible to amplify the difference by selecting other classes such as wetlands (class 11), which are at the cooler end of rural landscapes. And it’s possible to get confusing answers by selecting croplands or mixed croplands (classes 12 and 14), which have high variability: some exhibit warmer LST and some exhibit cooler LST than urban areas.

At this stage, with only one 8-day period of temperature data, there is not enough data to draw any conclusions. So the next step will be to download an entire year of daily LST data to construct seasonal averages for these sites. One thing we can do as a sort of sanity check is some simple regression on the entire dataset. Below, we can see that roughly 44% of the variation in Day LST (and 48% in Night LST) is explained by urban area (which lumps all rural pixels together with an area of zero), elevation, latitude, and slope.
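
A plot of this kind takes one line of base R once the pixels are in a data frame. The sketch below uses a synthetic stand-in for the pixel table; the column names (`LandCover`, `Day`) and all temperature values are invented for illustration and are not the study data.

```r
set.seed(1)
# Synthetic stand-in for the pixel table; every value here is made up.
pixels <- data.frame(
  LandCover = factor(rep(c(7, 10, 12, 13), each = 50)),
  Day       = c(rnorm(50, 318, 3), rnorm(50, 316, 3),
                rnorm(50, 312, 5), rnorm(50, 314, 2))
)

bp <- boxplot(Day ~ LandCover, data = pixels,
              xlab = "IGBP land cover class", ylab = "Day LST (K)",
              main = "Day LST by land cover class (synthetic data)")
bp$names   # class codes in plotting order
```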

    Call:
    lm(formula = Sites$Day ~ Sites$UrbanArea + Sites$Elevation +
        Sites$y + Sites$Slope)

    Residuals:
        Min      1Q  Median      3Q     Max
    -34.419  -3.099  -0.543   2.947  20.985

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)      3.196e+02  8.395e-02 3806.62   <2e-16 ***
    Sites$UrbanArea  5.640e-02  2.046e-03   27.57   <2e-16 ***
    Sites$Elevation  7.864e-03  1.941e-05  405.24   <2e-16 ***
    Sites$y         -4.809e-01  2.140e-03 -224.73   <2e-16 ***
    Sites$Slope     -7.718e-01  5.962e-03 -129.44   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 5.202 on 272367 degrees of freedom
      (12251 observations deleted due to missingness)
    Multiple R-squared:  0.4448,    Adjusted R-squared:  0.4448
    F-statistic: 5.456e+04 on 4 and 272367 DF,  p-value: < 2.2e-16

    > model <- lm(Sites$Night~Sites$UrbanArea+Sites$Elevation+Sites$y+Sites$Slope)
    > summary(model)

    Call:
    lm(formula = Sites$Night ~ Sites$UrbanArea + Sites$Elevation +
        Sites$y + Sites$Slope)

    Residuals:
         Min       1Q   Median       3Q      Max
    -23.2246  -1.3937   0.0242   1.5774   7.7279

    Coefficients:
                      Estimate Std. Error  t value Pr(>|t|)
    (Intercept)      3.080e+02  3.670e-02 8391.708  < 2e-16 ***
    Sites$UrbanArea  2.634e-02  8.862e-04   29.724  < 2e-16 ***
    Sites$Elevation -1.542e-03  8.662e-06 -178.006  < 2e-16 ***
    Sites$y         -4.316e-01  9.388e-04 -459.731  < 2e-16 ***
    Sites$Slope     -1.695e-02  2.667e-03   -6.355 2.08e-10 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.324 on 281416 degrees of freedom
      (3202 observations deleted due to missingness)
    Multiple R-squared:  0.483,     Adjusted R-squared:  0.483
    F-statistic: 6.573e+04 on 4 and 281416 DF,  p-value: < 2.2e-16
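
For anyone who wants to try the same model form, here is a self-contained sketch using the `data =` argument of `lm()`, which gives cleaner coefficient names than the `Sites$...` form and lets `predict()` work on new data. The `Sites` table below is synthetic: the column names follow the output above, but every value and the coefficients used to generate `Night` are arbitrary assumptions.

```r
set.seed(42)
n <- 500
# Synthetic stand-in for the Sites table; all values are made up.
Sites <- data.frame(
  UrbanArea = sample(c(0, 1, 5, 20, 50), n, replace = TRUE),  # 0 = rural pixel
  Elevation = runif(n, 0, 2000),                              # metres
  y         = runif(n, 30, 48),                               # latitude
  Slope     = runif(n, 0, 5)                                  # degrees
)
Sites$Night <- 308 + 0.03 * Sites$UrbanArea - 0.0015 * Sites$Elevation -
  0.43 * Sites$y - 0.02 * Sites$Slope + rnorm(n, sd = 2)

model <- lm(Night ~ UrbanArea + Elevation + y + Slope, data = Sites)
coef(summary(model))[, c("Estimate", "Std. Error")]
summary(model)$r.squared
```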

What we see here is that Urban Area has a positive coefficient of roughly 0.056 for daytime LST and a nighttime coefficient of roughly 0.026. This approach differs from the one used by Zhang and Imhoff in that, rather than screening for the same elevation, I’ve accounted for elevation and slope via regression. Of course, I’ll check it both ways. Finally, the estimate for small town UHI provided by Imhoff was on the order of 1.75C for a town of size 1 sq km. In their study they found that UHI was fit by 3.48*log(area) + 1.75. That particular study had only two data points for towns of about 1 sq km. Looking at the data for the towns with urban areas of less than 2 sq km, the figure of 1.75C for Day SUHI can’t be supported. The difference in Day LST at that size is effectively zero.
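
The log-area fit quoted above is easy to evaluate directly. The text does not state the base of the logarithm, so base 10 in the sketch below is an assumption; the value at 1 sq km does not depend on it, because log(1) is zero in any base. `suhi_fit` is a name introduced here for illustration.

```r
# Fit quoted in the text: SUHI = 3.48 * log(area) + 1.75.
# The log base is not stated there; base 10 is an assumption.
suhi_fit <- function(area_sqkm, log_fun = log10) {
  3.48 * log_fun(area_sqkm) + 1.75
}

suhi_fit(1)        # 1.75 regardless of log base
suhi_fit(1, log)   # same value with the natural log
suhi_fit(10)       # 5.23 under the base-10 assumption

# For comparison: the linear Day coefficient from the regression output,
# assuming UrbanArea is measured in sq km.
0.0564 * 1         # about 0.06 for a 1 sq km town
```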

## Conclusion

Now that the processing chain is nearly complete, I need to push forward on several fronts.

1. More small area sites. The restriction I imposed, that 99% of all surrounding pixels in a 20 km radius be rural, limits the number of small sites that can be found. This restriction was motivated by a concern that distance from an urban pixel might influence the LST. With a suitable buffer around each site I should be able to process more than the 30 or so small sites I have. Elevation screening and slope screening also reduce the number of sites that are usable in the end, so I’ll need to start with more sites.

2. Increase the sampling out to 50 sq km. Preliminary analysis indicated that there may be a break in SUHI around the 10 sq km mark. Although I had limited data (8 days in July, when the signal should be high), the UHI signal appeared to show up more clearly at around 10 sq km, so I should probably focus not only on the sub 10 sq km region, but out to 50. Beyond 50 sq km, my results were pretty much in line with Imhoff, despite the differences in our methods.

3. Eliminate water pixels and wetlands. I believe Imhoff eliminates water pixels. There are a bunch of ways that the data can be cut as one can see from the land class figures.

4. Regress on SUHI directly. This entails processing each site separately: constructing an average for the small town (which can be 1-10 pixels) and then constructing a rural average across all land classes or some reduced set of land classes. In this case one would want to screen for altitude and slope and perhaps regress with respect to latitude to see if there is a latitude component. One difficulty here, of course, is that you are averaging over a small number of urban pixels and then comparing that to an average over a large number of rural pixels. Kinda messy.
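
The per-site calculation in item 4 can be sketched as an urban mean minus a rural mean for each site. Everything in the example is a synthetic assumption (site names, pixel counts, temperatures); it is only meant to show the shape of the computation and why the small urban pixel count makes it messy.

```r
set.seed(7)
# Synthetic pixel table: 3 hypothetical sites, 4 urban and 36 rural pixels each.
pixels <- data.frame(
  Site  = rep(c("A", "B", "C"), each = 40),
  Urban = rep(c(rep(TRUE, 4), rep(FALSE, 36)), times = 3)
)
pixels$Day <- 312 + ifelse(pixels$Urban, 1, 0) + rnorm(nrow(pixels), sd = 1.5)

# Mean Day LST per site and class; columns are named "FALSE" and "TRUE".
means <- tapply(pixels$Day, list(pixels$Site, pixels$Urban), mean)
suhi  <- means[, "TRUE"] - means[, "FALSE"]   # urban minus rural, per site

# Pixel counts behind each mean: the urban average rests on very few pixels.
counts  <- tapply(pixels$Day, list(pixels$Site, pixels$Urban), length)
n_urban <- counts[, "TRUE"]
suhi
```

The resulting per-site `suhi` values could then be regressed on urban area, with the pixel counts kept alongside as a reminder of how noisy each urban mean is.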
