# Cherry Picking to Generalize ~ NASA Global Temperature Trends

April 6, 2010
By

(This article was first published on mind of a Markov chain » R, and kindly contributed to R-bloggers)

The relatively (to this decade) cool 2008 global temperatures spurred talks of a warming pause, or even global cooling. The claim usually comes from people who cherry picked either data sets and(!)/or start and end points of the global temperature trends to back up their allegation. The blogosphere already has a lot on this:

1. Skeptical Science summarizes short-term temperature trends from peer-reviewed research.
2. Tamino has two posts on linear temp. trends (Garbage is Forever, and Wiggles).
3. Climate Charts and Graphs has a number of graphs on temperature trends.
4. Learning R does a ggplot2 graph of decadal trend rates.
5. Real Climate has an article on the alleged warming pause.
6. Chip Knappenberger of MasterResource calculates recent linear trends and compares among datasets.

Basically, you really need to cherry pick to show a declining, or even a neutral trend. Below is the most recent data taken from NASA GISS:

We see generally an increasing trend, and it flat lines at the end. Depending on where you pick your end points, it may not be too out of the ordinary to claim global cooling. I decided to really take this exercise further, to cherry pick starting and ending years of temperature trends for a large portion of the data.

What I did was to calculate simple linear regressions (just a plain lm() in R) for virtually all combinations of starting and ending points between 1970 and 2009 (where the trend is more or less linear). I chose a a minimum lag (min.lag) to “cushion” the regression so it’s actually meaningful. I then extracted the estimated slope parameter (Celsius/yr), plot them in a 2D grid (of start and end years).

I chose a lag of 4 years (anything smaller makes things a little unstable but still works). That means I have (2009-1970-4+1)^2/2 = 648 linear regressions to calculate. Even though each of the regressions shouldn’t determine a trend, the aggregate of it should. The grids are calibrated so positive temp. trends are red, and negative temp. trends are blue. The x-axis is the start year, while the y-axis is the end year. The bottom right triangle portion of this matrix was forced to be the negative of the maximum value so R can get the color right.

We see a lot of red especially for longer lags. There are a few trends that go down for really short lags (along the diagonal), but even then it is not a common occurrence. The temperature has increased by something like 0.13 ℃ / decade for the last 50 years or so, we do generally see that except for latest short trends, which are obviously noisy. It looks like, at least from 1970 on, there are times of decline during short periods but flattens out over time. The most recent data show a slight decline.

The plot uses the fields package to attach the color legend to the grid. The code is below (there has to be a better way to rotate axis labels…):

 library(fields)

# from NASA GISS
# http://data.giss.nasa.gov/gistemp/graphs/Fig.A2.txt
GISTEMP <- read.csv("Global Annual Mean Surface Air Temperature Change.csv")

#   Year Annual_Mean X5_Year_Mean
# 1 1880       -0.28           NA
# 2 1881       -0.21           NA

# take lm() of starting point vs. end point back to 1950
begin = 1970
finish = 2009
min.lag = 8 # minimum lag years between begin and finish

lm.len <- length(begin:finish) - min.lag
lm.yr <- subset(GISTEMP, (Year >= begin) & (Year <= finish), select = Year)
lm.dex <- which(GISTEMP[,1] == begin):which(GISTEMP[,1] == finish)
lm.temp <- matrix(NA, lm.len, lm.len) # slope of temp
rownames(lm.temp) <- begin:(finish-min.lag)
colnames(lm.temp) <- (begin+min.lag):finish

for (i in 1:lm.len) {
for (j in (i + min.lag):(lm.len+min.lag)) {
lm.dat <- lm(GISTEMP[lm.dex[i:j],2] ~ GISTEMP[lm.dex[i:j],1])
lm.temp[i,(j-min.lag)] <- lm.dat\$coeff[2] # slope parameter!
}
}

lm.colorCut = 100
lm.color = rev(c(rgb(1,1:lm.colorCut/lm.colorCut,1:lm.colorCut/lm.colorCut), rgb(1,1,1),
rgb(lm.colorCut:1/lm.colorCut,lm.colorCut:1/lm.colorCut,1))) # full red for high values, to full blue for low values

# to adjust for the color scale, make white as slope = 0
lm.temp2 <- replace(lm.temp, is.na(lm.temp), -max(as.vector(lm.temp), na.rm = T))

image.plot(x = begin:(finish-min.lag), y = (begin+min.lag):finish, z = lm.temp2, xlab = "", ylab = "",
axes = F, col = lm.color, main = paste("NASA Global Temp Slope (Celsius/year) with Lag =", min.lag, "years"))
grid(lm.len, lm.len, col = "black")
axis(1, at = begin:finish, labels = F)
text(begin:(finish-min.lag), begin + min.lag, srt = 90, adj = 2,
labels = begin:(finish-min.lag), cex = .8, xpd = T)
axis(2, at = begin:finish, labels = F)
text(begin, (begin+min.lag):finish, srt = 0, adj = 2,
labels = (begin+min.lag):finish, cex = .8, xpd = T)
mtext("Start Year", 1, 3)
mtext("End Year", 2, 3)

text(1990, 1983, "**note: bottom right triangle n was only colored blue n
to calibrate the color scale", col = "white")

Filed under: Climate Change, R

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...