Cherry Picking to Generalize ~ retrospective meta-power analysis using Cohen’s f^2 of NASA temp + visualization


Previously, I plotted a grid of NASA GISS global temps in ggplot2 to show general trends by the brute force method. Here, I will again use the brute force method to do a simple power analysis on a portion of the data (data here). The general aim is to figure out what minimum sample size is needed to have strong confidence in the significance of the trend (hopefully in a compact but comprehensive way).

There are many blog posts on the trend since 1998: some people call it an insignificant (too short of a) trend, while others say it is the sign of a new trend. Let's put some numbers on it by doing a power analysis, taking 0.8 as good power and 0.6 as OK.

Power is the probability of correctly detecting a significant effect relative to the null distribution (hypothesis). Correctly detecting the effect means that the test statistic for the parameter of interest falls inside the rejection region when the effect is really there. Failing to reject the null when the effect is real is a Type II error (power is one minus the Type II error rate), while rejecting a null that is actually true is a Type I error.

We usually denote by alpha the significance level, the threshold that separates rejecting the null from retaining it. Power is therefore directly tied to this rejection threshold: the stricter (lower) the threshold, the stronger the effect size (or the larger the sample) needs to be to achieve the same power.
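As a rough sketch of that trade-off with the pwr package (the same package used for the calculations below; the degrees of freedom here are just illustrative), this is the power of a simple linear regression slope test at two significance levels, holding the effect size and sample size fixed:

library(pwr)

## Simple linear regression: u = 1 numerator df, v = n - 2 denominator df.
## Same "large" effect size f^2 = 0.35 at two significance levels.
pwr.f2.test(u = 1, v = 20, f2 = 0.35, sig.level = 0.05)$power # roughly 0.75
pwr.f2.test(u = 1, v = 20, f2 = 0.35, sig.level = 0.01)$power # noticeably lower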

The effect size is basically the strength of the relationship between the variables. It can be a standardized measure, which is easier to work with (it factors out variance complications), or an unstandardized one, which is easier to interpret (Thomas 1997). Effect size is usually related to power through the F distribution.


Aside from changing the experiment / observational study altogether, the main lever for increasing power is to increase the sample size. That is why you see so many plots of power as a function of sample size.
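For example, with the "large" effect size f^2 = 0.35 used later for the theoretical (yellow) curve, power rises steadily with n; this is only an illustrative sketch using pwr:

library(pwr)

n <- 5:30                        # candidate sample sizes
pow <- sapply(n, function(k)
  pwr.f2.test(u = 1, v = k - 2, f2 = 0.35, sig.level = 0.05)$power)
plot(n, pow, type = "b", xlab = "Sample Size", ylab = "Power")
abline(h = c(0.6, 0.8), lty = 2) # the "OK" and "good" power thresholds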

Power is used prospectively to assess the sample size needed to get significant results from an experiment. It is also used retrospectively to assess our confidence in results we already have.

For this example, we use post-hoc (retrospective) power analysis with a standardized measure of effect size (Cohen's f^2). The idea is to extend this relatively easy method with an ad-hoc visualization that compensates for some of its weaknesses. Time series issues are ignored for now, so hopefully people will look at this article with a charitable eye.

I used the pwr package in R for the power calculations (Quick-R has a good reference). Cohen's f^2 is computed from the coefficient of determination R^2 of a linear regression:

f^2 = R^2 / (1 - R^2)
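In R this is a one-liner once a regression has been fit; giss here is the GISS data frame read in by the code listing at the end of the post:

fit <- lm(Annual_Mean ~ Year, data = giss) # giss as loaded in the R code below
r2 <- summary(fit)$r.squared               # coefficient of determination
f2 <- r2 / (1 - r2)                        # Cohen's f^2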

What I did was calculate a bunch of linear regressions over windows of roughly 5 ~ 30 years, categorized by the final year of the time series. So for 2009 ("the latest year"), I calculated all the linear trends from 2004 ~ 2009 up to 1979 ~ 2009. I then repeated this set of fits for every latest year from 1949 to 2009 (if that makes sense!).
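A compact sketch of that windowing scheme for a single latest year (again assuming the giss data frame from the code listing below, with columns Year and Annual_Mean):

latest <- 2009
lens <- 5:30                     # window lengths in years
r2.win <- sapply(lens, function(L) {
  win <- subset(giss, Year > latest - L & Year <= latest)
  summary(lm(Annual_Mean ~ Year, data = win))$r.squared
})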

One of the criticisms in Thomas (1997) is that R^2 measures (and other effect size estimates) have high variability. Calculating just one R^2 from the whole dataset is not reliable, and the post-hoc power becomes little more than a restatement of the statistical significance of that same data. But in this case, we reduce this variability (or rather, expose its patterns) by resampling many overlapping windows over the whole dataset. These resamples are then aggregated into 10-year intervals of the latest year, which assumes that the series within a category are related to each other (dubious, but hey).
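The aggregation below leans on ggplot2's mean_cl_boot summary (a nonparametric bootstrap of the mean, provided through the Hmisc package); in base R the same idea looks roughly like this, with made-up power values standing in for one sample-size column:

pwr.at.n <- runif(40, 0.4, 0.9)       # hypothetical power values at one sample size
boot.means <- replicate(1000, mean(sample(pwr.at.n, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975)) # approximate 95% interval of the mean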

The end result is multiple power vs. sample size graphs plastered on the wall, aggregated by 10-year intervals:

This graph has 5 parts (aagh, info. overload!):

  1. each individual point represents the power of one fitted trend with sample size n
  2. smoothed trends for each 10-year category of "latest year"
  3. bootstrapped 95% (?) confidence intervals of the mean power at each sample size n
  4. a yellow line to show the theoretical power for an effect size of 0.35 (considered large)
  5. a plot within a plot that shows the time series, with the “latest years” shaded accordingly

The last part (the inset time series) is there to show the direction of the significance, and its variability.

We see that the recent data has higher power overall, and the first three intervals generally have higher power than the theoretical values after a sample size of about 15. The variation in each smoothed curve is only useful in relation to the other smoothed curves (it doesn't really represent the true variability in power).

The means of the smoothed curves are useful only to the extent that the linearity and autocorrelation assumptions aren't violated. The first (most recent) category is relatively clean, while the third category breaks down after a sample size of about 25.

If I had to say when a trend starting in 1998 would have sufficient power, I would say (to be a little conservative) that we should have data up to about 2012.

Sensitivity can be assessed by using different aggregations.
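For instance, changing num.cuts in the code below changes how the latest years (1949 to 2009 here) are binned; a quick standalone look at how the binning shifts:

latest.years <- 1949:2009
table(cut(latest.years, 7))      # the aggregation used in the plots
table(cut(latest.years, 4))      # a coarser alternative for a sensitivity check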

For comparison, this is the graph without the aggregations:

R Code:

library(pwr)     # power calculations (pwr.f2.test)
library(ggplot2) # plotting (written against the old opts()/theme_*() ggplot2 API)
library(grid)    # grid.newpage(), viewport(), unit() for the inset plot

read.csv("./Global Annual Mean Surface Air Temperature Change.csv") -> giss
head(giss, 2)
##   Year Annual_Mean X5_Year_Mean
## 1 1880       -0.28           NA
## 2 1881       -0.21           NA

## calculate power for different sample sizes
n <- 130 # most recent year index
m <- 100 # oldest year index
f2 <- 0.35 # cohen's f2 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes.
alpha <- 0.05

pwr.graph <- function(n, m, f2, alpha) {
  giss.n <- (m+4):n - m + 1 # sample sizes 5 .. (n - m + 1)
  giss.pwr <- cbind(giss.n, 0, 0, 0)
  for (i in giss.n - 4) {
    giss.data <- giss[(n-i-4+1):n,] # window of i + 4 years ending at row n
    giss.lm <- lm(Annual_Mean ~ Year, data = giss.data)
    giss.pwr[i,3] <- giss.r2 <- summary(giss.lm)$r.squared
    ## observed power: numerator/denominator df taken from the F statistic,
    ## effect size f^2 = R^2 / (1 - R^2)
    giss.pwr[i,2] <- pwr.f2.test(summary(giss.lm)$fstatistic[2],
                                 summary(giss.lm)$fstatistic[3],
                                 giss.r2 / (1 - giss.r2), alpha)$power
    giss.pwr[i,4] <- tail(giss.data$Year, 1) # latest year in the window
  }
  giss.pwr
}

n <- 130
giss.pwr <- pwr.graph(n, n-30, f2, alpha)
for (n in rev(seq(70, n-1, by = 1))) {
  giss.pwr <- rbind(giss.pwr, pwr.graph(n,n-30,f2,alpha))
}
colnames(giss.pwr) <- c("Sample Size", "Power", "R^2", "Latest Year")

fac.name <- function(value, name) { # attach names to a vector (for the manual colour scales)
  x <- value
  names(x) <- name
  x
}

Latest.Year <- giss.pwr[,4]

num.cuts <- 7 # cut(unique(Latest.Year), num.cuts) should be whole numbers
## pull the upper bound out of each interval label, e.g. "(a,b]" -> b
giss.legend.name <- as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", unique(cut(unique(Latest.Year), num.cuts))))
names(giss.legend.name) <- unique(cut(unique(Latest.Year), num.cuts))

p1 <- ggplot(aes(Sample.Size, Power), data = data.frame(giss.pwr)) +
  geom_point(aes(fill = cut(Latest.Year, num.cuts), col = cut(Latest.Year, num.cuts)), alpha = I(.2), size = I(3), position = "jitter") +
  stat_smooth(aes(fill = cut(Latest.Year, num.cuts), col = cut(Latest.Year, num.cuts)), fullrange = T) +
  stat_summary(aes(group = 1), fun.data = "mean_cl_boot", geom = "errorbar", color = "white", size = I(1.1)) +
  scale_colour_manual(name = "Latest Year", values = as.factor(fac.name(rgb(seq(1,0,len=num.cuts),0,0), names(giss.legend.name)))) + #, breaks = giss.legend.name, labels = names(giss.legend.name)) +
  scale_fill_manual(name = "Latest Year", values = as.factor(fac.name(rgb(seq(1,0,len=num.cuts),0,0), names(giss.legend.name)))) +#, breaks = giss.legend.name, labels = names(giss.legend.name))
  opts(title = expression(paste("GISS Temps Power Analysis of Trends: Effect Size Cohen's ", f^2))) +
  ylim(0,1.1) + xlab("Sample Size (subtract from Latest Year to get Start Year)")

p2 <- ggplot(aes(Year, Annual_Mean), data = giss) +
  scale_fill_gradient(low = "black", high = "red") +
  opts(panel.grid.minor = theme_line(colour = NA),
       panel.grid.major = theme_line(colour = NA),
       panel.border = theme_rect(fill = NA, colour = NA),
       panel.background = theme_rect(fill = NA, colour = NA),
       plot.background = theme_rect(fill = NA, colour = NA),
       legend.position = "none")

p2 <- p2 + geom_polygon(aes(c(Year, rev(Year)),
                            c(pmin(Annual_Mean, min(giss$Annual_Mean, na.rm = T), na.rm = T),
                              rev(pmax(Annual_Mean, max(giss$Annual_Mean, na.rm = T), na.rm = T))),
                            fill = Year,
                            group = c(cut(Year, num.cuts), rev(cut(Year, num.cuts)))),
                        subset = .(Year > min(giss.legend.name) - num.cuts),
                        alpha = I(.5)) + geom_point(col = I("white"))

# theoretical values w/ strong effect size .35
s <- 5:30 # sample sizes
s.p <- pwr.f2.test(1, s - 2, .35, .05)$power # u = 1 (slope), v = n - 2 residual df
s.df <- data.frame(s, s.p)
p3 <- geom_line(aes(s, s.p), data = s.df, col = I("yellow"))

grid.newpage()
p1 + opts(panel.border = theme_blank()) + p3
print(p2, vp = viewport(just = c("left", "top"), x = unit(.05, "npc"), y = unit(.96, "npc"), width = .4, height = .3))

Refs:

  1. Thomas, L. (1997). Retrospective power analysis. Conservation Biology, 11(1), 276-280. DOI: 10.1046/j.1523-1739.1997.96102.x


