Power Analysis for mixed-effect models in R

[This article was first published on Computational Ecology, and kindly contributed to R-bloggers.]

The power of a statistical test is the probability that a null
hypothesis will be rejected when the alternative hypothesis is true.
In lay terms, power is your ability to refine or “prove” your
expectations from the data you collect. The most frequent motivation
for estimating the power of a study is to figure out what sample size
will be needed to observe a treatment effect. Given a set of pilot
data or some other estimate of the variation in a sample, we can use
power analysis to inform how much additional data we should collect.
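
For a simple design, this definition can be checked directly. The sketch below is not part of the pilot analysis: it assumes a two-sample t-test with 20 observations per group, a true difference of 0.5 and a standard deviation of 1 (all made-up values). It estimates power by simulation and compares the result with the analytic answer from power.t.test in base R.

```r
## Power of a two-sample t-test, by simulation and analytically.
## n, delta and sigma are illustrative values, not taken from the pilot data.
set.seed(1)
n <- 20; delta <- 0.5; sigma <- 1

## Proportion of simulated experiments that reject the null at 0.05
p.vals <- replicate(2000, {
  t.test(rnorm(n, delta, sigma), rnorm(n, 0, sigma),
         var.equal = TRUE)$p.value
})
sim.power <- mean(p.vals < 0.05)

## Analytic power for the same design
ana.power <- power.t.test(n = n, delta = delta, sd = sigma,
                          sig.level = 0.05)$power

## Sample size per group needed to reach 80% power
n.needed <- power.t.test(delta = delta, sd = sigma, sig.level = 0.05,
                         power = 0.8)$n
```

The simulated and analytic values should agree to within Monte Carlo error, which is the idea the rest of this post extends to a model with no convenient analytic formula.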

I recently did a power analysis on a set of pilot data for a long-term
monitoring study of the US National Park Service. I thought I would
share some of the things I learned and a bit of R code for others who
might need to do something like this. If you aren’t into power
analysis, the code below may still be useful as an example of how to use
the error handling functions in R (withCallingHandlers,
withRestarts), parallel programming with the snow package, and linear
mixed-effects regression with nlme. If you have any suggestions for
improvement or if I got something wrong in the analysis, I’d love to
hear from you.

1 The Study

The study system was cobblebars along the Cumberland River in Big
South Fork National Park (Kentucky and Tennessee, United States).
Cobblebars are typically dominated by grassy vegetation that includes
disjunct tall-grass prairie species. It is hypothesized that woody
species will encroach onto cobblebars if they are not seasonally
scoured by floods. The purpose of the NPS sampling was to observe
changes in woody cover through time. The study design consisted of
two stages of clustering: the first being cobblebars, and the second
being transects within cobblebars. The response variable was the
percentage of the transect that was woody vegetation. Because of
the clustered design, the inferential model for this study is a
mixed-effects model. I used a simple varying-intercept model in which
y is the percent of each transect in woody vegetation, sampled
n times within J cobblebars, each with K transects. The
parameter of inference for the purpose of monitoring change in woody
vegetation through time is β, the rate at which cover changes as
a function of time. α, γ, σ²_γ, and σ²_y are hyperparameters that
describe the hierarchical variance structure inherent in the
clustered sampling design.
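
One way to write a varying-intercept model that is consistent with the lme call used in this post is sketched below. The notation is an assumption on my part: t_i is the sampling year of observation i, α_j is the intercept of cobblebar j, γ_jk is the deviation of transect k within cobblebar j, and μ_α and σ_α (mu.a and sigma.a in the code) are the mean and standard deviation of the cobblebar intercepts.

```latex
\begin{align}
y_i &\sim \mathrm{N}\left(\alpha_{j[i]} + \gamma_{j[i]k[i]} + \beta\, t_i,\; \sigma_y^2\right) \\
\alpha_j &\sim \mathrm{N}\left(\mu_\alpha,\; \sigma_\alpha^2\right), \qquad j = 1, \dots, J \\
\gamma_{jk} &\sim \mathrm{N}\left(0,\; \sigma_\gamma^2\right), \qquad k = 1, \dots, K
\end{align}
```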

Below is the function I used to regress the pilot data. Note that
with this function you can log or logit transform
the response variable (percentage of transect that is woody). I put
this in because the responses are proportions in (0,1) and the errors
should technically follow a beta distribution. Log and logit transforms
with Gaussian errors could approximate this. I ran all the models with
transformed and untransformed responses, and the conclusions did not
change. So, I stuck with untransformed responses:

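library(nlme)

Model <- function(x = cobblebars,
                  type = c("normal", "log", "logit")){
  ## Fit the varying-intercept model, optionally on a transformed response
  ## Transforms
  if (type[1] == "log")
    x$prop.woody <- log(x$prop.woody)
  else if (type[1] == "logit")
    x$prop.woody <- log(x$prop.woody / (1 - x$prop.woody))

  ## Random intercepts for cobblebars and for transects within cobblebars
  mod <- lme(prop.woody ~ year,
             data = x,
             random = ~ 1 | cobblebar/transect,
             na.action = na.omit,
             control = lmeControl(opt = "optim",
               maxIter = 800, msMaxIter = 800))
  mod$type <- type[1]  # remember the transform for the simulations
  return(mod)
}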
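
The logit transform and its inverse are also available in base R as qlogis and plogis. The short check below uses made-up proportions, and also shows why proportions of exactly 0 or 1 need care before a log or logit transform.

```r
## Illustrative proportions, not the pilot data
p <- c(0.02, 0.13, 0.50, 0.87)

logit.manual <- log(p / (1 - p))    # same formula as in Model()
logit.base   <- qlogis(p)           # base R equivalent
back         <- plogis(logit.base)  # inverse transform

all.equal(logit.manual, logit.base) # TRUE
all.equal(back, p)                  # TRUE

## Boundary values are not finite on the transformed scale
qlogis(c(0, 1))  # -Inf  Inf
log(0)           # -Inf
```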


Here are the results from this regression of the pilot data:

Linear mixed-effects model fit by REML
 Data: x 
        AIC       BIC   logLik
  -134.4319 -124.1297 72.21595

Random effects:
 Formula: ~1 | cobblebar
        (Intercept)
StdDev:  0.03668416

 Formula: ~1 | transect %in% cobblebar
        (Intercept)   Residual
StdDev:  0.02625062 0.05663784

Fixed effects: prop.woody ~ year 
                  Value  Std.Error DF   t-value p-value
(Intercept)  0.12966667 0.01881983 29  6.889896  0.0000
year        -0.00704598 0.01462383 29 -0.481815  0.6336
 Correlation: 
     (Intr)
year -0.389

Number of Observations: 60
Number of Groups: 
              cobblebar transect %in% cobblebar 
                      6                      30 

2 We don't learn about power analysis and complex models

When I decided upon the inferential model, the first thing that
occurred to me was that I had never learned, in any statistics course I
had taken, how to do a power analysis on a multi-level model.
I've taken more statistics courses than I'd like to count and taught
my own statistics courses for undergrads and graduate students, and
the only exposure to power analysis that I had was in the context of
simple t-tests or ANOVA. You learn about it in your first two
statistics courses, then it rarely if ever comes up again until you
actually need it.

I was, however, able to find a great resource on power analysis from
a Bayesian perspective in the excellent book "Data Analysis Using
Regression and Multilevel/Hierarchical Models" by Andrew Gelman and
Jennifer Hill. Andrew Gelman has thought and debated about power
analysis and you can get more from his blog. The approach in the
book is a simulation-based one and I have adopted it for this
analysis.
3 Analysis Procedure

For the current analysis we needed to know three things: effect
size, sample size, and estimates of population variance. We set
effect size beforehand. In this context, the parameter of interest
is the rate of change in woody cover through time, β, and
effect size is simply how large or small a value of β you want
to detect with a regression. Sample size is also set a priori. In
the analysis we want to vary sample size by varying the
number of cobblebars, the number of transects per cobblebar, or the
number of years the study is conducted.

The population variance cannot be known precisely, and this is where
the pilot data come in. By regressing the pilot data using the
model we can obtain estimates of all the different components of the
variance (cobblebars, transects within cobblebars, and the residual
variance). Below is the R function that will return all the
hyperparameters (and β) from the regression:

getHyperparam <- function(x, b = NULL){
  ## Get the hyperparameters from the mixed effect model
  ## (the function name and signature are assumed here)
  fe <- fixef(x)
  if (is.null(b))
    b <- fe[2] # use the data effect size if not supplied

  mu.a <- fe[1]

  ## VarCorr rows: 2 = cobblebar, 4 = transect, 5 = residual
  vc <- VarCorr(x)
  sigma.y <- as.numeric(vc[5, 2]) # Residual StdDev
  sigma.a <- as.numeric(vc[2, 2]) # Cobblebar StdDev
  sigma.g <- as.numeric(vc[4, 2]) # Cobblebar:transect StdDev

  hp <- c(b, mu.a, sigma.y, sigma.a, sigma.g)
  names(hp) <- c("b", "mu.a", "sigma.y", "sigma.a", "sigma.g")
  return(hp)
}
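
To get a feel for these components, the standard deviations printed in the pilot regression output above can be squared and expressed as shares of the total variance. This is only a reading aid; the object names are mine.

```r
## Standard deviations copied from the pilot regression output
sds <- c(cobblebar = 0.03668416,
         transect  = 0.02625062,
         residual  = 0.05663784)

vars   <- sds^2             # variances
shares <- vars / sum(vars)  # proportion of total variance per level
round(shares, 2)
```

The residual term is the largest of the three in the pilot data.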

To calculate power we regress the simulated data in the same way we
did the pilot data, and check for a significant β. Since
optimization is done using numeric methods there is always the chance
that the optimization will not work. So, we make sure the regression
on the fake data catches and recovers from all errors. The solution
for error recovery is to simply try the regression on a new set of
fake data. This function is a pretty good example of using the R
error handling functions withCallingHandlers and withRestarts:

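fakeModWithRestarts <- function(m.o, n = 100,  ...){
  ## A Fake Model: refit to fresh fake data until a fit succeeds
  withCallingHandlers({
    i <- 0
    mod <- NULL
    while (i < n && is.null(mod)){
      mod <- withRestarts({
        f <- fake(m.orig = m.o, transform = FALSE, ...)
        update(m.o, data = f)
      },
      rs = function(){
        i <<- i + 1   # count the failed attempt
        NULL          # leave mod NULL so the loop tries again
      })
    }
    if (is.null(mod))
      warning("ExceededIterations")
    mod
  },
  error = function(e) invokeRestart("rs"),
  warning = function(w){
    if (w$message == "ExceededIterations")
      cat("\n", w$message, "\n")
    else
      invokeRestart("rs")
  })
}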
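
The same retry pattern can be seen in isolation in the sketch below. It is a toy example, not the model code: flaky is a made-up function that fails on its first two calls, and retry and the restart name skipAttempt are hypothetical names.

```r
## Retry a function until it succeeds, using a calling handler that
## jumps to a restart established around each attempt.
retry <- function(f, n = 5) {
  attempts <- 0
  result <- NULL
  withCallingHandlers({
    while (attempts < n && is.null(result)) {
      attempts <- attempts + 1
      ## The restart returns NULL, so a failed attempt leaves result NULL
      result <- withRestarts(f(attempts), skipAttempt = function() NULL)
    }
  },
  error = function(e) invokeRestart("skipAttempt"))
  list(value = result, attempts = attempts)
}

## Made-up function: errors on calls 1 and 2, succeeds on call 3
flaky <- function(k) if (k < 3) stop("did not converge") else k * 10

out <- retry(flaky)
out$value     # 30
out$attempts  # 3
```

The handler runs while the failing call is still on the stack, which is why it can see the restart set up around that attempt.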

To calculate the power of a particular design we run
fakeModWithRestarts 1000 times and look at the proportion of
significant β values:

dt.power <- function (m, n.sims = 1000, alpha=0.05, ...){
  ## Calculate power for a particular sampling design
  signif<-rep(NA, n.sims)
  for(i in 1:n.sims){
    lme.power <- fakeModWithRestarts(m.o = m, ...)
    ## Skip simulations where no model could be fit
    if (!is.null(lme.power))
      signif[i] <- summary(lme.power)$tTable[2, 5] < alpha
  }
  power <- mean(signif, na.rm = TRUE)
  return(power)
}
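
Because power is estimated as a proportion of n.sims simulated fits, the estimate itself carries Monte Carlo error. A rough binomial standard error, assuming independent simulations, shows how precise 1000 simulations are; mcse is a hypothetical helper name.

```r
## Binomial standard error of a power estimate from n.sims simulations
mcse <- function(power, n.sims) sqrt(power * (1 - power) / n.sims)

mcse(0.8, 1000)  # about 0.0126
mcse(0.8, 100)   # about 0.04
```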

Finally, we want to perform this analysis on many different sampling
designs. In my case I used all combinations of a set of effect sizes,
cobblebars, transects, and years. So, I generated the appropriate designs:

factoredDesign <- function(Elevs = 0.2/c(1,5,10,20),
                           Nlevs = seq(2, 10, by = 2),
                           Jlevs = seq(4, 10, by = 2),
                           Klevs = c(3, 5, 7), ...){
  ## Generates factored series of sampling designs for simulation
  ## of data that follow a particular model.
  ## Inputs:
  ##   Elevs - vector of effect sizes for the slope parameter.
  ##   Nlevs - vector of number of years to sample.
  ##   Jlevs - vector of number of cobblebars to sample.
  ##   Klevs - vector of number of transects to sample.
  ## Results:
  ##   Data frame where columns are the factors and
  ##   rows are the designs.

  # Level lengths
  lE <- length(Elevs)
  lN <- length(Nlevs)
  lJ <- length(Jlevs)
  lK <- length(Klevs)

  # Generate repeated vectors for each factor
  E <- rep(Elevs, each = lN*lJ*lK)
  N <- rep(rep(Nlevs, each = lJ*lK), times = lE)
  J <- rep(rep(Jlevs, each = lK), times = lE*lN)
  K <- rep(Klevs, times = lE*lN*lJ)
  return(data.frame(E, N, J, K))
}
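
For comparison, base R's expand.grid builds the same full factorial table. In this sketch the factor that varies fastest (K) is listed first so that the row order matches the rep-based construction above; the object names are illustrative.

```r
## Same default levels as factoredDesign
Elevs <- 0.2 / c(1, 5, 10, 20)
Nlevs <- seq(2, 10, by = 2)
Jlevs <- seq(4, 10, by = 2)
Klevs <- c(3, 5, 7)

## expand.grid varies its first argument fastest
designs <- expand.grid(K = Klevs, J = Jlevs, N = Nlevs, E = Elevs)
designs <- designs[, c("E", "N", "J", "K")]  # reorder the columns
nrow(designs)  # 4 * 5 * 4 * 3 = 240 designs
```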

Once we know our effect sizes, the different sample sizes we want,
and the estimates of population variance, we can generate simulated
datasets that are similar to the pilot data. To calculate power we
simply simulate a large number of datasets and calculate the
proportion of slopes, β, that are significantly different from
zero (p-value < 0.05). This procedure is repeated for all the effect
sizes and sample sizes of interest. Here is the code for generating a
simulated dataset. It also applies the inverse transform to the
response variable if necessary.

fake <- function(N = 2, J = 6, K = 5, b = NULL, m.orig = mod,
                 transform = TRUE, ...){
  ## Simulated Data for power analysis
  ## N = Number of years
  ## J = Number of cobblebars
  ## K = Number of transects within cobblebars
  year <- rep(0:(N-1), each = J*K)
  cobblebar <- factor(rep(rep(1:J, each = K), times = N))
  transect <- factor(rep(1:K, times = N*J))

  ## Simulated parameters
  ## (assumes the hyperparameter function shown earlier is named
  ## getHyperparam)
  hp <- getHyperparam(x = m.orig)
  if (is.null(b))
    b <- hp['b']
  g <- rnorm(J*K, 0, hp['sigma.g'])           # transect deviations
  a.j <- rnorm(J, hp['mu.a'], hp['sigma.a'])  # cobblebar intercepts
  a <- rep(a.j, each = K) + g                 # intercept of each transect
  ## Simulated responses (a is recycled across the N years)
  eta <- rnorm(J*K*N, a + b * year, hp['sigma.y'])
  if (transform){
    if (m.orig$type == "normal"){
      y <- eta
      y[y > 1] <- 1 # Fix any boundary problems.
      y[y < 0] <- 0
    } else if (m.orig$type == "log"){
      y <- exp(eta)
      y[y > 1] <- 1
    } else if (m.orig$type == "logit")
      y <- exp(eta) / (1 + exp(eta))
  } else {
    y <- eta
  }
  return(data.frame(prop.woody = y, year, transect, cobblebar))
}
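
A quick way to check a simulator like this is to generate data with known parameters and confirm that the regression recovers them. The sketch below is self-contained and does not call fake(); the design (4 years, 8 cobblebars, 5 transects) and all parameter values are made up for illustration.

```r
library(nlme)
set.seed(42)
N <- 4; J <- 8; K <- 5   # years, cobblebars, transects (illustrative)
beta <- 0.02             # true slope (illustrative)

## One row per transect per year
d <- expand.grid(transect = 1:K, cobblebar = 1:J, year = 0:(N - 1))
a <- rnorm(J, 0.13, 0.04)                 # one intercept per cobblebar
g <- rnorm(J * K, 0, 0.03)                # one deviation per transect
tr <- (d$cobblebar - 1) * K + d$transect  # index of each unique transect
d$prop.woody <- rnorm(nrow(d),
                      a[d$cobblebar] + g[tr] + beta * d$year, 0.05)
d$cobblebar <- factor(d$cobblebar)
d$transect  <- factor(d$transect)

fit <- lme(prop.woody ~ year, data = d,
           random = ~ 1 | cobblebar/transect)
fixef(fit)["year"]                      # should be close to beta
summary(fit)$tTable["year", "p-value"]  # p-value of the slope
```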

Then I performed the power calculations on each of these designs. This
could take a long time, so I set this procedure to use parallel processing
if needed. Note that I had to re-source the file with all the
necessary functions for each processor.

library(snow)

powerAnalysis <- function(parallel = TRUE, ...){
  ## Full Power Analysis
  ## Parallel
  if (parallel){
    cl <- makeCluster(7, type = "SOCK")
    on.exit(stopCluster(cl))  # shut the workers down when done
    clusterEvalQ(cl, source("cobblebars2.r"))
  }
  ## The simulations
  dat <- factoredDesign(...)

  ## Each row of dat is one design: E, N, J, K
  if (parallel){
    dat$power <- parRapply(cl, dat, function(x,...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  } else {
    dat$power <- apply(dat, 1, function(x, ...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  }

  return(dat)
}
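
The snow interface used above was later folded into the parallel package that ships with R. As a hedged alternative, not the code used for this analysis, the sketch below runs a toy function over the rows of a small design table on two local workers; the table and the function are placeholders for the design grid and dt.power.

```r
library(parallel)

designs <- data.frame(N = c(2, 4), J = c(4, 6))  # toy design table

cl <- makeCluster(2)  # two local worker processes
res <- tryCatch(
  ## Apply the function to each row, split across the workers
  parRapply(cl, as.matrix(designs), function(x) x[["N"]] * x[["J"]]),
  finally = stopCluster(cl)  # always shut the workers down
)
res
```

Functions defined in a sourced file still have to be made available on each worker, for example with clusterEvalQ as in powerAnalysis.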


The output of the powerAnalysis function is a data frame with
columns for the power and all the sample design settings. So, I wrote
a custom plotting function for this data frame:

library(lattice)

plotPower <- function(dt){
  ## Power against sampling years, one panel per cobblebar x transect
  ## combination, one line per effect size
  xyplot(power~N|J*K, data = dt, groups = E,
         panel = function(...){panel.xyplot(...)
                               panel.abline(h = 0.8, lty = 2)},
         type = c("p", "l"),
         xlab = "sampling years",
         ylab = "power",
         strip = strip.custom(var.name = c("C", "T"),
           strip.levels = c(TRUE, TRUE)),
         auto.key = TRUE)
}

Below is the figure for the cobblebar power analysis. I won't go into
detail on what the results mean since I am concerned here with
illustrating the technique and the R code. Obviously, as the number of
cobblebars and transects per year increase, so does power. And, as
the effect size increases, observing it with a test is easier.

Author: Todd Jobe

Date: 2009-09-18 Fri
