Mixed models for ANOVA designs with one observation per unit of observation and cell of the design


Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part focuses obviously on R, specifically on lme4 and afex.

One part of the chapter was supposed to deal with designs for which the maximal random effects structure justified by the design cannot be estimated because there is only one observation per participant and cell of the design. Such designs are the classical repeated-measures ANOVA designs, as ANOVA cannot deal with replicates at the cell level (i.e., replicates are usually aggregated to yield one observation per cell and unit of observation). Based on my previous thoughts, which turned out to be wrong, we wrote the following:

Random Effects Structures for Traditional ANOVA Designs

The estimation of the maximal model is not possible when there is only one observation per participant and cell of a repeated-measures design. These designs are typically analyzed using a repeated-measures ANOVA. Currently, there are no clear guidelines on how to proceed in such situations, but we will try to provide some advice. If there is only a single random effects grouping factor, for example participants, we feel that instead of a mixed model, it is appropriate to use a standard repeated-measures ANOVA that addresses sphericity violations via the Greenhouse-Geisser correction.

One alternative strategy that employs mixed models and that we \emph{do not recommend} consists of using the random-intercept-only model or removing the random slopes for the highest within-subject interaction. The resulting model assumes invariance of the omitted random effects across participants. If this assumption is violated, such a model produces results that cannot be trusted. […]
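As a side note, the first recommendation from the excerpt, a repeated-measures ANOVA with Greenhouse-Geisser correction, can be obtained directly with afex. The following is only a minimal sketch, assuming a long-format data frame (here called dat for illustration) with columns id, condition, and dv as used later in this post; as far as I know, the correction element of anova_table is accepted by current afex versions:

library(afex)
## repeated-measures ANOVA with Greenhouse-Geisser correction for
## violations of sphericity (dat: long format with id, condition, dv)
aov_ez(id = "id", dv = "dv", data = dat, within = "condition",
       anova_table = list(correction = "GG"))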

Fortunately, we asked Jake Westfall to take a look at the chapter and Jake responded:

I don’t think I agree with this. In the situation you describe, where we have a single random factor in a balanced ANOVA-like design with 1 observation per unit per cell, personally I am a proponent of the omit-the-highest-level-random-interaction approach. In this kind of design, the random slopes for the highest-level interaction are perfectly confounded with the trial-level error term (in more technical language, the model is only identifiable up to the sum of these two variance components), which is what causes the identifiability problems when one tries to estimate the full maximal model there. (You know all of this of course.) So two equivalent ways to make the model identifiable are to (1) omit the error term, i.e., force the residual variance to be 0, or (2) omit the random slopes for the highest-level interaction. Both of these approaches should (AFAIK) result in a statistically equivalent model, but lme4 does not provide an easy way to do (1), so I generally recommend (2). The important point here is that the standard errors should still be correct in either case — because these two variance components are confounded, omitting e.g. the random interaction slopes simply causes that omitted variance component to be implicitly added to the residual variance, where it is still incorporated into the standard errors of the fixed effects in the appropriate way (because the standard error of the fixed interaction looks roughly like sqrt[(var_error + var_interaction)/n_subjects]). I think one could pretty easily put together a little simulation that would demonstrate this.

Hmm, that sounds very reasonable, but can my intuition on the random effects structure and mixed models really be that wrong? To investigate this, I followed Jake's advice and coded a short simulation to test it. As it turns out, Jake is right and I was wrong.

In the simulation we will simulate a simple one-factor repeated-measures design in which the factor has three levels. Importantly, each unit of observation will only have one observation per factor level. We will then fit the simulated data with both a repeated-measures ANOVA and a random-intercept-only mixed model and compare their p-values. Note again that for such a design we cannot estimate random slopes for the condition effect.
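To make the two model specifications concrete, here is a minimal sketch in lme4/afex formula syntax (the names f_maximal and f_reduced are only for illustration):

## maximal random effects structure justified by the design: random
## intercept plus random slopes for condition; with only one observation
## per participant and cell, the condition slopes are confounded with the
## residual error, so this model is not identifiable
f_maximal <- dv ~ condition + (condition | id)

## reduced model omitting the highest-order random slope: the
## random-intercept-only model fit via mixed() in the simulation below
f_reduced <- dv ~ condition + (1 | id)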

First, we need a few packages and set some parameters for our simulation:

require(afex)
set_sum_contrasts() # for orthogonal sum-to-zero contrasts
require(MASS)       # for mvrnorm()

NSIM <- 1e4  # number of simulated data sets
NPAR <- 30  # number of participants (each contributes one observation per cell)
NCELLS <- 3  # number of cells (i.e., conditions of the within-subjects factor)

Now we need to generate the data. For this I employed an approach that is clearly not the most parsimonious, but that most clearly follows the formulation of a mixed model with random variability in the condition effect and, on top of this, residual variance (i.e., the two confounded variance components); a more compact alternative is sketched after the generation code below.

We first create a bare-bones data.frame with participant id and condition columns and a corresponding model.matrix. Then we create the three random parameters (i.e., the intercept and the two parameters for the three conditions) using a zero-centered multivariate normal with a specified variance-covariance matrix. We then loop over the participants and compute the predictions implied by the three random effects parameters. Only after this do we add uncorrelated residual noise to the observations for each simulated data set.

dat <- expand.grid(condition = factor(letters[seq_len(NCELLS)]),
                   id = factor(seq_len(NPAR)))
head(dat)
#   condition id
# 1         a  1
# 2         b  1
# 3         c  1
# 4         a  2
# 5         b  2
# 6         c  2

mm <- model.matrix(~condition, dat)
head(mm)
#   (Intercept) condition1 condition2
# 1           1          1          0
# 2           1          0          1
# 3           1         -1         -1
# 4           1          1          0
# 5           1          0          1
# 6           1         -1         -1

## compound-symmetric variance-covariance matrix of the three random
## parameters (variances 1, covariances 0.6)
Sigma_c_1 <- matrix(0.6, NCELLS, NCELLS)
diag(Sigma_c_1) <- 1
## for each simulated data set, draw the random parameters for all participants
d_c_1 <- replicate(NSIM, mvrnorm(NPAR, rep(0, NCELLS), Sigma_c_1), simplify = FALSE)

gen_dat <- vector("list", NSIM)
for(i in seq_len(NSIM)) {
  gen_dat[[i]] <- dat
  gen_dat[[i]]$dv <- NA_real_
  ## predictions implied by each participant's three random parameters
  for (j in seq_len(NPAR)) {
    gen_dat[[i]][(j-1)*3+(1:3),"dv"] <- mm[1:3,] %*% d_c_1[[i]][j,]
  }
  ## add uncorrelated residual noise on top of the random-effect predictions
  gen_dat[[i]]$dv <- gen_dat[[i]]$dv + rnorm(nrow(mm), 0, 1)
}
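For reference, here is a more compact (and statistically equivalent) way to generate the same kind of data as the loop above; a sketch that exploits the fact that each participant's predictions are mm[1:3, ] %*% d_c_1[[i]][j, ] (gen_dat2 is just an illustrative name):

gen_dat2 <- vector("list", NSIM)
for (i in seq_len(NSIM)) {
  gen_dat2[[i]] <- dat
  ## row j holds the predictions mm[1:3, ] %*% d_c_1[[i]][j, ]
  pred <- tcrossprod(d_c_1[[i]], mm[1:3, ])  # NPAR x NCELLS
  ## flatten row-wise so that condition varies fastest within id
  ## (matching the row order of dat), then add the residual noise
  gen_dat2[[i]]$dv <- as.vector(t(pred)) + rnorm(nrow(mm), 0, 1)
}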

Now we only need functions that estimate the ANOVA and the mixed model for each data set and return the p-values, and then we loop over the simulated data sets.

## functions returning p-value for ANOVA and mixed model
within_anova <- function(data) {
  suppressWarnings(suppressMessages(
  a <- aov_ez(id = "id", dv = "dv", data, within = "condition", return = "univariate", anova_table = list(es = "none"))
  ))
  c(without = a[["univariate.tests"]][2,6],  # p-value without sphericity correction
    gg = a[["pval.adjustments"]][1,2],       # Greenhouse-Geisser corrected p-value
    hf = a[["pval.adjustments"]][1,4])       # Huynh-Feldt corrected p-value
}

within_mixed <- function(data) {
  suppressWarnings(
    m <- mixed(dv~condition+(1|id),data, progress = FALSE)  
  )
  c(mixed=anova(m)$`Pr(>F)`)
}

p_c1_within <- vapply(gen_dat, within_anova, rep(0.0, 3))
m_c1_within <- vapply(gen_dat, within_mixed, 0.0)

The following graph shows the results (GG denotes the results using the Greenhouse-Geisser adjustment for sphericity violations).

ylim <- c(0, 700)
par(mfrow = c(1,3))
hist(p_c1_within[1,], breaks = 20, main = "ANOVA (default)", xlab = "p-value", ylim=ylim)
hist(p_c1_within[2,], breaks = 20, main = "ANOVA (GG)", xlab = "p-value", ylim=ylim)
hist(m_c1_within, breaks = 20, main = "Random-Intercept Model", xlab = "p-value", ylim=ylim)

What this graph clearly shows is that the p-value distributions for the standard repeated-measures ANOVA and the random-intercept mixed model are virtually identical. This confirms that my intuition was wrong and Jake was right.

We also see that for the default (uncorrected) ANOVA and the mixed model the rate of significant findings with p < .05 is slightly above the nominal level. More specifically:

mean(p_c1_within[1,] < 0.05) # ANOVA default
# [1] 0.0684
mean(p_c1_within[2,] < 0.05) # ANOVA GG
# [1] 0.0529
mean(p_c1_within[3,] < 0.05) # ANOVA HF
# [1] 0.0549
mean(m_c1_within < 0.05)     # random-intercept mixed model
# [1] 0.0701
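As a rough yardstick for how much these rates could deviate from .05 due to simulation noise alone, one can compute the binomial Monte Carlo standard error for 10,000 simulated data sets (a quick sketch):

## Monte Carlo standard error of an estimated Type I error rate at the
## nominal alpha = .05 with NSIM simulated data sets
sqrt(0.05 * 0.95 / NSIM)
# [1] 0.002179449

By this yardstick the corrected ANOVA rates are close to the nominal level, whereas the default ANOVA and the random-intercept mixed model deviate from it by many standard errors.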

These additional results indicate that one may also need to adjust the degrees of freedom of mixed models for violations of sphericity. But this is not the topic of today's post.

To sum this up, the simulation shows that removing the highest-order random slope seems to be the right decision if one wants to use a mixed model for a design with one observation per participant and cell of the design while still implementing the 'maximal random effects structure'.

One more thing to note: Ben Bolker raised the same issue and pointed us to one of his example analyses of the starling data, which is relevant to the current question. We are very grateful that Jake and Ben took the time to go through our chapter!

You can also download the RMarkdown file of the simulation.
