Dealing with missing data in ANOVA models

[This article was first published on Simon Grund [R], and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The analysis of variance, or ANOVA, is among the most popular methods for analyzing how an outcome variable differs between groups, for example, in observational studies or in experiments with different conditions.

But how do we conduct the ANOVA when there are missing data? In this post, I show how to deal with missing data in between- and within-subject designs using multiple imputation (MI) in R.

The ANOVA model

In the one-factorial ANOVA, the goal is to investigate whether two or more groups differ with respect to some outcome variable \(y\). The statistical model can be written as

\[ \begin{equation} \label{model} y_{ij} = \mu_j + e_{ij} \; , \end{equation} \]

where \(y_{ij}\) denotes the value of \(y\) for person \(i\) in group \(j\), and \(\mu_j\) is the mean in group \(j\). The (omnibus) null hypothesis of the ANOVA states that all groups have identical population means. For three groups, this would mean that

\[ \begin{equation} \mu_1 = \mu_2 = \mu_3 \; . \end{equation} \]

This hypothesis is tested by looking at whether the differences between groups are larger than what could be expected from the differences within groups. If this is the case, then we reject the null, and the group means are said to be “significantly” different from one another.

In the following, we will look at how this hypothesis can be tested when the outcome variable contains missing data. Let’s illustrate this with an example.

Example 1: between-subjects ANOVA

For this example, I simulated some data according to a between-subject design with three groups, \(n\) = 50 subjects per group, and a “medium” effect size of \(f\) = .25, which roughly corresponds to an \(R^2=6.8\%\) (Cohen, 1988).

You can download the data from this post if you want to reproduce the results (CSV, Rdata). Here are the first few rows.

group y x
1 A -0.715 0.062
2 B 0.120 0.633
3 C 1.341 1.176
4 A NA -0.792
5 B 0.468 0.243

The three variables mean the following:

  • group: the grouping variable
  • y: the outcome variable (with 20.7% missing data)
  • x: an additional covariate

In this example, cases with lower values in x had a higher chance of missing data in y. Because x is also positively correlated with y, this means that smaller y values are missing more often than larger ones.

Listwise deletion

Lets see what happens if we run the ANOVA only with those cases that have y observed (i.e., listwise deletion). This is the standard setting on most statistical software.

In R, the ANOVA can be conducted with the lm() function as follows.

# fit the ANOVA model
fit1 <- lm(y ~ 1 + group, data = dat)
# Call:
# lm(formula = y ~ 1 + group, data = dat)
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -1.8196 -0.6237 -0.0064  0.5657  2.1808 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   0.0944     0.1452    0.65     0.52
# groupB        0.1720     0.1972    0.87     0.38
# groupC       -0.1570     0.2082   -0.75     0.45
# Residual standard error: 0.895 on 116 degrees of freedom
#   (31 observations deleted due to missingness)
# Multiple R-squared:  0.0229,  Adjusted R-squared:  0.00609 
# F-statistic: 1.36 on 2 and 116 DF,  p-value: 0.26

In this example, the \(F\)-test at the bottom of the output indicates that the group means are not significantly different from one another, \(F(\!\) 2, 116 \(\!)\) = 1.361 (\(p\) = 0.26).1 In addition, the effect size (\(R^2\) = 0.023) is quite a bit smaller than what was used to generate the data.

In fact, this result is a direct consequence of how the missing data were simulated. Fortunately, there are statistical methods that can account for the missing data and help us obtain more trustworthy results.

Multiple imputation

One of the most effective ways of dealing with missing data is multiple imputation (MI). Using MI, we can create multiple plausible replacements of the missing data, given what we have observed and a statistical model (the imputation model).

In the ANOVA, using MI has the additional benefit that it allows taking covariates into account that are relevant for the missing data but not for the analysis. In this example, x is a direct cause of missing data in y. Therefore, we must take x into account when making inferences about y in the ANOVA.

Running MI consists of three steps. First, the missing data are imputed multiple times. Second, the imputed data sets are analyzed separately. Third, the parameter estimates and hypothesis tests are pooled to form a final set of estimates and inferences.

For this example, we will use the mice and mitml packages to conduct MI.


Specifying an imputation model is very simple here. With the following command, we generate 100 imputations for y on the basis of a regression model with both group and x as predictors and a normal error term.

# run MI
imp <- mice(data = dat, method = "norm", m = 100)

The imputed data sets can then be saved as a list, containing 100 copies of the original data, in which the missing data have been replaced by different imputations.

# create a list of completed data sets
implist <- mids2mitml.list(imp)

Finally, we fit the ANOVA model to each of the imputed data sets and pool the results. The analysis part is done with the with() command, which applies the same linear model, lm(), to each data set. The pooling es then done with the testEstimates() function.

# fit the ANOVA model
fit2 <- with(implist, lm(y ~ 1 + group))

# pool the parameter estimates
# Call:
# testEstimates(model = fit2)
# Final parameter estimates and inferences obtained from 100 imputed data sets.
#              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
# (Intercept)     0.027     0.144     0.190  3178.456     0.850     0.214     0.177 
# groupB          0.207     0.198     1.044  5853.312     0.297     0.149     0.130 
# groupC         -0.333     0.208    -1.600  2213.214     0.110     0.268     0.212 
# Unadjusted hypothesis test as appropriate in larger samples.

Notice that we did not need to actually include x in the ANOVA. Rather, it was enough to include x in the imputation model, after which the analyses proceeded as usual.

We now have estimated the regression coefficients in the ANOVA model (i.e., the differences between group means), but we have yet to decide whether the means are all equal or not. To this end, we use a pooled version of the \(F\)-test above, which consists of a comparison of the full model (the ANOVA model) with a reduced model that does not contain the coefficients we wish to test.2

In this case, we wish to test the coefficients pertaining to the differences between groups, so the reduced model does not contain group as a predictor.

# fit the reduced ANOVA model (without 'group')
fit2.reduced <- with(implist, lm(y ~ 1))

The full and the reduced model can then be compared with the pooled version of the \(F\)-test (i.e., the Wald test), which is known in the literature as \(D_1\).

# compare the two models with pooled Wald test
testModels(fit2, fit2.reduced, method = "D1")
# Call:
# testModels(model = fit2, null.model = fit2.reduced, method = "D1")
# Model comparison calculated from 100 imputed data sets.
# Combination method: D1 
#     F.value      df1      df2    P(>F)      RIV 
#       3.635        2 7186.601    0.026    0.195 
# Unadjusted hypothesis test as appropriate in larger samples.

In contrast with listwise deletion, the \(F\)-test under MI indicates that the groups are significantly different from one another.

This is because MI makes use of all the observed data, including the covariate x, and used this information to generated replacements for missing y that took its relation with x into account. To see this, it is worth looking at a comparison of the observed and the imputed data.

The difference is not extreme, but it is easy to see that the imputed data tend to have more mass at the lower end of the distribution of y (especially in groups A and C).

This is again a result of how the data were simulated: Lower y values, through their relation with x, are missing more often, which is accounted for using MI. Conversely, using listwise deletion placed the group means more closely together than they should be, and this affected the results in the ANOVA.

Example 2: mixed/within-subjects ANOVA

In a within-subjects design, the design factor varies within (not between) persons, and we obtain multiple, repeated measurements for each condition (mixed designs include both). Fortunately, the procedure for the treatment and missing data and the analysis remains mostly the same.

Although within-subjects designs are analyzed most often with the repeated-measures ANOVA, mixed-effects models have become a popular alternative. Here, I will choose the latter because mixed-effects models make it straightforward to pool ANOVA-like hypotheses in within-subjects designs.

To fit the mixed-effects model, we will use the lmer() function from the package lme4.


I simulated a second data set similar to the one above, with \(n\) = 20 persons, a within-subject factor with three conditions, and five repeated measurements for each condition (CSV, Rdata).

id cond y x
1 1 A -1.699 -1.11
2 1 B -1.022 -1.11
3 1 C -0.287 -1.11
4 1 A -1.588 -1.11
5 1 B -2.028 -1.11

The variables mean the following:

  • id: the subject identifier
  • cond: the grouping variable (within subjects)
  • y: the repeated measurements for the outcome variable (with 18.7% missing data)
  • x: a subject-specific covariate

Like above, persons with lower values in x had a higher chance of missing data in y. Notice that cond varies within subjects, making the repeated measures for each condition “nested” within subjects.

Multiple imputation

To properly accommodate the “nested” structure of the repeated measurements, the imputation model can no longer be a simple regression. Instead, it needs to accommodate this structure by also employing a mixed-effects model. Specifying this model is easiest by first initializing the imputation model with the default values.

# run MI (for starting solution)
ini <- mice(data = dat, maxit = 0)

Then we define the subject identifier for the imputation model (id) and change the imputation method to a use a mixed-effects model ("2l.pan"). Running MI is then the same as before.

# define the 'subject' identifier (code as '-2' in predictor matrix)
pred <- ini$pred
pred["y", "id"] <- -2

# run MI
imp <- mice(data = dat, pred = pred, method = "2l.pan", m = 100)

The list of imputed data sets is generated as above.

# create a list of completed data sets
implist <- mids2mitml.list(imp)

The ANOVA model is then fit using lmer(). Notice that this model contains an additional term, (1|id), which specifies a random effect for each subject. This effect captures unsystematic differences between subjects, thus accounting for the nested structure of the repeated-measures data.

# fit the ANOVA model
fit3 <- with(implist, lmer(y ~ 1 + cond + (1|id)))
testEstimates(fit3, var.comp = TRUE)
# Call:
# testEstimates(model = fit3, var.comp = TRUE)
# Final parameter estimates and inferences obtained from 100 imputed data sets.
#               Estimate  Std.Error    t.value         df    P(>|t|)        RIV        FMI 
# (Intercept)     -0.065      0.227     -0.286 108243.160      0.775      0.031      0.030 
# condB            0.352      0.112      3.151   3197.566      0.002      0.214      0.176 
# condC            0.048      0.114      0.418   2120.953      0.676      0.276      0.217 
#                         Estimate 
# Intercept~~Intercept|id    0.893 
# Residual~~Residual         0.514 
# ICC|id                     0.635 
# Unadjusted hypothesis test as appropriate in larger samples.

The output is similar to before, with the regression coefficients denoting the differences between the conditions. In addition, the output includes the variance of the random effect that denotes the unsystematic differences between subjects.

Testing the null hypothesis of the ANOVA again requires the specification of a reduced model that does not contain the parameters to be tested (i.e., those pertaining to cond).

# pool the parameter estimates
fit3.reduced <- with(implist, lmer(y ~ 1 + (1|id)))
testModels(fit3, fit3.reduced, method = "D1")
# Call:
# testModels(model = fit3, null.model = fit3.reduced, method = "D1")
# Model comparison calculated from 100 imputed data sets.
# Combination method: D1 
#     F.value      df1      df2    P(>F)      RIV 
#       5.681        2 4847.199    0.003    0.248 
# Unadjusted hypothesis test as appropriate in larger samples.

As in the first example, the three conditions are significantly different from one another.

These two examples were obviously very simple. However, the same general procedure can be used for more complex ANOVA models, including models with two or more factors, interaction effects, or for mixed designs with both between- and within-subject factors.

Procedures other than MI

Imputation is not the only method that can deal with missing data, and other methods like maximum-likelihood estimation (ML) have also been recommended (Schafer & Graham, 2002). Using ML, cases contribute to the estimation of the model only to the extent to which they have data, and its results are often equally trustworthy as those under MI.

However, in the ANOVA, this should be taken with a grain of salt. For missing data in the outcome variable y, using ML simply means that the model is estimated using only the cases with observed y (i.e., listwise deletion), which can lead to distorted parameter estimates if other variables are related to the chance of observing y (see Example 1). In order to account for this, ML requires including these extra variables in the analysis model, which changes the meaning of the parameters (i.e., the ANOVA becomes ANCOVA, though the estimates for it would be unbiased!).

One key advantage of MI is that the treatment of missing data is independent of the analysis. Variables relevant for the treatment of missing data can be included in the imputation model without altering the analysis model.

Further reading

To read more about ANOVA models and the treatment of missing data therein, you can check the following resources:


  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
  • Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Pooling ANOVA results from multiply imputed datasets: A simulation study. Methodology, 12, 75–88. doi:10.1027/1614-2241/a000111
  • Liu, Y., & Enders, C. K. (2017). Evaluation of multi-parameter test statistics for multiple imputation. Multivariate Behavioral Research, 52, 371–390. doi:10.1080/00273171.2017.1298432
  • Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed.). Mahwah, NJ: Erlbaum.
  • Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177. doi:10.1037//1082-989X.7.2.147
  • van Ginkel, J. R., & Kroonenberg, P. M. (2014). Analysis of variance of multiply imputed data. Multivariate Behavioral Research, 49, 78–91. doi:10.1080/00273171.2013.855890

  1. The hypothesis test in ANOVA is a Wald test that simultaneously tests all the differences between groups against zero. In this example, these differences are represented by the regression coefficients for groupB and groupC.

    This can easily be verified by calculating the Wald test by hand:

    # estimates and covariance matrix
    b <- coef(fit1)[-1]
    V <- vcov(fit1)[-1,-1]
    # Wald-test
    F <- b %*% solve(V) %*% b / 2      # F statistic
    pf(F, 2, 116, lower.tail = FALSE)  # p value
    #      [,1]
    # [1,] 0.26

    The resulting \(F\) and \(p\) value are exactly the same as in the output above.

  2. Technically, a reduced model is not necessary (only convenient). The Wald test can be formulated equivalently with a linear constraint on the parameters of the full model (i.e., setting them to zero).

    Under MI, this can be done, too, with the testConstraints() function:

    # define and test parameter constraints
    con <- c("groupB", "groupC")
    testConstraints(fit2, constraints = con, method = "D1")
    # Call:
    # testConstraints(model = fit2, constraints = con, method = "D1")
    # Hypothesis test calculated from 100 imputed data sets. The following
    # constraints were specified:
    #              Estimate Std. Error 
    #    groupB:      0.207      0.202 
    #    groupC:     -0.333      0.202 
    # Combination method: D1 
    #     F.value      df1      df2    P(>F)      RIV 
    #       3.635        2 7186.601    0.026    0.195 
    # Unadjusted hypothesis test as appropriate in larger samples.

    The results of this are identical to those of testModels().

To leave a comment for the author, please follow the link and comment on their blog: Simon Grund [R]. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)