The analysis of variance, or ANOVA, is among the most popular methods for analyzing how an outcome variable differs between groups, for example, in observational studies or in experiments with different conditions.
But how do we conduct the ANOVA when there are missing data?
In this post, I show how to deal with missing data in between and withinsubject designs using multiple imputation (MI) in R.
The ANOVA model
In the onefactorial ANOVA, the goal is to investigate whether two or more groups differ with respect to some outcome variable \(y\).
The statistical model can be written as
\[
\begin{equation} \label{model}
y_{ij} = \mu_j + e_{ij} \; ,
\end{equation}
\]
where \(y_{ij}\) denotes the value of \(y\) for person \(i\) in group \(j\), and \(\mu_j\) is the mean in group \(j\).
The (omnibus) null hypothesis of the ANOVA states that all groups have identical population means.
For three groups, this would mean that
\[
\begin{equation}
\mu_1 = \mu_2 = \mu_3 \; .
\end{equation}
\]
This hypothesis is tested by looking at whether the differences between groups are larger than what could be expected from the differences within groups.
If this is the case, then we reject the null, and the group means are said to be “significantly” different from one another.
In the following, we will look at how this hypothesis can be tested when the outcome variable contains missing data.
Let’s illustrate this with an example.
Example 1: betweensubjects ANOVA
For this example, I simulated some data according to a betweensubject design with three groups, \(n\) = 50 subjects per group, and a “medium” effect size of \(f\) = .25, which roughly corresponds to an \(R^2=6.8\%\) (Cohen, 1988).
You can download the data from this post if you want to reproduce the results (CSV, Rdata). Here are the first few rows.
group  y  x  

1  A  0.715  0.062 
2  B  0.120  0.633 
3  C  1.341  1.176 
4  A  NA  0.792 
5  B  0.468  0.243 
The three variables mean the following:
group
: the grouping variabley
: the outcome variable (with 20.7% missing data)x
: an additional covariate
In this example, cases with lower values in x
had a higher chance of missing data in y
.
Because x
is also positively correlated with y
, this means that smaller y
values are missing more often than larger ones.
Listwise deletion
Lets see what happens if we run the ANOVA only with those cases that have y
observed (i.e., listwise deletion). This is the standard setting on most statistical software.
In R, the ANOVA can be conducted with the lm()
function as follows.
# fit the ANOVA model
fit1 < lm(y ~ 1 + group, data = dat)
summary(fit1)
#
# Call:
# lm(formula = y ~ 1 + group, data = dat)
#
# Residuals:
# Min 1Q Median 3Q Max
# 1.8196 0.6237 0.0064 0.5657 2.1808
#
# Coefficients:
# Estimate Std. Error t value Pr(>t)
# (Intercept) 0.0944 0.1452 0.65 0.52
# groupB 0.1720 0.1972 0.87 0.38
# groupC 0.1570 0.2082 0.75 0.45
#
# Residual standard error: 0.895 on 116 degrees of freedom
# (31 observations deleted due to missingness)
# Multiple Rsquared: 0.0229, Adjusted Rsquared: 0.00609
# Fstatistic: 1.36 on 2 and 116 DF, pvalue: 0.26
In this example, the \(F\)test at the bottom of the output indicates that the group means are not significantly different from one another, \(F(\!\) 2, 116 \(\!)\) = 1.361 (\(p\) = 0.26).^{1}
In addition, the effect size (\(R^2\) = 0.023) is quite a bit smaller than what was used to generate the data.
In fact, this result is a direct consequence of how the missing data were simulated.
Fortunately, there are statistical methods that can account for the missing data and help us obtain more trustworthy results.
Multiple imputation
One of the most effective ways of dealing with missing data is multiple imputation (MI).
Using MI, we can create multiple plausible replacements of the missing data, given what we have observed and a statistical model (the imputation model).
In the ANOVA, using MI has the additional benefit that it allows taking covariates into account that are relevant for the missing data but not for the analysis.
In this example, x
is a direct cause of missing data in y
. Therefore, we must take x
into account when making inferences about y
in the ANOVA.
Running MI consists of three steps. First, the missing data are imputed multiple times. Second, the imputed data sets are analyzed separately. Third, the parameter estimates and hypothesis tests are pooled to form a final set of estimates and inferences.
For this example, we will use the mice
and mitml
packages to conduct MI.
library(mice)
library(mitml)
Specifying an imputation model is very simple here. With the following command, we generate 100 imputations for y
on the basis of a regression model with both group
and x
as predictors and a normal error term.
# run MI
imp < mice(data = dat, method = "norm", m = 100)
The imputed data sets can then be saved as a list, containing 100 copies of the original data, in which the missing data have been replaced by different imputations.
# create a list of completed data sets
implist < mids2mitml.list(imp)
Finally, we fit the ANOVA model to each of the imputed data sets and pool the results.
The analysis part is done with the with()
command, which applies the same linear model, lm()
, to each data set.
The pooling es then done with the testEstimates()
function.
# fit the ANOVA model
fit2 < with(implist, lm(y ~ 1 + group))
# pool the parameter estimates
testEstimates(fit2)
#
# Call:
#
# testEstimates(model = fit2)
#
# Final parameter estimates and inferences obtained from 100 imputed data sets.
#
# Estimate Std.Error t.value df P(>t) RIV FMI
# (Intercept) 0.027 0.144 0.190 3178.456 0.850 0.214 0.177
# groupB 0.207 0.198 1.044 5853.312 0.297 0.149 0.130
# groupC 0.333 0.208 1.600 2213.214 0.110 0.268 0.212
#
# Unadjusted hypothesis test as appropriate in larger samples.
Notice that we did not need to actually include x
in the ANOVA.
Rather, it was enough to include x
in the imputation model, after which the analyses proceeded as usual.
We now have estimated the regression coefficients in the ANOVA model (i.e., the differences between group means), but we have yet to decide whether the means are all equal or not.
To this end, we use a pooled version of the \(F\)test above, which consists of a comparison of the full model (the ANOVA model) with a reduced model that does not contain the coefficients we wish to test.^{2}
In this case, we wish to test the coefficients pertaining to the differences between groups, so the reduced model does not contain group
as a predictor.
# fit the reduced ANOVA model (without 'group')
fit2.reduced < with(implist, lm(y ~ 1))
The full and the reduced model can then be compared with the pooled version of the \(F\)test (i.e., the Wald test), which is known in the literature as \(D_1\).
# compare the two models with pooled Wald test
testModels(fit2, fit2.reduced, method = "D1")
#
# Call:
#
# testModels(model = fit2, null.model = fit2.reduced, method = "D1")
#
# Model comparison calculated from 100 imputed data sets.
# Combination method: D1
#
# F.value df1 df2 P(>F) RIV
# 3.635 2 7186.601 0.026 0.195
#
# Unadjusted hypothesis test as appropriate in larger samples.
In contrast with listwise deletion, the \(F\)test under MI indicates that the groups are significantly different from one another.
This is because MI makes use of all the observed data, including the covariate x
, and used this information to generated replacements for missing y
that took its relation with x
into account.
To see this, it is worth looking at a comparison of the observed and the imputed data.
The difference is not extreme, but it is easy to see that the imputed data tend to have more mass at the lower end of the distribution of y
(especially in groups A and C).
This is again a result of how the data were simulated: Lower y
values, through their relation with x
, are missing more often, which is accounted for using MI.
Conversely, using listwise deletion placed the group means more closely together than they should be, and this affected the results in the ANOVA.
Example 2: mixed/withinsubjects ANOVA
In a withinsubjects design, the design factor varies within (not between) persons, and we obtain multiple, repeated measurements for each condition (mixed designs include both).
Fortunately, the procedure for the treatment and missing data and the analysis remains mostly the same.
Although withinsubjects designs are analyzed most often with the repeatedmeasures ANOVA, mixedeffects models have become a popular alternative. Here, I will choose the latter because mixedeffects models make it straightforward to pool ANOVAlike hypotheses in withinsubjects designs.
To fit the mixedeffects model, we will use the lmer()
function from the package lme4
.
library(lme4)
I simulated a second data set similar to the one above, with \(n\) = 20 persons, a withinsubject factor with three conditions, and five repeated measurements for each condition (CSV, Rdata).
id  cond  y  x  

1  1  A  1.699  1.11 
2  1  B  1.022  1.11 
3  1  C  0.287  1.11 
4  1  A  1.588  1.11 
5  1  B  2.028  1.11 
The variables mean the following:
id
: the subject identifiercond
: the grouping variable (within subjects)y
: the repeated measurements for the outcome variable (with 18.7% missing data)x
: a subjectspecific covariate
Like above, persons with lower values in x
had a higher chance of missing data in y
. Notice that cond
varies within subjects, making the repeated measures for each condition “nested” within subjects.
Multiple imputation
To properly accommodate the “nested” structure of the repeated measurements, the imputation model can no longer be a simple regression.
Instead, it needs to accommodate this structure by also employing a mixedeffects model.
Specifying this model is easiest by first initializing the imputation model with the default values.
# run MI (for starting solution)
ini < mice(data = dat, maxit = 0)
Then we define the subject identifier for the imputation model (id
) and change the imputation method to a use a mixedeffects model ("2l.pan"
).
Running MI is then the same as before.
# define the 'subject' identifier (code as '2' in predictor matrix)
pred < ini$pred
pred["y", "id"] < 2
# run MI
imp < mice(data = dat, pred = pred, method = "2l.pan", m = 100)
summary(imp)
The list of imputed data sets is generated as above.
# create a list of completed data sets
implist < mids2mitml.list(imp)
The ANOVA model is then fit using lmer()
.
Notice that this model contains an additional term, (1id)
, which specifies a random effect for each subject.
This effect captures unsystematic differences between subjects, thus accounting for the nested structure of the repeatedmeasures data.
# fit the ANOVA model
fit3 < with(implist, lmer(y ~ 1 + cond + (1id)))
testEstimates(fit3, var.comp = TRUE)
#
# Call:
#
# testEstimates(model = fit3, var.comp = TRUE)
#
# Final parameter estimates and inferences obtained from 100 imputed data sets.
#
# Estimate Std.Error t.value df P(>t) RIV FMI
# (Intercept) 0.065 0.227 0.286 108243.160 0.775 0.031 0.030
# condB 0.352 0.112 3.151 3197.566 0.002 0.214 0.176
# condC 0.048 0.114 0.418 2120.953 0.676 0.276 0.217
#
# Estimate
# Intercept~~Interceptid 0.893
# Residual~~Residual 0.514
# ICCid 0.635
#
# Unadjusted hypothesis test as appropriate in larger samples.
The output is similar to before, with the regression coefficients denoting the differences between the conditions. In addition, the output includes the variance of the random effect that denotes the unsystematic differences between subjects.
Testing the null hypothesis of the ANOVA again requires the specification of a reduced model that does not contain the parameters to be tested (i.e., those pertaining to cond
).
# pool the parameter estimates
fit3.reduced < with(implist, lmer(y ~ 1 + (1id)))
testModels(fit3, fit3.reduced, method = "D1")
#
# Call:
#
# testModels(model = fit3, null.model = fit3.reduced, method = "D1")
#
# Model comparison calculated from 100 imputed data sets.
# Combination method: D1
#
# F.value df1 df2 P(>F) RIV
# 5.681 2 4847.199 0.003 0.248
#
# Unadjusted hypothesis test as appropriate in larger samples.
As in the first example, the three conditions are significantly different from one another.
These two examples were obviously very simple.
However, the same general procedure can be used for more complex ANOVA models, including models with two or more factors, interaction effects, or for mixed designs with both between and withinsubject factors.
Procedures other than MI
Imputation is not the only method that can deal with missing data, and other methods like maximumlikelihood estimation (ML) have also been recommended (Schafer & Graham, 2002).
Using ML, cases contribute to the estimation of the model only to the extent to which they have data, and its results are often equally trustworthy as those under MI.
However, in the ANOVA, this should be taken with a grain of salt.
For missing data in the outcome variable y
, using ML simply means that the model is estimated using only the cases with observed y
(i.e., listwise deletion), which can lead to distorted parameter estimates if other variables are related to the chance of observing y
(see Example 1).
In order to account for this, ML requires including these extra variables in the analysis model, which changes the meaning of the parameters (i.e., the ANOVA becomes ANCOVA, though the estimates for it would be unbiased!).
One key advantage of MI is that the treatment of missing data is independent of the analysis.
Variables relevant for the treatment of missing data can be included in the imputation model without altering the analysis model.
Further reading
To read more about ANOVA models and the treatment of missing data therein, you can check the following resources:
 Maxwell, Delaney, and Kelley (2018) give a great introduction into the design and analysis of experimental data with the ANOVA and mixedeffects models
 van Ginkel and Kroonenberg (2014) provide a detailed discussion of missing data and MI in the ANOVA with examples, syntax files, and a macro for SPSS
 Grund, Lüdtke, and Robitzsch (2016) provide a comparison of different methods for testing hypotheses in the ANOVA under MI
 Liu and Enders (2017) provide a similar comparison in the context of regression analyses
References
 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
 Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Pooling ANOVA results from multiply imputed datasets: A simulation study. Methodology, 12, 75–88. doi:10.1027/16142241/a000111
 Liu, Y., & Enders, C. K. (2017). Evaluation of multiparameter test statistics for multiple imputation. Multivariate Behavioral Research, 52, 371–390. doi:10.1080/00273171.2017.1298432
 Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed.). Mahwah, NJ: Erlbaum.
 Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177. doi:10.1037//1082989X.7.2.147
 van Ginkel, J. R., & Kroonenberg, P. M. (2014). Analysis of variance of multiply imputed data. Multivariate Behavioral Research, 49, 78–91. doi:10.1080/00273171.2013.855890

The hypothesis test in ANOVA is a Wald test that simultaneously tests all the differences between groups against zero. In this example, these differences are represented by the regression coefficients for
groupB
andgroupC
.This can easily be verified by calculating the Wald test by hand:
# estimates and covariance matrix b < coef(fit1)[1] V < vcov(fit1)[1,1] # Waldtest F < b %*% solve(V) %*% b / 2 # F statistic pf(F, 2, 116, lower.tail = FALSE) # p value
# [,1] # [1,] 0.26
The resulting \(F\) and \(p\) value are exactly the same as in the output above.↩

Technically, a reduced model is not necessary (only convenient). The Wald test can be formulated equivalently with a linear constraint on the parameters of the full model (i.e., setting them to zero).
Under MI, this can be done, too, with the
testConstraints()
function:# define and test parameter constraints con < c("groupB", "groupC") testConstraints(fit2, constraints = con, method = "D1")
# # Call: # # testConstraints(model = fit2, constraints = con, method = "D1") # # Hypothesis test calculated from 100 imputed data sets. The following # constraints were specified: # # Estimate Std. Error # groupB: 0.207 0.202 # groupC: 0.333 0.202 # # Combination method: D1 # # F.value df1 df2 P(>F) RIV # 3.635 2 7186.601 0.026 0.195 # # Unadjusted hypothesis test as appropriate in larger samples.
The results of this are identical to those of
testModels()
.↩
Rbloggers.com offers daily email updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...