Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

How to get parameter-specific p-values is one of the most commonly asked questions about multilevel regression. The key issue is that the degrees of freedom are not trivial to compute for multilevel regression. Various detailed discussions can be found on the R-wiki and R-help mailing list post by Doug Bates. I have experimented with three methods that I think are reasonable.

1. Use the normal approximation. Since the t distribution converges to the z distribution as degrees of freedom increase, this is like assuming infinite degrees of freedom. This is unambiguously anti-conservative, but for reasonable sample sizes, it appears not to be very anti-conservative (Barr et al., 2013). That is, if we take the p-value to measure the probability of a false positive, this approximation produces a somewhat (but perhaps not alarmingly) higher false positive rate than the nominal 5% at = 0.05.

Here is an example using proportions of semantic errors in picture naming by different aphasia subtypes (the data file can be found here):

load("Examples.RData")
require(lme4)
# fit the model
m.sem <- lmer(Semantic.error ~ TestTime * Diagnosis + (TestTime | SubjectID),
data = NamingRecovery, REML = FALSE)
# extract coefficients
coefs <- data.frame(coef(summary(m.sem)))
# use normal distribution to approximate p-value
coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
coefs

##                               Estimate Std..Error t.value       p.z
## (Intercept)                   0.045767   0.007776  5.8855 3.970e-09
## TestTime                     -0.008685   0.003524 -2.4644 1.372e-02
## DiagnosisConduction          -0.015149   0.010039 -1.5090 1.313e-01
## DiagnosisWernicke            -0.004899   0.010287 -0.4762 6.339e-01
## TestTime:DiagnosisConduction  0.007308   0.004550  1.6063 1.082e-01
## TestTime:DiagnosisWernicke    0.012854   0.004662  2.7571 5.832e-03


2. Use the Satterthwaite approximation, which is implemented in the lmerTest package. According to the documentation, this is based on SAS proc mixed theory. The lmerTest package overloads the lmer function, so you can just re-fit the model using exactly the same code, but the summary() will now include approximate degrees of freedom and p-values. This implementation is extremely easy to use, but can be a little maddening if you forget whether your model is a an object of type lmerMod or merModLmerTest.

require(lmerTest)

# re-fit model
m.semTest <- lmer(Semantic.error ~ TestTime * Diagnosis + (TestTime | SubjectID),
data = NamingRecovery, REML = FALSE)
# get Satterthwaite-approximated degrees of freedom
coefs$df.Satt <- coef(summary(m.semTest))[, 3] # get approximate p-values coefs$p.Satt <- coef(summary(m.semTest))[, 5]
coefs

##                               Estimate Std..Error t.value       p.z
## (Intercept)                   0.045767   0.007776  5.8855 3.970e-09
## TestTime                     -0.008685   0.003524 -2.4644 1.372e-02
## DiagnosisConduction          -0.015149   0.010039 -1.5090 1.313e-01
## DiagnosisWernicke            -0.004899   0.010287 -0.4762 6.339e-01
## TestTime:DiagnosisConduction  0.007308   0.004550  1.6063 1.082e-01
## TestTime:DiagnosisWernicke    0.012854   0.004662  2.7571 5.832e-03
##                              df.Satt    p.Satt
## (Intercept)                    23.00 5.344e-06
## TestTime                       22.99 2.163e-02
## DiagnosisConduction            23.00 1.449e-01
## DiagnosisWernicke              23.00 6.384e-01
## TestTime:DiagnosisConduction   22.99 1.219e-01
## TestTime:DiagnosisWernicke     22.99 1.122e-02


3. Use the Kenward-Roger approximation to get approximate degrees of freedom and the t-distribution to get p-values, which is implemented in the pbkrtest package.

require(pbkrtest)

# get the KR-approximated degrees of freedom
df.KR <- get_ddf_Lb(m.sem, fixef(m.sem))

# get p-values from the t-distribution using the t-values and approximated
# degrees of freedom
coefs$p.KR <- 2 * (1 - pt(abs(coefs$t.value), df.KR))
coefs

##                               Estimate Std..Error t.value       p.z
## (Intercept)                   0.045767   0.007776  5.8855 3.970e-09
## TestTime                     -0.008685   0.003524 -2.4644 1.372e-02
## DiagnosisConduction          -0.015149   0.010039 -1.5090 1.313e-01
## DiagnosisWernicke            -0.004899   0.010287 -0.4762 6.339e-01
## TestTime:DiagnosisConduction  0.007308   0.004550  1.6063 1.082e-01
## TestTime:DiagnosisWernicke    0.012854   0.004662  2.7571 5.832e-03
##                              df.Satt    p.Satt      p.KR
## (Intercept)                    23.00 5.344e-06 9.049e-06
## TestTime                       22.99 2.163e-02 2.283e-02
## DiagnosisConduction            23.00 1.449e-01 1.468e-01
## DiagnosisWernicke              23.00 6.384e-01 6.390e-01
## TestTime:DiagnosisConduction   22.99 1.219e-01 1.238e-01
## TestTime:DiagnosisWernicke     22.99 1.122e-02 1.210e-02


Not surprisingly, the Satterthwaite and Kenward-Roger approximations produce slightly more conservative p-values than the normal approximation does. However, even in this not-very-large data set (24 participants, 6-9 in each of 3 groups, 5 observations for each participant) the normal approximation is only slightly less conservative than these two options. The approximate degrees of freedom are slightly higher when using the Satterthwaite approximation (23) than the Kenward-Roger approximation (20.1), which makes the latter a bit more conservative. Of course, these are not systematic tests, but it is fairly representative of a few models that I've compared.