# Linear Mixed Effects Models in Agriculture

**R tutorial for Spatial Statistics**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post was originally part of my previous post about linear models. However, I later decided to split it into several texts because it was effectively too long and complex to navigate.

If you struggle to follow the code in this page please refer to this post (for example for the necessary packages): Linear Models (lm, ANOVA and ANCOVA) in Agriculture

## Linear Mixed-Effects Models

*u*and the standard error term. A more complex form, that is normally used for repeated measures is the random slope and intercept model:

*v*related to time

*T*.

### Random Intercept Model for Clustered Data

Just to explain the syntax to use linear mixed-effects model in R for cluster data, we will assume that the factorial variable rep in our dataset describe some clusters in the data. To fit a mixed-effects model we are going to use the function lme from the package nlme. This function can work with unbalanced designs:

` lme1 = lme(yield ~ nf + bv * topo, random= ~1|rep, data=dat) `

` > summary(lme1) `

Linear mixed-effects model fit by REML

Data: dat

AIC BIC logLik

27648.36 27740.46 -13809.18

Random effects:

Formula: ~1 | rep

(Intercept) Residual

StdDev: 0.798407 13.3573

Fixed effects: yield ~ nf + bv * topo

Value Std.Error DF t-value p-value

(Intercept) 327.3304 14.782524 3428 22.143068 0

nfN1 3.9643 0.788049 3428 5.030561 0

nfN2 5.2340 0.790104 3428 6.624449 0

nfN3 5.4498 0.789084 3428 6.906496 0

nfN4 7.5286 0.789551 3428 9.535320 0

nfN5 7.7254 0.789111 3428 9.789976 0

bv -1.4685 0.085507 3428 -17.173569 0

topoHT -233.3675 17.143956 3428 -13.612232 0

topoLO -251.9750 20.967003 3428 -12.017693 0

topoW -146.4066 16.968453 3428 -8.628162 0

bv:topoHT 1.1945 0.097696 3428 12.226279 0

bv:topoLO 1.4961 0.123424 3428 12.121624 0

bv:topoW 0.7873 0.097865 3428 8.044485 0

` > Anova(lme2, type=c("III")) `

Analysis of Deviance Table (Type III tests)

Response: yield

Chisq Df Pr(>Chisq)

(Intercept) 752.25 1 < 2.2e-16 ***

nf 155.57 5 < 2.2e-16 ***

bv 291.49 1 < 2.2e-16 ***

topo 236.52 3 < 2.2e-16 ***

year 797.13 1 < 2.2e-16 ***

bv:topo 210.38 3 < 2.2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

` > anova(lme1, mod6) `

Model df AIC BIC logLik Test L.Ratio p-value

lme1 1 15 27648.36 27740.46 -13809.18

mod6 2 14 27651.21 27737.18 -13811.61 1 vs 2 4.857329 0.0275

Another way to assess the accuracy of GLS and Mixed Effects Models is through the use of pseudo R-Squared, which are indexes that can be interpreted as the normal R-Squared but calculated in different ways, since in these more complex models we do not calculate the sum of squares.

There are two important functions for this, both included in the package MuMIn:

` library(MuMIn) `

> r.squaredLR(mod6)

[1] 0.5469906

attr(,"adj.r.squared")

[1] 0.5470721

> r.squaredGLMM(lme1)

R2m R2c

0.5459845 0.5476009

In this case we can see again that the R2 are similar between models, and most importantly R2c is only slightly different compared to R2m, which means that including random effects does not improve the accuracy.

### Random Intercept and Slope for repeated measures

` lme2 = lme(yield ~ nf + bv * topo + year, random= ~year|rep, data=dat) `

summary(lme2)

Anova(lme2, type=c("III"))

` > anova(lme1, lme2) `

Model df AIC BIC logLik Test L.Ratio p-value

lme1 1 15 27648.36 27740.46 -13809.18

lme2 2 18 26938.83 27049.35 -13451.42 1 vs 2 715.5247 <.0001

Warning message:

In anova.lme(lme1, lme2) :

fitted objects with different fixed effects. REML comparisons are not meaningful.

` > ranef(lme2) `

(Intercept) year

R1 -0.3468601 -1.189799e-07

R2 -0.5681688 -1.973702e-07

R3 0.9150289 3.163501e-07

` > fixef(lme2) `

(Intercept) nfN1 nfN2 nfN3 nfN4

-1.133614e+04 3.918006e+00 5.132136e+00 5.368513e+00 7.464542e+00

nfN5 bv topoHT topoLO topoW

7.639337e+00 -1.318391e+00 -2.049979e+02 -2.321431e+02 -1.136168e+02

year bv:topoHT bv:topoLO bv:topoW

5.818826e+00 1.027686e+00 1.383705e+00 5.998379e-01

` > intervals(lme2, which = "fixed") `

Approximate 95% confidence intervals

Fixed effects:

lower est. upper

(Intercept) -1.214651e+04 -1.133614e+04 -1.052576e+04

nfN1 2.526139e+00 3.918006e+00 5.309873e+00

nfN2 3.736625e+00 5.132136e+00 6.527648e+00

nfN3 3.974809e+00 5.368513e+00 6.762216e+00

nfN4 6.070018e+00 7.464542e+00 8.859065e+00

nfN5 6.245584e+00 7.639337e+00 9.033089e+00

bv -1.469793e+00 -1.318391e+00 -1.166989e+00

topoHT -2.353450e+02 -2.049979e+02 -1.746508e+02

topoLO -2.692026e+02 -2.321431e+02 -1.950836e+02

topoW -1.436741e+02 -1.136168e+02 -8.355954e+01

year 5.414742e+00 5.818826e+00 6.222911e+00

bv:topoHT 8.547273e-01 1.027686e+00 1.200644e+00

bv:topoLO 1.165563e+00 1.383705e+00 1.601846e+00

bv:topoW 4.264933e-01 5.998379e-01 7.731826e-01

attr(,"label")

[1] "Fixed effects:"

## Syntax with lme4

` > lmer1 = lmer(yield ~ nf + bv * topo + (1|rep), data=dat) `

>

> summary(lmer1)

Linear mixed model fit by REML ['lmerMod']

Formula: yield ~ nf + bv * topo + (1 | rep)

Data: dat

REML criterion at convergence: 27618.4

Scaled residuals:

Min 1Q Median 3Q Max

-3.4267 -0.7767 -0.1109 0.7196 3.6892

Random effects:

Groups Name Variance Std.Dev.

rep (Intercept) 0.6375 0.7984

Residual 178.4174 13.3573

Number of obs: 3443, groups: rep, 3

Fixed effects:

Estimate Std. Error t value

(Intercept) 327.33043 14.78252 22.143

nfN1 3.96433 0.78805 5.031

nfN2 5.23400 0.79010 6.624

nfN3 5.44980 0.78908 6.906

nfN4 7.52862 0.78955 9.535

nfN5 7.72537 0.78911 9.790

bv -1.46846 0.08551 -17.174

topoHT -233.36750 17.14396 -13.612

topoLO -251.97500 20.96700 -12.018

topoW -146.40655 16.96845 -8.628

bv:topoHT 1.19446 0.09770 12.226

bv:topoLO 1.49609 0.12342 12.122

bv:topoW 0.78727 0.09786 8.044

Correlation matrix not shown by default, as p = 13 > 12.

Use print(x, correlation=TRUE) or

vcov(x) if you need it

There are several differences between nlme and lme4 and I am not sure which is actually better. What I found is that probably lme4 is the most popular, but nlme is used for example to fit generalized addictive mixed effects models in the package mgcv. For this reason probably the best thing would be to know how to use both packages.

As you can see from the summary above, in this table there are no p-values, so it is a bit difficult to know which levels are significant for the model. We can solve this by installing and/or loading the package lmerTest. If we load lmerTest and run again the same function we would obtain the following summary table:

` > lmer1 = lmer(yield ~ nf + bv * topo + (1|rep), data=dat) `

>

> summary(lmer1)

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of freedom [

lmerMod]

Formula: yield ~ nf + bv * topo + (1 | rep)

Data: dat

REML criterion at convergence: 27618.4

Scaled residuals:

Min 1Q Median 3Q Max

-3.4267 -0.7767 -0.1109 0.7196 3.6892

Random effects:

Groups Name Variance Std.Dev.

rep (Intercept) 0.6375 0.7984

Residual 178.4174 13.3573

Number of obs: 3443, groups: rep, 3

Fixed effects:

Estimate Std. Error df t value Pr(>|t|)

(Intercept) 327.33043 14.78252 3411.00000 22.143 < 2e-16 ***

nfN1 3.96433 0.78805 3428.00000 5.031 5.14e-07 ***

nfN2 5.23400 0.79010 3428.00000 6.624 4.03e-11 ***

nfN3 5.44980 0.78908 3428.00000 6.906 5.90e-12 ***

nfN4 7.52862 0.78955 3428.00000 9.535 < 2e-16 ***

nfN5 7.72537 0.78911 3428.00000 9.790 < 2e-16 ***

bv -1.46846 0.08551 3428.00000 -17.174 < 2e-16 ***

topoHT -233.36750 17.14396 3429.00000 -13.612 < 2e-16 ***

topoLO -251.97500 20.96700 3430.00000 -12.018 < 2e-16 ***

topoW -146.40655 16.96845 3430.00000 -8.628 < 2e-16 ***

bv:topoHT 1.19446 0.09770 3429.00000 12.226 < 2e-16 ***

bv:topoLO 1.49609 0.12342 3430.00000 12.122 < 2e-16 ***

bv:topoW 0.78727 0.09786 3430.00000 8.044 1.33e-15 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation matrix not shown by default, as p = 13 > 12.

Use print(x, correlation=TRUE) or

vcov(x) if you need it

As you can see now the p-values are showing and we can assess the significance for each term.

Clearly, all the functions we used above for the function lme can be used also with the package lme4 and lmerTest. For example, we can produce the anova table with the function anova or compute the R Squared with the function r.squaredGLMM.

When we are dealing with random slope and intercept we would use the following syntax:

` lmer2 = lmer(yield ~ nf + bv * topo + year + (year|rep), data=dat) `

**leave a comment**for the author, please follow the link and comment on their blog:

**R tutorial for Spatial Statistics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.