**Anything but R-bitrary**, and kindly contributed to R-bloggers)

**What’s the gain over lm()?***By Ben Ogorek*

Random effects models have always intrigued me. They offer the flexibility of many parameters under a single unified, cohesive and parsimonious system. But with the growing size of data sets and increased ability to estimate many parameters with a high level of accuracy, will the subtleties of the random effects analysis be lost?

In this article, we will look an example that could be analyzed with either a traditional regression approach, using lm(), or a more sophisticated approach using random effects via the lme4 package by Douglas Bates, Martin Maechler and Ben Bolker (2011). And then I’ll pose the question, “*is it worth it?”*

## Random effects for the uninitiated

For the uninitiated in random effects models, suppose we have the linear model

y_{j} = βx_{j}+ ε_{j}

for j = 1,…,J, where ε_{j} is iid gaussian noise. But also suppose that this pattern repeats itself for some set of units i = 1,…,n. If the units are different, then maybe their slope and intercepts are different, and we’d have to rewrite the equation as

y_{ij} = α_{i} + β_{i}x_{ij} + ε_{ij}

where again the ε_{ij}s are iid gaussian variables.

But if every unit has its own α and β, and each unit is part of some stable population, shouldn’t there be a common thread underlying the parameters? And if you know something about α_{1} and α_{2}, maybe you should also know something about α_{3} – not because of dependence – but because α_{1} and α_{2} give you an idea of what is typical of αs among units. Expand this and you’ll get an idea of what is atypical as well.

Placing a joint probability distribution over these parameters is a more formal way to accomplish this. If you have a probability distribution (for better or for worse), then you’re “random.” *Welcome to the world of random coefficients.*

## Setting up

Let’s experiment. First, start with a clean R session. Below we set up the shell of our future data set. The code below produces 30 primary units and 15 measurements within each primary unit – a nice rectangular structure.

Next, we’ll generate data from our above model, where β_{i} ~ N(**3**, **.2**^{2}), α_{i} = 1 for all i, and ε_{ij} ~ N(0, **.75**^{2}).

The last line of code allows us to view the first 18 records.

`> head(test.df, 18)`

unit J x beta y

1 1 1 -0.11300084 3.306203 0.26360591

2 1 2 1.28001884 3.306203 4.37991072

3 1 3 -0.99839530 3.306203 -2.41951563

4 1 4 0.92637466 3.306203 3.66430487

5 1 5 0.81537679 3.306203 3.10488869

6 1 6 1.60623340 3.306203 6.89280816

7 1 7 -0.12976856 3.306203 1.20187488

8 1 8 0.18872381 3.306203 2.23705005

9 1 9 0.22488639 3.306203 1.80095766

10 1 10 -0.23402213 3.306203 0.68786438

11 1 11 0.68923084 3.306203 3.10347646

12 1 12 0.05152819 3.306203 2.14595637

13 1 13 0.86556045 3.306203 4.74330873

14 1 14 -0.46953578 3.306203 -0.83396291

15 1 15 -0.45621596 3.306203 -0.33406847

16 2 1 -1.17441143 2.936359 -3.48678813

17 2 2 -0.40591147 2.936359 -0.03560251

18 2 3 0.69477215 2.936359 2.59961931

Notice how β_{i} changes only as the unit id changes, whereas y and x vary at the row level.

## Separate regressions

Suppose we run seperate regressions for each unit using a loop of lm() calls:

The cost of having to estimate the β_{i}s can be seen by the histograms below, whereas the estimated β_{i}s exhibit noticeably more variability.

par(mfrow = c(2, 1))

hist(beta, main = "Histogram of Actual Slopes", col = "blue",

xlab = expression(beta[i]), cex.axis = 1.5, cex.lab = 1.5,

breaks = seq(from = 2.4, to = 3.6, by = .1) )

hist(as.numeric(beta.hat), main = "Histogram of Estimated Slopes",

xlab = expression(hat(beta)[i]), col = "blue", cex.axis = 1.5,

cex.lab = 1.5, breaks = seq(from = 2.4, to = 3.6, by = .1) )

## Random effects modeling using lme4

if you haven’t already, install the lme4 package using the command

install.packages(“lme4″).

The flagship function of the lme4 package is the lmer() function, a likelihood based system for estimating random effects models. Its formula notation works like lm()’s for fixed effects, but if you try to run a basic lm() model in it, you’ll get an error message – lmer() needs random effects! When you have fixed effects, you do enter them as in lm(). For random effects, the form is

(formula for **random terms** | **unit** for which these terms apply).

Starting on the left side of the bar, the formula for a random intercept, by itself, is simply “1”. The formula for a random regression coeficient for a variable x, *without* the corresponding random intercept, is “0 + x”. Random intercepts are included by default, so “x” and “1 + x” are equivalent specifications of both a random slope and a random intercept.

Random effects must vary at a courser grain than at the finest level, or else they’d be confounded (with ε_{ij} in our case). The second part of the random formula specification requires a variable that identifies this courser grain, which in our case is the unit identifier.

Below we apply lmer() to our generated data. For demonstration purposes, both a random intercept and slope are specified, even though we know the intercept is fixed. And as a general point about lmer(), we’ll need to include the mean of our random β_{i}s as fixed effect. This is accomplished by adding x as a fixed regressor alongside its specification as a random effect (think random deviations from a fixed mean).

Note that lme4 is an S4 class, so running names(re.lm) to explore the structure won’t be of any help. The summary function, however, gets us the overview we need.

> summary(re.lm)

Linear mixed model fit by REML

Formula: y ~ x + (1 + x | unit)

Data: test.df

AIC BIC logLik deviance REMLdev

1034 1059 -511.1 1013 1022

Random effects:

Groups Name Variance Std.Dev. Corr

unit (Intercept) 0.0040782 0.063861

x 0.01343480.115909-1.000

Residual 0.54275790.736721

Number of obs: 450, groups: unit, 30

Fixed effects:

Estimate Std. Error t value

(Intercept) 1.02645 0.03704 27.71

x3.055270.04059 75.28

Correlation of Fixed Effects:

(Intr)

x -0.099

The colors are to link the estimates to their true values based on β_{i} ~ N(**3**, **.2**^{2}) and

ε_{ij} ~ N(0, **.75**^{2}). Note that the ε error variance has an estimate much closer to its true value than the other two parameters. This shouldn’t come as a surprise; there are 15 times more opportunities (450 vs. 30) to observe variation in the random variable ε_{ij} than in β_{i}.

At 1.03, the true common intercept of 1.00 is recovered without much error, but what can lmer() say about whether the intercept is random? Since a random variable with zero variance is for all practical purposes a fixed constant, we want to test the hypothesis:

_{0}: σ

_{α}

^{2}= 0

versus the alternative

_{1}: σ

_{α}

^{2}> 0.

We can see that the estimated variance for the random intercept, at 0.004, is much less than that of the random slope, at 0.014. Due to the importance of the zero-variance hypothesis, I would have liked to see it included as part of the default output. But with a little extra work, we can search for evidence of positive variance. It starts by rerunning lmer() without the random intercept.

re.lm <- lmer(y ~ x + (0+x|unit), data = test.df)

summary(re.lm)

Linear mixed model fit by REML

Formula: y ~ x + (0 + x | unit)

Data: test.df

AIC BIC logLik deviance REMLdev

1031 1048 -511.7 1014 1023

Random effects:

Groups Name Variance Std.Dev.

unit x 0.013324 0.11543

Residual 0.547969 0.74025

Number of obs: 450, groups: unit, 30

Fixed effects:

Estimate Std. Error t value

(Intercept) 1.02755 0.03538 29.05

x 3.05594 0.04056 75.34

Correlation of Fixed Effects:

(Intr)

x 0.076

We could get a p-value by subtracting the log likelihoods, multiplying by two, and comparing to a χ^{2} with 1 df, but I’ll be lazy and notice that both the AIC and BIC are smaller for the second model, and that the log likelihood barely chaged at all (knowing the true answer helps too) . We conclude the intercept has zero variance and can be represented with a single fixed parameter.

Maybe it’s been a while since you’ve done a likelihood ratio test in a regression context, but *you’re not in least squares land anymore*. You’ll notice at the top of the output above that the model was fit by REML, a variation on maximum likelihood. A search for the maximum was conducted behind the scenes, and lme4’s “Computational Methods” vignette discusses the related complexities. A warning from Douglas Bates within the vignette “lmer for SAS PROC MIXED Users” (the SASmixed package, 2011): when using REML (the default), model comparisons based on AIC, BIC or the log likelihood should only be made when the competing models having identical fixed effect structures.

## The lme4 gain

Here “gain” refers the reduction of mean squared error from estimating the β_{i}s from lmer() rather than separate lm()s. The data was generated according to lmer()’s assumptions, so the question is, how much more information can we extract?

Before answering this question, we first need to decide what it means to “estimate” an individual β_{i}. Remember, these are random variables from the standpoint of lmer() and unknown fixed constants from the standpoint of lm().

The coef() function in lme4 returns the *posterior modes *of the β_{i}s. That is for a given i, knowledge of the overall distribution of β_{i}, which is N~ (3.06, .12^{2}) is updated via Bayes’ rule conditional on the values of y_{ij} and x_{ij} for j = 1,…,J. Below a few lines of output from the coef() function are shown.

coef(re.lm)

> coef(re.lm)

$unit

(Intercept) x

1 1.027547 3.093437

2 1.027547 3.016321

3 1.027547 3.120390

4 1.027547 3.030907

5 1.027547 2.947606

After fumbling with the structure above, I eventually managed to extract the posterior modes of β_{i} and store it in the vector re.beta.

re.beta <- coef(re.lm)$unit[,"x"]

Recall that the lm() estimates of β_{i} are stored in the vector beta.hat. We can plot them using the code below:

par(mfrow = c(1,1))

plot(re.beta ~ beta.hat, xlim = c(2.4, 3.6), ylim = c(2.4,3.6),

pch = 20, col = "blue", cex = 1.6, cex.axis = 1.2,

cex.lab = 1.5, xlab = expression(hat(beta)[i]),

ylab = "",

main = "Posterior modes versus least squares estimates")

title(ylab = expression(paste("mode of ", hat(italic(p)),

"(", beta[i], "|", bold(y)[i], ",", bold(x)[i],")"),

col.main = "red" ) , cex.lab = 1.5, mgp = c(2.4,0,0))

abline(h = 3.0, v = 3.0, col = "red", lty = 2)

abline(h = 3.05594, col = "grey", lty = 3)

The above graph contains (x,y) pairs of the form

_{i, }lmer estimate of β

_{i}).

Notice how there is more variation in the x-axis than the y-axis; l*mer’s estimates exhibit less variability than lm*.

Also notice the red dotted lines at 3.0, the true mean of the β_{i} population, and how the points fall relative to them. The estimates from lm() fall more or less symmetrically about the vertical line x = 3.0, whereas the estimates from lmer() tend to fall *above* the horizontal line y = 3.0. The light grey dotted line corresponds to the estimated mean of the β_{i}s by lmer(), which at 3.06 is slightly higher than the true value. The lmer() estimates are much more symmetrically distributed about *this* line, illustrating an important point: *lmer()’s estimates are **shrunk towards the population mean estimate*. This is a conditional bias given the population mean estimate. (I believe there is no bias unconditionally, but I’m a little rusty on my REML theory.)

To get an overall picture of both the variability and bias of the two estimators of β_{i} for our given sample, we turn to Mean Squared Error (MSE). Recall that MSE = Bias^{2} + Variance. The MSE of the two estimators can be computed with the following code:

Mean Squared Error for first pass linear models: 0.03892798

versus random effects modeling: 0.01715932

showing that lmer() is the clear winner.

But least-squares regression estimates are consistent and the β_{i}s will be estimated with increasing precision as the within-subject sample size J increases. Let’s see what happens as J is increased to 30 and then 100.**J = 30**

`Mean Squared Error for first pass linear models: 0.0148474 `

versus random effects modeling: 0.008529688

**J = 100**

Mean Squared Error for first pass linear models: 0.00569652

versus random effects modeling: 0.005613028

By J = 100, the superiority of lmer() to lm() is negligible.

## Discussion

We considered the random effects model in a very specific context. Still, we arrive at the following discussion points.

### Does using lmer() make one a bayesian?

My academic advisor Len Stefanski convinced me that simply using Bayes’ rule does not make one a bayesian, and I suppose that holds true for other tools bayesians use like shrinkage and Markov Chain Monte Carlo.

If those units indexed by i and described by β_{i} constitute a conceptually infinite population, then probability theory is still conceptually aligned with relative frequency, and the frequentists should be happy. When I studied this subject at NC State, in a frequentist context, much more emphasis was placed on the fixed parameters like the mean and variance of the β_{i}s than on improved estimates of the β_{i}s themselves.

### Particularities of lmer()

As for my own opinions, I would like to see tests for zero variance components, and to be able to fit a model with only fixed effects so that testing against a null model without any random effects is easier. I also could not find a predict() function available, so I don’t see how you can use lme4 to truly predict out of sample (i.e., to obtain a posterior estimate of β

_{i*}for a unit i* not used in estimation).

Overall, I believe lmer() is an excellent function within an excellent package. I hope it continues to be enhanced by the larger community.

### Is it worth it?

But if you have high measurement error, small secondary sample sizes, or both, then random effects analysis can really make a difference by sharing information among subjects and providing a stable target to shrink regression estimates towards. Also in these cases, the estimation and testing of variance components may have their own implications.

I’ll be continuing to look for my own “killer app” for random effects analysis. Please share if you have one of your own!

All highlighted R-code segments were created by Pretty R at inside-R.org. *Keep up with ours and other great articles relating to R on **R-bloggers, * and follow me on Twitter (@baogorek) for my latest research updates.

References

- Douglas Bates (2006). “[R] lemer, p-values and all that.” The R-help Archives. Date of posting:
*Fri May 19 22:40:27 CEST 2006.*URL https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html (accessed June 10, 2012). - Douglas Bates, Martin Maechler and Ben Bolker (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42. URL http://CRAN.R-project.org/package=lme4
- M. L. Gumpertz and S. G. Pantula (1989) . “A simple approach to inferences in random coefficient models.”
*The American Statistician*, Volume 43, pp 203–210. - Original by Littel, Milliken, Stroup and Wolfinger modifications by Douglas Bates (2011). SASmixed: Data sets from “SAS System for Mixed Models”. R package version 1.0-1. URL http://CRAN.R-project.org/package=SASmixed
- Pretty R Syntax Highlighter. inside-R.org. URL http://www.inside-r.org/pretty-r (accessed June 10, 2012).
- R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/

**leave a comment**for the author, please follow the link and comment on their blog:

**Anything but R-bitrary**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...