Making sense of random effects

November 16, 2012
(This article was first published on distributed ecology, and kindly contributed to R-bloggers)

The other night in my office I got into a discussion with my office mate, the brilliant scientist / amazing skier Dr. Thor Veen, about how to understand the random effect variance term in a mixed-effects model. Thor teaches the R statistics course here at UBC, and last night a student came to the office to ask how to interpret the variance components returned from a mixed model object (in this case fit with lmer from the package lme4). The question concerned a dataset in which individual stickleback fish had been measured for a trait at different light wavelengths. Because each fish had been measured multiple times, a mixed model was fit with a fixed factor for wavelength and a random effect of individual fish. In this case the random effect variance came back as 0 (or very close to 0), despite there appearing to be variation between individuals. The question is: under what circumstances do we expect that variance to increase, and how do we interpret it relative to the residual variance? I'll illustrate this with two simulated data sets.

For the first example I generated data in which the same nine individuals (random effect) were measured at five different levels of some treatment (fixed effect). In this first data set I include strong individual effects. Here's a plot of the data:
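The author's exact simulation code is in the gist linked at the end of the post; a minimal sketch along the same lines (the seed, effect sizes, and names `dat`, `ind`, `treat` are my assumptions, not the original code) might look like:

```r
set.seed(42)
n_ind   <- 9   # individuals (random effect)
n_treat <- 5   # treatment levels (fixed effect)

ind   <- factor(rep(1:n_ind, times = n_treat))
treat <- factor(rep(1:n_treat, each = n_ind))

ind_eff   <- rnorm(n_ind, sd = sqrt(7))        # strong, consistent individual effects
treat_eff <- seq(0, 8, length.out = n_treat)   # fixed treatment effects

y   <- treat_eff[treat] + ind_eff[ind] + rnorm(n_ind * n_treat, sd = 1)
dat <- data.frame(y, ind, treat)

# One trace per individual across treatment levels (base graphics)
interaction.plot(dat$treat, dat$ind, dat$y, xlab = "Treatment", ylab = "Trait")
```

Because each individual's intercept is carried across all five treatment levels, the traces in the plot stay roughly parallel, which is what "varying consistently" means here.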

Now we can fit this model using lmer and look at the variance term:
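A sketch of that fit, assuming a data frame `dat` with columns `y` (trait), `ind` (individual), and `treat` (treatment level); the names are my assumption, not necessarily what the gist uses:

```r
library(lme4)

# Random intercept for each individual; treatment as a fixed factor
m_mixed <- lmer(y ~ treat + (1 | ind), data = dat)
summary(m_mixed)  # the "Random effects" table lists the ind and residual variances
```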

The key components here are the `ind (Intercept)` variance and the residual variance. The individual variance should be around 7, much higher than the residual variance. We can see how much better this fit is, compared to a fit that ignores individual effects, using AIC.

Clearly the mixed model is a much better fit: it has a much lower AIC (`84.8383` for the mixed model vs. `227.1915` for the model ignoring individual effects).
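A sketch of that comparison, again assuming a data frame `dat` with columns `y`, `ind`, and `treat` (assumed names):

```r
library(lme4)

m_mixed <- lmer(y ~ treat + (1 | ind), data = dat)
m_fixed <- lm(y ~ treat, data = dat)   # ignores individual effects

AIC(m_mixed)  # the post reports 84.8383
AIC(m_fixed)  # the post reports 227.1915
```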

But what happens if you keep exactly the same variance within each treatment level, but randomize the individuals? The same amount of between-individual variance is present at each level, but individuals no longer vary consistently across treatment levels.
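One way to do this shuffling (a sketch; `dat` and its column names are assumptions): permute the individual labels separately within each treatment level, leaving the trait values, and therefore the within-level variance, untouched:

```r
set.seed(1)
dat_rand <- dat
# Permute individual labels within each treatment level
dat_rand$ind <- factor(ave(as.integer(dat$ind), dat$treat, FUN = sample))
```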

Now let’s fit the same set of models:
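Refitting on the randomized data, assuming a data frame `dat_rand` with columns `y`, `ind`, and `treat` (assumed names):

```r
library(lme4)

m_mixed_rand <- lmer(y ~ treat + (1 | ind), data = dat_rand)
m_fixed_rand <- lm(y ~ treat, data = dat_rand)

summary(m_mixed_rand)  # ind variance should now be at or near 0
AIC(m_mixed_rand)      # the post reports 220.9788
AIC(m_fixed_rand)      # the post reports 227.1915
```

Note that if only the labels were shuffled, the fixed-effects model is fit to the same responses as before, which is why its AIC is unchanged.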

When you examine the variance of the individual random effect, it should now be at or very close to 0, with all the variance in the residual term. Likewise, the mixed model and an ordinary ANOVA should now fit almost equally well by AIC (`220.9788` for the mixed model vs. `227.1915` for the model ignoring individual effects).

The variance of the random factor tells you how much variability there is between individuals across all treatments, not the level of variance between individuals within each group. If you compare the total variance of the strong-individual-effects data set with that of the randomized one, they have the same variance; the difference is in how it is partitioned. In the case of the randomized data, the residual variance tells you how much variability there is within a treatment, and the variance for the random effect of individual tells you how much of that within-treatment variance is explained by consistent individual differences. If individuals don’t vary consistently across treatments, that term will approach 0, and at the very least be smaller than the residual term.

I’m actually sort of anti-mixed model, probably because I’ve come up learning stats from reading Andrew Gelman, and he has rather strong opinions on the matter. I think it’s often easier to just understand everything in terms of random effects and look at effect sizes. However, I’m probably in the minority in ecology when it comes to that view. I hope this helps some folks get a better understanding of the random effects term in mixed models. You can see my full code in a gist, where you can see how I generated the data and play around with it yourself.
