**Bluecology blog**, and kindly contributed to R-bloggers)

## Choosing R packages for mixed effects modelling based on the car you drive

There are many roads you can take to fit a mixed effects model

(sometimes termed hierarchical models) in `R`

. There are numerous

packages that each deploy different engines to fit mixed effects models.

In this driveability review, I look at some of the dominant packages for

mixed effects models The focus of this review is on the practicality of

each package, rather than their numerical accuracy. Information on

numerical accuracy can be found in the academic literature, see the

bibliography below. I will lookat each package’s handling, speed and

adaptability for tackling different types of questions.

*Summary of the packages strengths and weaknesses. Speeds are relative
to lme4 in a test case problem, see below for details.*

Parameter | `lme4` |
`rjags` |
`INLA` |
`RStan` |
---|---|---|---|---|

Relative speed | 1 | 100 | 10 | 60 |

Handling- ease of coding^{1} |
Straightforward | Hard | Moderate | Hard |

User manual | Accessible | Brief and technical | Very technical^{2} |
Highly accessible |

Online documentation | Extensive | Extensive | Nascent | Nascent |

Support from developers^{3} |
Good | Good | Outstanding | Good |

Testing for accuracy | Extensive | Extensive for some types of models | Nascent | Nascent |

Adaptability – types of models you can fit | Limited | Your imagination is the limit | Extensive | Almost as much as JAGS |

Best for | Simple problems, quick implementation | Customising complex models | Spatio-temporal models | Developing new types of models |

How you will look driving it | Traditional, but elegant | You like to take the scenic route | Edgy and slightly mysterious | Hanging with the cool kids |

- but see below for some add-on packages which make life easier for

STAN and JAGS - = lots of equations
- In my experience. But, heck these packages are all free and all give better user support than many

paid services!

The first is

`lme4`

,

meaning linear mixed effects models with `S4`

classes. `lme4`

is like an

older model sports-car – fast, respectable, well known and able to

handle common types of questions. `lme4`

uses maximum likelihood methods

for estimation, so interpret its statistics as a

frequentist.

The next three packages are Bayesian techniques.

The second package is

`rjags`

,

which is an interface to the “Just Another Gibbs

Sampler” program. `rjags`

is the

Land-Rover of packages for mixed effects models: well tested, able to

tackle just about any terrian, but slow and prone to breaking down

(usually a user error, rather than the package itself).

The third package is `INLA`

, standing for

“Integrated Nested Laplace Approximation”. `INLA`

is like an electric

sports car: it is a modern technique that is lightning fast, but there

has been limited testing to date and you better understand what is under

the hood before you attempt to drive it.

The final package is

`RStan`

,

which is the `R`

interface to the Stan modelling

language. `RStan`

is like like BMW’s hybrid,

combining computational speed with flexibility and with exceptional

documentation and support.

Now, let’s look at each packages computational speed, handling (ease of

use), driver support (documentation, existing knowledge and forums) and

flexibility.

Skip to the end for a biblography of examples and some cool packages

that build on these tools.

### A note about philosophy

The frequentist and Bayesian approaches I review here come from

fundamentally different statistical

philosophies.

Here I will tackle the approaches pragmatically, looking at estimation

speed and so on, however you might exclude one or the other a-prior

based on your bent towards Bayesian or frequentist thinking.

### Speed

I simulated a simple hierarchical data-set to test each of the models.

The script is available here. The

data-set has 100 binary measurements. There is one fixed covariate

(continuous ) and one random effect with five groups. The linear

predictor was transformed to binomial probabilities using the logit

function. For the Bayesian approaches, slightly different priors were

used for each package, depending on what was available. See the script

for more details on priors.

For each package I fit the same linear model with binomial errors and a

logit link. To fit the models I used the functions `glmer()`

from

`lme4`

, `inla()`

from `INLA`

, custom code for `rjags`

and the

`stan_glmer()`

from the package

`rstanarm`

to fit the Stan model (you can also write your own custom code for

`RStan`

, but for convenience I used the canned version of mixed effects

models).

*Computational speeds for fitting a binomial Generalised Linear Model to
100 samples with one random and one fixed effect. Relative times are
relative to *

`lme4`

Measurement | `lme4` |
`rjags` |
`INLA` |
`RStan` |
---|---|---|---|---|

Speed (Seconds) |
0.063 | 6.2 | 0.59 | 3.8 |

Speed (relative) |
1 | 99 | 9.4 | 60 |

So `lme4`

is by far the fastest, clocking in at around 6 tenths of a

second (on my 3GHz Intel i7 Macbook Pro). `INLA`

comes in second, ten

times slower than `lme4`

, then followed an order of magnitude later by

`RStan`

and finally `rjags`

came plodding along behind about 100 times

slower than `lme4`

.

If you run the code I have provided you will see that all packages come

up with similar estimates for the model parameters.

### Handling – ease of use

`lme4`

leads the pack when it comes to ease of use. If you have your

data sorted, then you can go from zero to fitted model in one line of

code. `INLA`

is similar, however you should generally not rely on

default priors but specify your own, which requires additional code and

thinking.

Use of `RStan`

and `rjags`

depends on how you use them. You can write

your own model code, which can lead to quite lengthy scripts, but also

gives you greater number of choices for how you design your model.

However, there are also many packages that provide wrapper’s to `RStan`

and `rjags`

that will fit common types of models (like mixed effects

models). These tend to be as simple as `INLA`

’s functions, remembering

that you should think carefully about prior

choice.

If you are going to write your own models, you will need to get good at

debugging, so there is an extra overhead there.

**Model checking** is another important consideration.

For users familiar with GLMs, `lme4`

probably provides the most familiar

support for model checking. `INLA`

can be more difficult to use, because

it requires the user to hand-code many model checking routines. `INLA`

,

`rjags`

and `RStan`

also require the user to carefully check that the

fitting algorithms have performed properly.

Each of these packages use different types of algorithms, so the

statistics you will use to check algorithm performance are different.

See each package’s documentation for further advice.

### Driver support – documentation and support

`lme4`

and `RStan`

have the highest quality user-manuals. Both are well

written and provide numerous examples that users will find helpful.

The user manual for `rjags`

and JAGS is somewhat brief, but the user can

easily find many helpful examples on the web, and these packages are

covered by numerous books. So if you are willing to broaden your search

you should have no trouble finding help.

`RStan`

and `INLA`

are both relatively new, so fewer examples can be

found on the web. While `RStan`

has an excellent manual for both the

mathsy and non-mathsy types, documentation of `INLA`

can be difficult to

follow for novice users, because much of it is equations.

**Web support and chat groups** are also an important aspect of model

development. Help for all packages can be found on sites like

Stackoverflow, or you

can post your own new questions there.

Personally, I have found that support for `INLA`

, through their

forum,

is exceptional. On several occaisons I have posted questions for which I

could not find documentation and have generally recieved a response from

`INLA`

’s developers before the end of the working day. This is amazing,

because it is better than you would expect from many paid services.

I have not participated in forums for the other packages, so I can’t add

comments here.

One final note on support is that `lme4`

and JAGS are both pretty well

studied approaches to modelling and there are numerous academic papers

dealing with their biases and accuracy. `INLA`

and `RStan`

are

relatively new on the scene, so for new and different types of problems

you might want to run your own simulation study to check that your model

is performing as expected.

### Adaptability – ability to tackle different types of problems

Your imagination is the limit with `rjags`

– take it on well driven

routes, like a mixed effects model, or off-road on new adventures with

Bayesian structural equation modelling – it can do it all.

`rjags`

is followed closely by `RStan`

. `RStan`

has a few limitations,

but can basically do anything JAGS can do, and often faster (e.g. this

ecological

study).

`RStan`

will probably eventually replace `rjags`

, but to date `rjags`

persists because of the extensive range of well documented examples

users can build on.

`INLA`

comes in fourth with a diverse range of built in likelihoods and

model functions (you can even ask the developers to add more and they

might do it!). `INLA`

is becoming particularly popular for modelling

spatial and temporal auto-correlation, partly due to its speed. You can

do these types of models with `RStan`

and `rjags`

but computations might

be impossibly slow.

`lme4`

is the least flexible of packages – though there are some options

to customise it’s models. The similar `nlme`

package also provides a

range of tools for fitting random effects for spatial and temporal

autocorrelation.

There are however, some clever tricks you can do with `lme4`

to fit a

broader range of models. For instance, you can include a random effect

where every sample is a separate group in a Poisson GLM to get a Poisson

with extra variation.

### Conclusion

That’s all. I hope this blog helps you make a good choice before

investing in learning a new tool for mixed effects models. Let me

know how you go and if you

found this useful.

### Cool applications of mixed effects models in R

rstanarm for mixed effects models coded like `glm()`

but using Stan for estimation.

glmmBUGS for mixed effects models coded like `glm()`

but using JAGS for estimation.

A new function in the mgcv package (links to

pdf) for

auto-writing code for Bayesian Generalised Additive Models.

A new study showing how you can

fit spatial models with barriers.

Fit Bayesian Latent Variable

models for

analysing multivariate data (e.g. ecological communities).

### Bibliography

Unless otherwise indicated, these resources are open access.

Stan user manual.

JAGS

user manual.

INLA project page.

INLA

forum.

Mixed Effects Models and Extensions in Ecology with

R my go-to book for

theory and code for fitting likelihood based mixed effects models (e.g.

with `lme4`

). You will have to buy it.

A good example for how to design a simulation study to test your

models

Example for INLA for spatial

models

in ecology.

A review of mixed effects models in fisheries

science

(good for other disciplines too).

NIMBLE another R package for Bayesian

modelling.

glmmADMB another R

package giving greater flexibility for fitting using maximum likelihood.

**leave a comment**for the author, please follow the link and comment on their blog:

**Bluecology blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...