Choosing R packages for mixed effects modelling based on the car you drive

April 13, 2017
By

(This article was first published on Bluecology blog, and kindly contributed to R-bloggers)

Choosing R packages for mixed effects modelling based on the car you drive

There are many roads you can take to fit a mixed effects model
(sometimes termed hierarchical models) in R. There are numerous
packages that each deploy different engines to fit mixed effects models.

In this driveability review, I look at some of the dominant packages for
mixed effects models The focus of this review is on the practicality of
each package, rather than their numerical accuracy. Information on
numerical accuracy can be found in the academic literature, see the
bibliography below. I will lookat each package’s handling, speed and
adaptability for tackling different types of questions.

Summary of the packages strengths and weaknesses. Speeds are relative
to lme4 in a test case problem, see below for details.

Parameter lme4 rjags INLA RStan
Relative speed 1 100 10 60
Handling- ease of coding1 Straightforward Hard Moderate Hard
User manual Accessible Brief and technical Very technical2 Highly accessible
Online documentation Extensive Extensive Nascent Nascent
Support from developers3 Good Good Outstanding Good
Testing for accuracy Extensive Extensive for some types of models Nascent Nascent
Adaptability – types of models you can fit Limited Your imagination is the limit Extensive Almost as much as JAGS
Best for Simple problems, quick implementation Customising complex models Spatio-temporal models Developing new types of models
How you will look driving it Traditional, but elegant You like to take the scenic route Edgy and slightly mysterious Hanging with the cool kids
  1. but see below for some add-on packages which make life easier for
    STAN and JAGS
  2. = lots of equations
  3. In my experience. But, heck these packages are all free and all give better user support than many
    paid services!

The first is
lme4,
meaning linear mixed effects models with S4 classes. lme4 is like an
older model sports-car – fast, respectable, well known and able to
handle common types of questions. lme4 uses maximum likelihood methods
for estimation, so interpret its statistics as a
frequentist.

The next three packages are Bayesian techniques.

The second package is
rjags,
which is an interface to the “Just Another Gibbs
Sampler”
program. rjags is the
Land-Rover of packages for mixed effects models: well tested, able to
tackle just about any terrian, but slow and prone to breaking down
(usually a user error, rather than the package itself).

My take on reviewing R packages for Generalised Linear Mixed Effects. Clockwise from top left: The `lme4` package is fast but a bit older, like an early noughties Ferrari; the JAGS program goes anywhere like a Range Rover, but breaks often and is slow; the Stan language is modern and efficient like a hybrid BMW and INLA is fast and modern but requires technical expertise like a Tesla.

The third package is INLA, standing for
“Integrated Nested Laplace Approximation”. INLA is like an electric
sports car: it is a modern technique that is lightning fast, but there
has been limited testing to date and you better understand what is under
the hood before you attempt to drive it.

The final package is
RStan,
which is the R interface to the Stan modelling
language
. RStan is like like BMW’s hybrid,
combining computational speed with flexibility and with exceptional
documentation and support.

Now, let’s look at each packages computational speed, handling (ease of
use), driver support (documentation, existing knowledge and forums) and
flexibility.

Skip to the end for a biblography of examples and some cool packages
that build on these tools.

A note about philosophy

The frequentist and Bayesian approaches I review here come from
fundamentally different statistical
philosophies
.
Here I will tackle the approaches pragmatically, looking at estimation
speed and so on, however you might exclude one or the other a-prior
based on your bent towards Bayesian or frequentist thinking.

Speed

I simulated a simple hierarchical data-set to test each of the models.
The script is available here. The
data-set has 100 binary measurements. There is one fixed covariate
(continuous ) and one random effect with five groups. The linear
predictor was transformed to binomial probabilities using the logit
function. For the Bayesian approaches, slightly different priors were
used for each package, depending on what was available. See the script
for more details on priors.

For each package I fit the same linear model with binomial errors and a
logit link. To fit the models I used the functions glmer() from
lme4, inla() from INLA, custom code for rjags and the
stan_glmer() from the package
rstanarm
to fit the Stan model (you can also write your own custom code for
RStan, but for convenience I used the canned version of mixed effects
models).

Computational speeds for fitting a binomial Generalised Linear Model to
100 samples with one random and one fixed effect. Relative times are
relative to lme4

Measurement lme4 rjags INLA RStan
Speed (Seconds) 0.063 6.2 0.59 3.8
Speed (relative) 1 99 9.4 60

So lme4 is by far the fastest, clocking in at around 6 tenths of a
second (on my 3GHz Intel i7 Macbook Pro). INLA comes in second, ten
times slower than lme4, then followed an order of magnitude later by
RStan and finally rjags came plodding along behind about 100 times
slower than lme4.

If you run the code I have provided you will see that all packages come
up with similar estimates for the model parameters.

Handling – ease of use

lme4 leads the pack when it comes to ease of use. If you have your
data sorted, then you can go from zero to fitted model in one line of
code. INLA is similar, however you should generally not rely on
default priors but specify your own, which requires additional code and
thinking.

Use of RStan and rjags depends on how you use them. You can write
your own model code, which can lead to quite lengthy scripts, but also
gives you greater number of choices for how you design your model.
However, there are also many packages that provide wrapper’s to RStan
and rjags that will fit common types of models (like mixed effects
models). These tend to be as simple as INLA’s functions, remembering
that you should think carefully about prior
choice
.

If you are going to write your own models, you will need to get good at
debugging, so there is an extra overhead there.

Model checking is another important consideration.

For users familiar with GLMs, lme4 probably provides the most familiar
support for model checking. INLA can be more difficult to use, because
it requires the user to hand-code many model checking routines. INLA,
rjags and RStan also require the user to carefully check that the
fitting algorithms have performed properly.

Each of these packages use different types of algorithms, so the
statistics you will use to check algorithm performance are different.
See each package’s documentation for further advice.

Driver support – documentation and support

lme4 and RStan have the highest quality user-manuals. Both are well
written and provide numerous examples that users will find helpful.

The user manual for rjags and JAGS is somewhat brief, but the user can
easily find many helpful examples on the web, and these packages are
covered by numerous books. So if you are willing to broaden your search
you should have no trouble finding help.

RStan and INLA are both relatively new, so fewer examples can be
found on the web. While RStan has an excellent manual for both the
mathsy and non-mathsy types, documentation of INLA can be difficult to
follow for novice users, because much of it is equations.

Web support and chat groups are also an important aspect of model
development. Help for all packages can be found on sites like
Stackoverflow, or you
can post your own new questions there.

Personally, I have found that support for INLA, through their
forum
,
is exceptional. On several occaisons I have posted questions for which I
could not find documentation and have generally recieved a response from
INLA’s developers before the end of the working day. This is amazing,
because it is better than you would expect from many paid services.

I have not participated in forums for the other packages, so I can’t add
comments here.

One final note on support is that lme4 and JAGS are both pretty well
studied approaches to modelling and there are numerous academic papers
dealing with their biases and accuracy. INLA and RStan are
relatively new on the scene, so for new and different types of problems
you might want to run your own simulation study to check that your model
is performing as expected.

Adaptability – ability to tackle different types of problems

Your imagination is the limit with rjags – take it on well driven
routes, like a mixed effects model, or off-road on new adventures with
Bayesian structural equation modelling – it can do it all.

rjags is followed closely by RStan. RStan has a few limitations,
but can basically do anything JAGS can do, and often faster (e.g. this
ecological
study
).
RStan will probably eventually replace rjags, but to date rjags
persists because of the extensive range of well documented examples
users can build on.

INLA comes in fourth with a diverse range of built in likelihoods and
model functions (you can even ask the developers to add more and they
might do it!). INLA is becoming particularly popular for modelling
spatial and temporal auto-correlation, partly due to its speed. You can
do these types of models with RStan and rjags but computations might
be impossibly slow.

lme4 is the least flexible of packages – though there are some options
to customise it’s models. The similar nlme package also provides a
range of tools for fitting random effects for spatial and temporal
autocorrelation.

There are however, some clever tricks you can do with lme4 to fit a
broader range of models. For instance, you can include a random effect
where every sample is a separate group in a Poisson GLM to get a Poisson
with extra variation.

Conclusion

That’s all. I hope this blog helps you make a good choice before
investing in learning a new tool for mixed effects models. Let me
know
how you go and if you
found this useful.

Cool applications of mixed effects models in R

rstanarm for mixed effects models coded like glm() but using Stan for estimation.

glmmBUGS for mixed effects models coded like glm() but using JAGS for estimation.

A new function in the mgcv package (links to
pdf)
for
auto-writing code for Bayesian Generalised Additive Models.

A new study showing how you can
fit spatial models with barriers.

Fit Bayesian Latent Variable
models
for
analysing multivariate data (e.g. ecological communities).

The GLMM in
R FAQ

Bibliography

Unless otherwise indicated, these resources are open access.

Stan user manual.

JAGS
user manual.

Introduction to lme4
(pdf)
.

INLA project page.

INLA
forum.

Mixed Effects Models and Extensions in Ecology with
R
my go-to book for
theory and code for fitting likelihood based mixed effects models (e.g.
with lme4). You will have to buy it.

A good example for how to design a simulation study to test your
models

Example for INLA for spatial
models

in ecology.

A review of mixed effects models in fisheries
science

(good for other disciplines too).

NIMBLE another R package for Bayesian
modelling.

glmmADMB another R
package giving greater flexibility for fitting using maximum likelihood.

To leave a comment for the author, please follow the link and comment on their blog: Bluecology blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)