Hierarchical models with RStan (Part 1)

November 10, 2016

(This article was first published on R – biologyforfun, and kindly contributed to R-bloggers)

Real-world data sometime show complex structure that call for the use of special models. When data are organized in more than one level, hierarchical models are the most relevant tool for data analysis. One classic example is when you record student performance from different schools, you might decide to record student-level variables (age, ethnicity, social background) as well as school-level variable (number of student, budget). In this post I will show how to fit such models using RStan. As there is much to say and try on such models I restrict myself in this post to a rather simple example, I will extend this to more complex situations in latter posts.

A few words about RStan:

If you don’t know anything about STAN and RStan make sure to check out this webpage. In a few words RStan is an R interface to the STAN programming language that let’s you fit Bayesian models. A classical workflow looks like this:

  1. Write a STAN model file ending with a .stan
  2. In R fit the model using the RStan package passing the model file and the data to the stan function
  3. Check model fit, a great way to do it is to use the shinystan package

First example with simulated data:

Say that we recorded the response of 10 different plant species to rising temperature and nitrogen concentration. We measured the biomass of 10 individuals per species along a gradient of both temperature and nitrogen concentration and we would like to know how these two variables affect plant biomass. In hierarchical model we let regression parameters vary between the species, this means that, for example, species A might have a more positive slope between temperature and biomass than species B. Note however that we do not fit separate regression to each species, rather the regression parameters for the different species are themselves being fitted to a statistical distribution.

In mathematical terms this example can be written:

\mu_{ij} = \beta_{0j} + \beta_{1j} * Temperature_{ij} + \beta_{2j} * Nitrogen_{ij}

\beta_{0j} \sim N(\gamma_{0},\tau_{0})

… (same for all regression coefficients) …

y_{ij} \sim N(\mu_{ij},\sigma)

For observations i: 1 … N and species j: 1 … J.

This is how such a model looks like in STAN:

/*A simple example of an hierarchical model*/
data {
  int N; //the number of observations
  int J; //the number of groups
  int K; //number of columns in the model matrix
  int id[N]; //vector of group indeces
  matrix[N,K] X; //the model matrix
  vector[N] y; //the response variable
parameters {
  vector[K] gamma; //population-level regression coefficients
  vector[K] tau; //the standard deviation of the regression coefficients

  vector[K] beta[J]; //matrix of group-level regression coefficients
  real sigma; //standard deviation of the individual observations
model {
  vector[N] mu; //linear predictor
  gamma ~ normal(0,5); //weakly informative priors on the regression coefficients
  tau ~ cauchy(0,2.5); //weakly informative priors, see section 6.9 in STAN user guide
  sigma ~ gamma(2,0.1); //weakly informative priors, see section 6.9 in STAN user guide
  for(j in 1:J){
   beta[j] ~ normal(gamma,tau); //fill the matrix of group-level regression coefficients 
  for(n in 1:N){
    mu[n] = X[n] * beta[id[n]]; //compute the linear predictor using relevant group-level regression coefficients 

  y ~ normal(mu,sigma);

You can copy/paste the code into an empty text editor and save it under a .stan file.

Now we turn into R:

#load libraries
#where the STAN model is saved
#simulate some data
N<-100 #sample size
J<-10 #number of plant species
id<-rep(1:J,each=10) #index of plant species
K<-3 #number of regression coefficients
#population-level regression coefficient
#standard deviation of the group-level coefficient
#standard deviation of individual observations
#group-level regression coefficients
beta<-mapply(function(g,t) rnorm(J,g,t),g=gamma,t=tau) 
#the model matrix
y<-vector(length = N)
for(n in 1:N){
  #simulate response data
#run the model

The MCMC sampling takes place (took about 90 sec per chain on my computer), and then I got this warning message: “Warning messages:
1: There were 61 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
2: Examine the pairs() plot to diagnose sampling problems”

Here is an explanation for this warning: “For some intuition, imagine walking down a steep mountain. If you take too big of a step you will fall, but if you can take very tiny steps you might be able to make your way down the mountain, albeit very slowly. Similarly, we can tell Stan to take smaller steps around the posterior distribution, which (in some but not all cases) can help avoid these divergences.” from here. This issue occur quite often in hierarchical model with limited sample size, the simplest solution being to re-parameterize the model, in other words to re-write the equations so that the MCMC sampler has an easier time sampling the posterior distribution.

Below is a new STAN model with a non-centered parameterization (See Section 22.6 in STAN user guide):

parameters {
  vector[K] gamma; //population-level regression coefficients
  vector[K] tau; //the standard deviation of the regression coefficients
  //implementing Matt's trick
  vector[K] beta_raw[J];
  real sigma; //standard deviation of the individual observations
transformed parameters {
  vector[K] beta[J]; //matrix of group-level regression coefficients
  //computing the group-level coefficient, based on non-centered parametrization based on section 22.6 STAN (v2.12) user's guide
  for(j in 1:J){
    beta[j] = gamma + tau .* beta_raw[j];
model {
  vector[N] mu; //linear predictor
  gamma ~ normal(0,5); //weakly informative priors on the regression coefficients
  tau ~ cauchy(0,2.5); //weakly informative priors, see section 6.9 in STAN user guide
  sigma ~ gamma(2,0.1); //weakly informative priors, see section 6.9 in STAN user guide
  for(j in 1:J){
   beta_raw[j] ~ normal(0,1); //fill the matrix of group-level regression coefficients 
  for(n in 1:N){
    mu[n] = X[n] * beta[id[n]]; //compute the linear predictor using relevant group-level regression coefficients 
  y ~ normal(mu,sigma);

Note that the data model block is identical in the two cases.

We turn back to R:

#re-parametrize the model
#no more divergent iterations, we can start exploring the model
#a great way to start is to use the shinystan library
#model inference
Inference for Stan model: hierarchical1_reparam.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

          mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
gamma[1]  1.96    0.00 0.17  1.61  1.86  1.96 2.07  2.29  2075    1
gamma[2] -0.03    0.02 0.77 -1.53 -0.49 -0.04 0.43  1.55  1047    1
gamma[3]  2.81    0.02 0.49  1.84  2.52  2.80 3.12  3.79   926    1
tau[1]    0.34    0.01 0.21  0.02  0.19  0.33 0.46  0.79  1135    1
tau[2]    2.39    0.02 0.66  1.47  1.94  2.26 2.69  4.04  1234    1
tau[3]    1.44    0.01 0.41  0.87  1.16  1.37 1.65  2.43  1317    1
sigma     1.04    0.00 0.09  0.89  0.98  1.04 1.10  1.23  2392    1

Samples were drawn using NUTS(diag_e) at Thu Nov 10 14:11:41 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The regression parameters were all decently estimated except for the second slope coefficient, the simulated value was -1.

All MCMC samples for all coefficient can be easily extracted and used to compute whatever your interest is:

#extract the MCMC samples

#plot average response to explanatory variables
#get predicted values for each MCMC sample
pred_x1<-apply(mcmc_hier$gamma,1,function(beta) X_new %*% beta)
#now get median and 95% credible intervals
#same stuff for the second explanatory variables
pred_x2<-apply(mcmc_hier$gamma,1,function(beta) X_new %*% beta)

Here we basically generated new model matrices where only one variable was moving at a time, this allowed us to get the model prediction for the effect of say temperature on plant biomass under average nutrient conditions. These predictions were obtained by multiplying the model matrix with the coefficients for each MCMC sample (the first apply command), and then we can get from these samples the median with 95% credible intervals (the second apply command).

Now we can plot this (code for the plots at the end of the post)


Another important plot is the variation in the regression parameters between the species, again this is easily done using the MCMC samples:

#now we could look at the variation in the regression coefficients between the groups doing caterpillar plots
#we may also add the population-level median estimate
  labs(y="Regression parameters")


The cool thing with using STAN is that we can extend or modify the model in many ways. This will be the topics of future posts which will include: crossed and nested design, multilevel modelling, non-normal distributions and much more, stay tuned!

Code for the first plot:

plot(y~X[,2],pch=16,xlab="Temperature",ylab="Response variable",col=cols[id])
plot(y~X[,3],pch=16,xlab="Nitrogen concentration",ylab="Response variable",col=cols[id])
mtext(text = "Population-level response to the two\nexplanatory variables with 95% CrI",side = 3,line = 0,outer=TRUE)
legend(x=2.1,y=10,legend=paste("Gr",1:10),ncol = 1,col=cols,pch=16,bty="n",xpd=NA,title = "Group\nID")


Filed under: Informatic Language, R and Stat Tagged: Bayesian, R, STAN, Statistics

To leave a comment for the author, please follow the link and comment on their blog: R – biologyforfun.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)