Stan for Bayesian Analysis

[This article was first published on Daniel Hocking, Ecologist » Blog, and kindly contributed to R-bloggers.]

Bayesian analysis has been growing in popularity among ecologists recently, largely due to accessible books such as Models for Ecological Data: An Introduction, Introduction to WinBUGS for Ecologists, and Bayesian Methods for Ecology. Most ecologists with limited programming backgrounds have implemented Bayesian analyses using WinBUGS, OpenBUGS, or JAGS, all of which can be run through R packages. JAGS is arguably the best of the three: it works on most operating systems and is flexible enough for most Bayesian analyses in ecology. All three, however, rely on Gibbs sampling (a special case of the Metropolis-Hastings algorithm), which can be inefficient for complex models. To address these inefficiencies, a group of statisticians and computer programmers came together to build something new. The result was Stan.

Stan uses a variant of Hamiltonian Monte Carlo (HMC), specifically the No-U-Turn Sampler (NUTS). NUTS, combined with the fact that Stan models are compiled to C++, makes it faster than the Gibbs samplers mentioned above, particularly for complex problems. Models should converge faster, and in cases where Gibbs samplers fail to converge at all, Stan may provide a solution. For those of us who don't work in C++ there is a bit of a learning curve, but there is a nice R package (rstan) for implementation, a great manual, and a very active users group.

I decided to go through Marc Kery’s excellent book and run all of the examples in Stan via rstan for comparison. Apparently, I am not the only one: you can find an example from Chapter 7 (t-test) on Shige’s useful blog. I recently finished translating Chapter 8 into Stan code. Running the model was relatively easy, but it took me a while to figure out how to do all the post-processing model assessment (e.g. plots, posterior predictive checks, etc.). Now that I’ve done it, though, it should be fairly easy to repeat. The code for the Chapter 8, exercise 3 model is below, and the full code for the chapter and exercises is available as a Word document here: Kery_Chp8_Stan

bm8_3 <- '
data{
  int<lower=0> n;
  vector[n] y;
  vector[n] x;
}

parameters{
  real alpha;                    // these will be monitored and saved
  real beta;
  real<lower=0, upper=30> sigma; // sd must lie in the support of its prior
}

model{
  vector[n] mu; // mu is defined in the model block so it will not be reported

  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ uniform(0, 30);

  // Likelihood
  mu = alpha + beta * x;
  y ~ normal(mu, sigma);
}
'
data_8_3 <- list(n = length(mmd.grass$year), y = mmd.grass$mmd, 
                 x = mmd.grass$year)

fit.8.3 <- stan(model_code = bm8_3, data = data_8_3,
  chains = 3,
  iter = 50000,
  warmup = 25000,
  thin = 1)

print(fit.8.3, dig = 3)

traceplot(fit.8.3)
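For the post-processing mentioned above, here is a minimal sketch of pulling the posterior draws out of the fitted object and running a simple posterior predictive check. It assumes the `fit.8.3` and `data_8_3` objects from the code above; the discrepancy measure (SD of the residuals) is just an illustrative choice, not necessarily the one used in the book.

```r
# Extract posterior draws as a named list of arrays (rstan::extract)
post <- rstan::extract(fit.8.3)

# Posterior means of the regression parameters
mean(post$alpha)
mean(post$beta)

# Posterior predictive check: simulate replicate data sets from the
# posterior and compare a discrepancy (here, the SD of the residuals)
# between the observed and replicated data
n_sims <- length(post$alpha)
T_obs <- numeric(n_sims)
T_rep <- numeric(n_sims)
for (s in 1:n_sims) {
  mu_s <- post$alpha[s] + post$beta[s] * data_8_3$x
  y_rep <- rnorm(data_8_3$n, mu_s, post$sigma[s])
  T_obs[s] <- sd(data_8_3$y - mu_s)
  T_rep[s] <- sd(y_rep - mu_s)
}

# Bayesian p-value: values near 0.5 suggest the model fits well
mean(T_rep > T_obs)
```

A scatterplot of `T_rep` against `T_obs` with a 1:1 line gives the usual graphical version of the same check.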

