Need to do a simulation study on a Bayesian model? Use Stan.

[This article was first published on Robert Grant's stats blog » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve been looking into a particular Bayesian meta-analysis model of late. Can’t tell you any more right now of course, but I wanted to check that it was throwing up sensible results and then compare it to classical MA methods. Bring out the simulation study! The trouble is, if you run this sort of thing in BUGS or JAGS, especially because there are correlated parameters between the baseline and endpoint, and between the overall and study-specific stats, it’s going to be slow to fit one of the models, never mind 10,000 of them.

I switched to Stan once I got the basic spec up and running, and it was a decision I have not regretted for one moment. The advantage here is that, not only does Stan use an algorithm that will easily cope with highly correlated parameters (see figure below), but it is compiled into a machine-code program which takes your data and runs the NUTS algorithm to get the chains. So, one you’ve invested the time (about 30-60 seconds for me) to compile the program, plugging in new data to the same model is super-fast. We’re talking a tenth of a second or just under for 1000 steps. So to run 10,000 simulated datasets through it, with 2 chains each of 1000 steps, would be a bit of a nightmare in BUGS, but takes 30 minutes in Stan. Now things that were prohibitively time-consuming become possible!

Figure 7 from Hoffman & Gelman's 2011 paper introducing NUTS

Figure 7 from Hoffman & Gelman’s 2011 paper introducing NUTS: note how NUTS gets much closer to the true distribution (under “independent”) of these 2 (out of 250!) correlated dimensions in parameter space than Metropolis or Gibbs in the same number of steps.

The R interface “rstan” is really good too, so I can do everything in one R code file: generate the data, run the model, get the results, look for non-convergence, work out bias, coverage and MC error, and draw some graphs. All while I go and enjoy a cappuccino. I think it’s fair to say that Stan is my new best friend in the stats playground.

To leave a comment for the author, please follow the link and comment on their blog: Robert Grant's stats blog » R. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)