R, Stan and Bayesian Statistics

June 23, 2016
By

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

Just about two and a half years ago I wrote about some resources for doing Bayesian statistics in R. Motivated by the tutorial Modern Bayesian Tools for Time Series Analysis by Harte and Weylandt that I attended at R/Finance last month, and the upcoming tutorial An Introduction to Bayesian Inference using R Interfaces to Stan that Ben Goodrich is going to give at useR! I thought I'd look into what's new. Well, Stan is what's new! Yes, Stan has been under development and available for some time. But somehow, while I wasn't paying close attention, two things happened: (1) the rstan package evolved to make the mechanics of doing Bayesian in R analysis really easy and (2) the Stan team produced and/or organized an amazing amount of documentation. 

My impressions of doing Bayesian analysis in R were set in the WinBUGS era. The separate WinBUGs installation was always tricky, and then moving between the BRugs and R2WinBUGS packages presented some additional challenges. My recent Stan experience was nothing like this. I had everything up and running in just a few minutes. The directions for getting started with rstan are clear and explicit about making sure that you have the right tool chain in place for your platform. Since I am running R 3.3.0 on Windows 10 I installed Rtools34. This went quickly and as expected except that C:\Rtools\gcc-4.x-y\bin did not show up in my path variable. Not a big deal: I used the menus in the Windows System Properties box to edit the Path statement by hand. After this, rstan installed like any other R package and I was able to run the 8schools example from the package vignette. The following 10 minute video by Ehsan Karim takes you through the install process and the vignette example.

 

The Stan documentation includes four major components: (1) The Stan Language Manual, (2) Examples of fully worked out problems, (3) Contributed Case Studies and (4) both slides and video tutorials. This is an incredibly rich cache of resources that makes a very credible case for the ambitious project of teaching people with some R experience both Bayesian Statistics and Stan at the same time. The "trick" here is that the documentation operates at multiple levels of sophistication with entry points for students with different backgrounds. For example, a person with some R and the modest statistics background required for approaching Gelman and Hill's extraordinary text: Data Analysis Using Regression and Multilevel/Hierarchical Models can immediately beginning running rstan code for the book's examples. To run the rstan version of the example in section 5.1, Logistic Regression with One Predictor, with no changes a student only needs only to copy the R scripts and data into her local environment. In this case, she would need the R script:  5._LogisticRegressionWithOnePredictor. R, the data: nes1992_vote.data.R and the Stan code: nes_logit.stan.  The Stan code for this simple model is about as straightforward as it gets: variable declarations, parameter identification and the model itself.

data {
  int<lower=0> N;
  vector[N] income;
  int<lower=0,upper=1> vote[N];
  }
  parameters {
  vector[2] beta;
  }
  model {
  vote ~ bernoulli_logit(beta[1] + beta[2] * income);
  }

Running the script will produce the iconic logistic regression plot:

G_H_5-1

I'll wind down by curbing my enthusiasm just a little by pointing out that Stan is not the only game in town. JAGS is a popular alternative, and there is plenty that can be done with unaugmented R code alone as the Bayesian Inference Task View makes abundantly clear. 

If you are a book person and new to Bayesian statistics, I highly recommend Bayesian Essentials with R by Jean-Michel Marin and Christian Robert. The authors provide a compact introduction to Bayesian statistics that is backed up with numerous R examples. Also, the new book by Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan looks like it is going to be an outstanding read. The online supplements to the book are certainly worth a look.

Finally, if you are a Bayesian or a thinking about becoming one and you are going to useR!, be sure to catch the following talks:

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)