Three Ways to Run Bayesian Models in R

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers.]


There are different ways of specifying and running Bayesian models from within R. Here I will compare three methods: two that rely on an external program and one that relies only on R. I won’t go into much detail about the differences in syntax; the idea is more to give a feel for how the different modeling languages look. The model I will be implementing assumes a normal distribution with fairly wide priors:

$$y_i \sim \text{Normal}(\mu,\sigma)$$

$$\mu \sim \text{Normal}(0,100)$$

$$\sigma \sim \text{LogNormal}(0,4)$$

Let’s start by generating some normally distributed data to use as example data in the models.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">set.seed(<span style="color: #666666">1337</span>)
y <span style="color: #666666"><-</span> rnorm(n <span style="color: #666666">=</span> <span style="color: #666666">20</span>, mean <span style="color: #666666">=</span> <span style="color: #666666">10</span>, sd <span style="color: #666666">=</span> <span style="color: #666666">5</span>)
mean(y)
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">## [1] 12.72</span>
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">sd(y)
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">## [1] 5.762</span>
</pre></div>

Method 1: JAGS

JAGS (Just Another Gibbs Sampler) is a program that accepts a model string written in an R-like syntax and that compiles and generates MCMC samples from this model using Gibbs sampling. As in BUGS, the program that inspired JAGS, the exact sampling procedure is chosen by an expert system depending on how your model looks. JAGS is conveniently called from R using the rjags package, and John Myles White has written a nice introduction to using JAGS and rjags. I’ve seen it recommended to put the JAGS model specification into a separate file, but I find it more convenient to keep everything together in one .R file, using the textConnection function to avoid saving the model string separately. The following R code specifies the model above and runs it for 20000 MCMC samples:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">library(rjags)

<span style="color: #408080; font-style: italic"># The model specification</span>
model_string <span style="color: #666666"><-</span> <span style="color: #BA2121">"model{</span>
<span style="color: #BA2121">  for(i in 1:length(y)) {</span>
<span style="color: #BA2121">    y[i] ~ dnorm(mu, tau)</span>
<span style="color: #BA2121">  }</span>
<span style="color: #BA2121">  mu ~ dnorm(0, 0.0001)</span>
<span style="color: #BA2121">  sigma ~ dlnorm(0, 0.0625)</span>
<span style="color: #BA2121">  tau <- 1 / pow(sigma, 2)</span>
<span style="color: #BA2121">}"</span>

<span style="color: #408080; font-style: italic"># Running the model</span>
model <span style="color: #666666"><-</span> jags.model(textConnection(model_string), data <span style="color: #666666">=</span> list(y <span style="color: #666666">=</span> y), n.chains <span style="color: #666666">=</span> <span style="color: #666666">3</span>, n.adapt<span style="color: #666666">=</span> <span style="color: #666666">10000</span>)
update(model, <span style="color: #666666">10000</span>); <span style="color: #408080; font-style: italic"># Burnin for 10000 samples</span>
mcmc_samples <span style="color: #666666"><-</span> coda.samples(model, variable.names<span style="color: #666666">=</span>c(<span style="color: #BA2121">"mu"</span>, <span style="color: #BA2121">"sigma"</span>), n.iter<span style="color: #666666">=20000</span>)
</pre></div>

One big difference between the JAGS modeling language and R is that JAGS parameterizes many distributions using precision rather than standard deviation, where the precision is $\tau = 1/\sigma^2$. Therefore the prior SDs of 100 and 4 become the precisions $0.0001 = 1/100^2$ and $0.0625 = 1 / 4^2$.
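To make that conversion concrete, here is a small helper (not part of rjags, just illustrative) that turns an SD into the precision that JAGS's dnorm and dlnorm expect:

```r
# JAGS's dnorm(mu, tau) takes a precision, not an SD.
sd_to_precision <- function(sd) 1 / sd^2

sd_to_precision(100)  # prior precision for mu:        1e-04
sd_to_precision(4)    # prior precision for log-sigma: 0.0625
```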

By using the coda.samples function to generate the samples, rather than the jags.samples function, we can use the plotting and summary functions defined by the coda package. plot shows the trace plots and marginal densities of the two parameters…

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">plot(mcmc_samples)
</pre></div>


… and using summary we get credible intervals and point estimates for the parameters.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">summary(mcmc_samples)
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Iterations = 20001:40000</span>
<span style="color: #408080; font-style: italic">## Thinning interval = 1 </span>
<span style="color: #408080; font-style: italic">## Number of chains = 3 </span>
<span style="color: #408080; font-style: italic">## Sample size per chain = 20000 </span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## 1. Empirical mean and standard deviation for each variable,</span>
<span style="color: #408080; font-style: italic">##    plus standard error of the mean:</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##        Mean   SD Naive SE Time-series SE</span>
<span style="color: #408080; font-style: italic">## mu    12.72 1.37  0.00558        0.00550</span>
<span style="color: #408080; font-style: italic">## sigma  5.99 1.03  0.00421        0.00639</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## 2. Quantiles for each variable:</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##        2.5%   25%   50%   75% 97.5%</span>
<span style="color: #408080; font-style: italic">## mu    10.03 11.83 12.72 13.60 15.40</span>
<span style="color: #408080; font-style: italic">## sigma  4.38  5.27  5.86  6.57  8.39</span>
</pre></div>
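These summary numbers come straight from the posterior draws, so you can also compute them by hand. coda's as.matrix method turns the samples into a plain matrix to apply over; the sketch below illustrates the idea with simulated draws standing in for the real MCMC output:

```r
# With the real output you would do: draws <- as.matrix(mcmc_samples)
# Here simulated draws stand in for the MCMC samples, for illustration only.
set.seed(42)
draws <- cbind(mu    = rnorm(60000, 12.7, 1.37),
               sigma = rlnorm(60000, 1.77, 0.17))
# The same quantiles that summary() reports:
q <- apply(draws, 2, quantile, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
q
```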

Method 2: STAN

STAN is a fairly new program that works in a similar way to JAGS and BUGS. You write your model in STAN’s modeling language, STAN compiles your model and generates MCMC samples that you can use for further analysis in R. All this can be done from within R using the rstan package (currently not available on CRAN). One way that STAN differs from JAGS is that STAN compiles the model down to a C++ program which uses the No-U-Turn sampler to generate MCMC samples from the model. STAN’s modeling language is also different, requiring all variables and parameters to be explicitly declared and insisting that every statement end with `;`. A good introduction to STAN and rstan can be found on STAN’s wiki page. The following R code specifies the model and runs it for 20000 MCMC samples:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">library(rstan)

<span style="color: #408080; font-style: italic"># The model specification</span>
model_string <span style="color: #666666"><-</span> <span style="color: #BA2121">"</span>
<span style="color: #BA2121">data {</span>
<span style="color: #BA2121">  int<lower=0> N;</span>
<span style="color: #BA2121">  real y[N];</span>
<span style="color: #BA2121">}</span>

<span style="color: #BA2121">parameters {</span>
<span style="color: #BA2121">  real mu;</span>
<span style="color: #BA2121">  real<lower=0> sigma;</span>
<span style="color: #BA2121">}</span>
<span style="color: #BA2121">model{</span>
<span style="color: #BA2121">  y ~ normal(mu, sigma);</span>
<span style="color: #BA2121">  mu ~ normal(0, 100);</span>
<span style="color: #BA2121">  sigma ~ lognormal(0, 4);</span>
<span style="color: #BA2121">}"</span>

<span style="color: #408080; font-style: italic"># Running the model</span>
mcmc_samples <span style="color: #666666"><-</span> stan(model_code<span style="color: #666666">=</span>model_string, data<span style="color: #666666">=</span>list(N<span style="color: #666666">=</span>length(y), y<span style="color: #666666">=</span>y), pars<span style="color: #666666">=</span>c(<span style="color: #BA2121">"mu"</span>, <span style="color: #BA2121">"sigma"</span>), chains<span style="color: #666666">=3</span>, iter<span style="color: #666666">=30000</span>, warmup<span style="color: #666666">=10000</span>)
</pre></div>

Another difference from JAGS, noticeable in the model definition above, is that STAN (in line with R) has vectorized functions. That is, what had to be written as…

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #008000; font-weight: bold">for</span> (i <span style="color: #008000; font-weight: bold">in</span> <span style="color: #666666">1:</span>length(y)) {
    y[i] <span style="color: #666666">~</span> dnorm(mu, tau)
}
</pre></div>

… in JAGS can now be simplified in STAN to:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">y <span style="color: #666666">~</span> normal(mu, sigma)
</pre></div>
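The same vectorization is available when computing log-likelihoods in plain R, which is worth keeping in mind for Method 3 below. A quick check that the loop and the vectorized form agree (using the same example data as above):

```r
set.seed(1337)
y <- rnorm(n = 20, mean = 10, sd = 5)
mu <- 10
sigma <- 5

# Loop version, JAGS-style thinking
ll_loop <- 0
for (i in 1:length(y)) {
  ll_loop <- ll_loop + dnorm(y[i], mu, sigma, log = TRUE)
}

# Vectorized version, STAN/R-style
ll_vec <- sum(dnorm(y, mu, sigma, log = TRUE))

all.equal(ll_loop, ll_vec)  # TRUE
```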

After running the model in STAN we can now inspect the trace plots…

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">traceplot(mcmc_samples)
</pre></div>


… and the parameter distributions.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">plot(mcmc_samples)
</pre></div>


By printing the variable holding the MCMC results at the console we get point estimates and credible intervals.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">mcmc_samples
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">## Inference for Stan model: model_string.</span>
<span style="color: #408080; font-style: italic">## 3 chains, each with iter=30000; warmup=10000; thin=1; </span>
<span style="color: #408080; font-style: italic">## post-warmup draws per chain=20000, total post-warmup draws=60000.</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##        mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat</span>
<span style="color: #408080; font-style: italic">## mu     12.7       0 1.4  10.0  11.8  12.7  13.6  15.4 25754    1</span>
<span style="color: #408080; font-style: italic">## sigma   6.0       0 1.0   4.4   5.3   5.8   6.5   8.3 25094    1</span>
<span style="color: #408080; font-style: italic">## lp__  -45.7       0 1.0 -48.4 -46.0 -45.3 -44.9 -44.6 17964    1</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Samples were drawn using NUTS2 at Tue Jun 25 16:45:43 2013.</span>
<span style="color: #408080; font-style: italic">## For each parameter, n_eff is a crude measure of effective sample size,</span>
<span style="color: #408080; font-style: italic">## and Rhat is the potential scale reduction factor on split chains (at </span>
<span style="color: #408080; font-style: italic">## convergence, Rhat=1).</span>
</pre></div>

Method 3: LaplacesDemon

LaplacesDemon seems to be a rather unknown R package (I’ve found very few mentions of it on R-bloggers, for example) which helps you run Bayesian models using only R. LaplacesDemon implements a plethora of different MCMC methods and has great documentation available on www.bayesian-inference.com. Unlike JAGS and STAN, there is no modeling language; instead you have to craft your own likelihood function using R. The downside is that your model perhaps won’t look as pretty as in JAGS, but the upside is that you can be much more flexible in which constructs and probability distributions you use. Specifying and running the model using the exotic Hit-and-Run Metropolis (HARM) algorithm looks like the following:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">library(LaplacesDemon)

<span style="color: #408080; font-style: italic"># The model specification</span>
model <span style="color: #666666"><-</span> <span style="color: #008000; font-weight: bold">function</span>(parm, data) {
    mu <span style="color: #666666"><-</span> parm[<span style="color: #666666">1</span>]
    sigma <span style="color: #666666"><-</span> exp(parm[<span style="color: #666666">2</span>])
    log_lik <span style="color: #666666"><-</span> sum(dnorm(data<span style="color: #666666">$</span>y, mu, sigma, log <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">T</span>))
    post_lik <span style="color: #666666"><-</span> log_lik <span style="color: #666666">+</span> dnorm(mu, <span style="color: #666666">0</span>, <span style="color: #666666">100</span>, log <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">T</span>) <span style="color: #666666">+</span> dlnorm(sigma, <span style="color: #666666">0</span>, <span style="color: #666666">4</span>, log <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">T</span>)
    <span style="color: #408080; font-style: italic"># This list is returned and has to follow the format specified by</span>
    <span style="color: #408080; font-style: italic"># LaplacesDemon.</span>
    list(LP <span style="color: #666666">=</span> post_lik, Dev <span style="color: #666666">=</span> <span style="color: #666666">-2</span> <span style="color: #666666">*</span> log_lik, Monitor <span style="color: #666666">=</span> c(post_lik, sigma), yhat <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">NA</span>,
        parm <span style="color: #666666">=</span> parm)
}

<span style="color: #408080; font-style: italic"># Running the model</span>
data_list <span style="color: #666666"><-</span> list(N <span style="color: #666666">=</span> length(y), y <span style="color: #666666">=</span> y, mon.names <span style="color: #666666">=</span> c(<span style="color: #BA2121">"post_lik"</span>, <span style="color: #BA2121">"sigma"</span>),
    parm.names <span style="color: #666666">=</span> c(<span style="color: #BA2121">"mu"</span>, <span style="color: #BA2121">"log.sigma"</span>))
mcmc_samples <span style="color: #666666"><-</span> LaplacesDemon(Model <span style="color: #666666">=</span> model, Data <span style="color: #666666">=</span> data_list, Iterations <span style="color: #666666">=</span> <span style="color: #666666">30000</span>,
    Algorithm <span style="color: #666666">=</span> <span style="color: #BA2121">"HARM"</span>, Thinning <span style="color: #666666">=</span> <span style="color: #666666">1</span>, Specs <span style="color: #666666">=</span> list(alpha.star <span style="color: #666666">=</span> <span style="color: #666666">0.234</span>))
</pre></div>

As far as I understand, there is no special mechanism for keeping the parameters inside a bounded range; therefore the SD $\sigma$ is sampled on the log scale and then exponentiated to make sure $\sigma$ is always positive.
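A nice side effect of the model being a plain R function is that it can be sanity-checked directly, before handing it over to LaplacesDemon. A minimal check, re-defining the same model function and example data as above and confirming it returns a finite log-posterior in the required list format:

```r
# Same model function as in the LaplacesDemon example above
model <- function(parm, data) {
  mu <- parm[1]
  sigma <- exp(parm[2])  # log-scale parameter, exponentiated to stay positive
  log_lik <- sum(dnorm(data$y, mu, sigma, log = TRUE))
  post_lik <- log_lik + dnorm(mu, 0, 100, log = TRUE) + dlnorm(sigma, 0, 4, log = TRUE)
  list(LP = post_lik, Dev = -2 * log_lik, Monitor = c(post_lik, sigma),
       yhat = NA, parm = parm)
}

set.seed(1337)
y <- rnorm(n = 20, mean = 10, sd = 5)

# Evaluate at a reasonable starting point: mu = 10, sigma = 5
out <- model(c(10, log(5)), list(y = y))
is.finite(out$LP)  # TRUE
```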

The trace plots and distributions of MCMC samples can then be plotted…

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">plot(mcmc_samples, BurnIn <span style="color: #666666">=</span> <span style="color: #666666">10000</span>, data_list)
</pre></div>


… and by Consorting with Laplace’s Demon we get a long list of info including point estimates, credible intervals and MCMC diagnostics.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">Consort(mcmc_samples)
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## #############################################################</span>
<span style="color: #408080; font-style: italic">## # Consort with Laplace's Demon                              #</span>
<span style="color: #408080; font-style: italic">## #############################################################</span>
<span style="color: #408080; font-style: italic">## Call:</span>
<span style="color: #408080; font-style: italic">## LaplacesDemon(Model = model, Data = data_list, Iterations = 30000, </span>
<span style="color: #408080; font-style: italic">##     Thinning = 1, Algorithm = "HARM", Specs = list(alpha.star = 0.234))</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Acceptance Rate: 0.2286</span>
<span style="color: #408080; font-style: italic">## Adaptive: 30001</span>
<span style="color: #408080; font-style: italic">## Algorithm: Hit-And-Run Metropolis</span>
<span style="color: #408080; font-style: italic">## Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)</span>
<span style="color: #408080; font-style: italic">##        mu log.sigma </span>
<span style="color: #408080; font-style: italic">##   1.86858   0.02659 </span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Covariance (Diagonal) History: (NOT SHOWN HERE)</span>
<span style="color: #408080; font-style: italic">## Deviance Information Criterion (DIC):</span>
<span style="color: #408080; font-style: italic">##          All Stationary</span>
<span style="color: #408080; font-style: italic">## Dbar 127.887    127.887</span>
<span style="color: #408080; font-style: italic">## pD     2.133      2.133</span>
<span style="color: #408080; font-style: italic">## DIC  130.021    130.021</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Delayed Rejection (DR): 0</span>
<span style="color: #408080; font-style: italic">## Initial Values:</span>
<span style="color: #408080; font-style: italic">##        mu log.sigma </span>
<span style="color: #408080; font-style: italic">##    12.168     1.617 </span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Iterations: 30000</span>
<span style="color: #408080; font-style: italic">## Log(Marginal Likelihood): -62.64</span>
<span style="color: #408080; font-style: italic">## Minutes of run-time: 0.1</span>
<span style="color: #408080; font-style: italic">## Model: (NOT SHOWN HERE)</span>
<span style="color: #408080; font-style: italic">## Monitor: (NOT SHOWN HERE)</span>
<span style="color: #408080; font-style: italic">## Parameters (Number of): 2</span>
<span style="color: #408080; font-style: italic">## Periodicity: 30001</span>
<span style="color: #408080; font-style: italic">## Posterior1: (NOT SHOWN HERE)</span>
<span style="color: #408080; font-style: italic">## Posterior2: (NOT SHOWN HERE)</span>
<span style="color: #408080; font-style: italic">## Recommended Burn-In of Thinned Samples: 0</span>
<span style="color: #408080; font-style: italic">## Recommended Burn-In of Un-thinned Samples: 0</span>
<span style="color: #408080; font-style: italic">## Recommended Thinning: 44</span>
<span style="color: #408080; font-style: italic">## Status is displayed every 1000 iterations</span>
<span style="color: #408080; font-style: italic">## Summary1: (SHOWN BELOW)</span>
<span style="color: #408080; font-style: italic">## Summary2: (SHOWN BELOW)</span>
<span style="color: #408080; font-style: italic">## Thinned Samples: 30000</span>
<span style="color: #408080; font-style: italic">## Thinning: 1</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Summary of All Samples</span>
<span style="color: #408080; font-style: italic">##              Mean     SD     MCSE    ESS      LB  Median      UB</span>
<span style="color: #408080; font-style: italic">## mu         12.612 1.3670 0.075629  472.6   9.911  12.613  15.378</span>
<span style="color: #408080; font-style: italic">## log.sigma   1.748 0.1631 0.004211 3142.5   1.447   1.738   2.085</span>
<span style="color: #408080; font-style: italic">## Deviance  127.887 2.0656 0.089109  964.0 125.834 127.213 133.392</span>
<span style="color: #408080; font-style: italic">## post_lik  -73.626 1.0716 0.046804  924.5 -76.459 -73.281 -72.557</span>
<span style="color: #408080; font-style: italic">## sigma       5.822 0.9775 0.025653 3102.8   4.250   5.688   8.041</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Summary of Stationary Samples</span>
<span style="color: #408080; font-style: italic">##              Mean     SD     MCSE    ESS      LB  Median      UB</span>
<span style="color: #408080; font-style: italic">## mu         12.612 1.3670 0.075629  472.6   9.911  12.613  15.378</span>
<span style="color: #408080; font-style: italic">## log.sigma   1.748 0.1631 0.004211 3142.5   1.447   1.738   2.085</span>
<span style="color: #408080; font-style: italic">## Deviance  127.887 2.0656 0.089109  964.0 125.834 127.213 133.392</span>
<span style="color: #408080; font-style: italic">## post_lik  -73.626 1.0716 0.046804  924.5 -76.459 -73.281 -72.557</span>
<span style="color: #408080; font-style: italic">## sigma       5.822 0.9775 0.025653 3102.8   4.250   5.688   8.041</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Demonic Suggestion</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Due to the combination of the following conditions,</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## 1. Hit-And-Run Metropolis</span>
<span style="color: #408080; font-style: italic">## 2. The acceptance rate (0.2286) is within the interval [0.15,0.5].</span>
<span style="color: #408080; font-style: italic">## 3. Each target MCSE is < 6.27% of its marginal posterior</span>
<span style="color: #408080; font-style: italic">##    standard deviation.</span>
<span style="color: #408080; font-style: italic">## 4. Each target distribution has an effective sample size (ESS)</span>
<span style="color: #408080; font-style: italic">##    of at least 100.</span>
<span style="color: #408080; font-style: italic">## 5. Each target distribution became stationary by</span>
<span style="color: #408080; font-style: italic">##    1 iteration.</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Laplace's Demon has been appeased, and suggests</span>
<span style="color: #408080; font-style: italic">## the marginal posterior samples should be plotted</span>
<span style="color: #408080; font-style: italic">## and subjected to any other MCMC diagnostic deemed</span>
<span style="color: #408080; font-style: italic">## fit before using these samples for inference.</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## Laplace's Demon is finished consorting.</span>
</pre></div>
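A small caveat when reading that table: because exp is monotone, the median of sigma is just the exponentiated median of log.sigma, but the mean does not transform the same way (Jensen's inequality). A quick illustration, with simulated draws standing in for the log.sigma samples:

```r
set.seed(7)
# Stand-in for the log.sigma posterior draws (odd n, so the median
# is a single order statistic)
log_sigma <- rnorm(10001, 1.75, 0.16)
sigma <- exp(log_sigma)

all.equal(median(sigma), exp(median(log_sigma)))  # TRUE: medians commute with exp
mean(sigma) > exp(mean(log_sigma))                # TRUE: means do not
```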

Comparison of the Three Methods

Concluding this little exercise I thought I’d point out what I feel are some pros and cons of the three methods.

JAGS
+ Mature program, lots of documentation and examples (for example, the great BUGS book).
+ Nice modeling language; it is almost possible to enter the likelihood specification verbatim.
+ Takes care of the sampling for you (in the best case).
– It is difficult to use probability distributions not included in JAGS.
– You’re stuck with Gibbs sampling, for better or worse.
STAN
+ An expressive modeling language similar to JAGS.
+ An active and supportive community (Especially the mailing lists).
+ Compiling down to C++ should result in efficient sampling.
+ The No-U-Turn sampler requires no configuration.
– A fairly new program (but improving at a rapid pace).
– The modeling language requires a bit more bookkeeping than JAGS’ (for example, why all the `;` ?).
– Compiling down to C++ takes some time (> 30 s on my Dell Vostro 3700).
– It is difficult to use probability distributions not included in STAN (but easier than in JAGS).
LaplacesDemon
+ Relies only on R.
+ Easy to use any probability distribution implemented in R.
+ Good documentation.
+ Lots of different MCMC methods to try.
– No fancy modeling language, the likelihood has to be specified using an R function.
– No large user community.
– You have to select and configure the MCMC sampling yourself.
– As everything is implemented in R, the sampling might be slower than in JAGS and STAN.
