# MCMC and faster Gibbs Sampling using Rcpp

July 14, 2011
By

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

Sanjog Misra, who
uses Rcpp for
Monte Carlo Markov Chain (MCMC) analyses in quantitative marketing, kindly set me a
short example of Rcpp use.

The example is based on a
blog post by Darren Wilkinson
which itself discusses and compares the suitability of R, Python, Java or C for MCMC analysis, using the
Gibbs sampler as a concrete example. Darren’s post is worth checking out: he
stresses the rather pragmatic aspects of how fast and/or easy it is to write the code, rather than just the mere runtime. As such,
he is not too concerned with a speed advantage of Python over R which he sees at a factor of around 2.4, leaving him to continue to prototype in
R. Similarly, with C ‘only’ being faster than Java by a factor of two, he prefers Java for the numerically more demanding parts.

We do of course advocate the use of Rcpp to combine the best aspects of R and C++,
respectively. This Gibbs sampler example provides a nice backdrop. So working with Darren’s example, consider the same Gibbs sampler for
a bivariate distribution (and apologies for the lack of latex typesetting on my blog)

```f(x,y) = k x^2 exp( -x y^2 - y^2 + 2y - 4x)
```

where the conditional distributions are

```f(x|y) = (x^2)*exp(-x*(4+y*y))               ## a Gamma density kernel
f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1))  ## a Gaussian kernel
```

Sanjog then spotted and corrected a small error in the variance expression in Darren’s derivation; this is now acknowledged on Darren’s
website. Full details are in the R script now committed in SVN.

The R code for the Gibbs sample can therefore be
written as follows below. Note that this uses thinning to minimize serial correlation in the conditional densities–which renders
the computation more demanding as ‘N times thin’ draws have to be generated:

```## Here is the actual Gibbs Sampler
## This is Darren Wilkinsons R code (with the corrected variance)
## But we are returning only his columns 2 and 3 as the 1:N sequence
## is never used below
Rgibbs <- function(N,thin) {
mat <- matrix(0,ncol=2,nrow=N)
x <- 0
y <- 0
for (i in 1:N) {
for (j in 1:thin) {
x <- rgamma(1,3,y*y+4)
y <- rnorm(1,1/(x+1),1/sqrt(2*(x+1)))
}
mat[i,] <- c(x,y)
}
mat
}
```

A second variant can be computed using the R bytecode compiler which appeared with the recent release of R 2.13.0 (and which we analysed
in this blog post from April). This is as easy as

```## We can also try the R compiler on this R function
RCgibbs <- cmpfun(Rgibbs)
```

Thanks to Rcpp and the inline package, we can also write a C++ variant that can be
built and launched from R with ease. The C++ code is assigned to an R text variable `gibbscode`. (And we used to typeset such
code on the blog as a charater string, ie in a faint red color—but have now switched to highlight it as if it were freestanding C++
code. It really is passed as a single string to R which then uses the `cxxfunction()` to compile, link and load a C++ function
built around the code. See previous posts on the inline package for more.)

```## Now for the Rcpp version -- Notice how easy it is to code up!
gibbscode <- '

using namespace Rcpp;   // inline does that for us already

// n and thin are SEXPs which the Rcpp::as function maps to C++ vars
int N   = as<int>(n);
int thn = as<int>(thin);

int i,j;
NumericMatrix mat(N, 2);

RNGScope scope;         // Initialize Random number generator

// The rest of the code follows the R version
double x=0, y=0;

for (i=0; i<N; i++) {
for (j=0; j<thn; j++) {
x = ::Rf_rgamma(3.0,1.0/(y*y+4));
y = ::Rf_rnorm(1.0/(x+1),1.0/sqrt(2*x+2));
}
mat(i,0) = x;
mat(i,1) = y;
}

'
RcppGibbs <- cxxfunction(signature(n="int", thin = "int"),
gibbscode, plugin="Rcpp")

```

It is noteworthy how the code logic is essentially identical between the basic R version, and the C++ version. Two nested loops control draws
of x from a Gamma distribution, conditional on y, as well as draws of y from a Normal, conditional on x. Add a small amount of parameter passing
to obtain the parameters `N` and `thin`, allocation of a results matrix and setup of the random number generator state
to remain consistent with R, as well as a return of the matrix—and that is all.

As Darren’s code uses the GNU GSL in its C variant, I also became interested in seeing how a C/C++ hybrid variant using our
RcppGSL package would fare. The code is below.

```gslgibbsincl <- '
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

using namespace Rcpp;  // just to be explicit
'

gslgibbscode <- '
int N = as<int>(ns);
int thin = as<int>(thns);
int i, j;
gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
double x=0, y=0;
NumericMatrix mat(N, 2);
for (i=0; i<N; i++) {
for (j=0; j<thin; j++) {
x = gsl_ran_gamma(r,3.0,1.0/(y*y+4));
y = 1.0/(x+1)+gsl_ran_gaussian(r,1.0/sqrt(2*x+2));
}
mat(i,0) = x;
mat(i,1) = y;
}
gsl_rng_free(r);

'

GSLGibbs <- cxxfunction(signature(ns="int", thns = "int"),
body=gslgibbscode, includes=gslgibbsincl,
plugin="RcppGSL")
```

This code is similar to the version using just Rcpp, but we need to allocate space
for the GSL random generator object (and later release that memory allocation), and we do of course call the GSL functions. This also
necessitates declaring the function using the two header files listed as arguments for the `include` argument of
`cxxfunction()`.

So how is the performance?

The script (now committed in Rcpp‘s SVN repo as
`examples/RcppGibbs/RcppGibbs.R`, and also part of the next release) creates the output below when running in non-interactive
mode (whereas interactive mode creates a few charts too):

```[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */:~/svn/rcpp/pkg/Rcpp/inst/examples/RcppGibbs\$ r RcppGibbs.R
Replication # 1 complete
Replication # 2 complete
Replication # 3 complete
Replication # 4 complete
N=1000 N=5000 N=10000 N=20000
Elasped Time (R)     0.124  2.650  10.511  41.773
Elasped Time (RC)    0.081  1.916   7.594  30.280
Elapsed Time (Rcpp)  0.003  0.076   0.306   1.221
Elapsed Time (Rgsl)  0.002  0.049   0.196   0.784
SpeedUp Rcomp.       1.530  1.380   1.380   1.380
SpeedUp Rcpp        41.330 34.870  34.350  34.210
SpeedUp GSL         62.000 54.080  53.630  53.280

test replications elapsed  relative user.self sys.self
4  GSLGibbs(N, thn)           10   7.845  1.000000      7.84        0
3 RcppGibbs(N, thn)           10  12.218  1.557425     12.22        0
2   RCgibbs(N, thn)           10 312.296 39.808286    311.98        0
1    Rgibbs(N, thn)           10 420.953 53.658764    420.59        0
```

The first block is a hand-computed comparison using four sets of parameters; the second block uses the excellent rbenchmark package to
compare ten runs at twenty-thousand draws.

We see a nominal increase in performance due to the bytecode compiler, saving roughly 38 percent which seems appropriate given that the R
code mostly controls the loops; actual work in undertaken in the already compiled RNG functions. The Rcpp variant is about 34 times faster
than the pure R code. Sanjog reported higher increases on his OS X machines (and Darren’s post echos a similar order of
magnitude). However, on my i7 running Linux I always obtained an improvement in the mid- to high thirties. That is certainly already a
rather pleasant result.

What surprised and stunned me at first was that the GSL solution scores an improvement of around 53 times (close to the factor of 60
reported by Darren). A closer look at the code (shown above) makes it clear that there are very few compute-intensive operations outside of
the RNG draws. I validated this further with a second study timing just one million draws each from a Gaussian and Gamma using R via Rcpp,
and using the GSL. It turns out that the R code is about 2.5 times slower for random draws from the Gamma distribution than the GSL. Inspection of the
source code—in files `src/nmath/rgamma.c` for R and `randist/gamma.c` for the GSL shows why. R uses a much more complex
(and presumably more precise) algorithm. I may follow up with Martin Maechler to see if we can illustrative the numerial benefits of this
more expensive approach—this blog entry is getting long enough already.

So to sum up: Gibbs sampling is a somewhat resource-heavy Monte Carlo Markov Chain method for investigating multivariate distributions. R excels at
quick and simple explorations, albeit at somewhat slower execution speed. The
Rcpp package can help by providing easy
means to accelerate simulations by a significant factor. The example discussed here is now in SVN repository for
Rcpp and will be part of the next release.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...