[This article was first published on Markov Chain Monte Carlo in Item Response Models, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

MCMC is by its very nature a serial algorithm — each iteration depends on the results of the last iteration. It is, therefore, rather difficult to parallelize MCMC code so that a single chain will run more quickly by splitting the work along multiple processors.

A more straightforward way to parallelize MCMC code is to run multiple chains on multiple processors. Since the chains are independent of each other, it is mainly an issue of bookkeeping to distribute the work among several processing units.

In the next post we will see why running multiple chains can be very useful, for now we simply focus on the how. Specifically, this post explains how to exploit the multicore nature of modern computers to run several chains in roughly the same about of time as it takes to run one chain.

# Running code in parallel

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because mclapply() relies on forking and Windows does not support forking.

For the purposes of this online supplement, I have re-implemented mclapply() so that it runs on Windows. My personal blog gives the details of how I implmented the fix as it is a bit out of scope for the supplement. Hopefully, by modifying mclapply() so that it works across Windows, Mac, and Linux, we are able to focus on the statistical ideas instead of getting lost in computational details.

## A cross-platform mclapply() example

The following example illustrates how mclapply() is used. Consider the following function:

wait.then.square

Applying it over the integers from 1 to 4 with lapply() will take about 40 seconds:

## Run in serial
system.time( serial.output 

If we run it in parallel with mclapply() on Mac or Linux, it will take about 10 seconds:

## The following script does two things:
##   (i)  it loads the parallel R package, and
##   (ii) if the current R session is running on a Windows
##        machine, it implements a custom version of
##        parallel:mclapply().
source("http://mcmcinirt.stat.cmu.edu/setup/post-10-mclapply-hack.R")

## Note two changes to the wait.then.square call:
##   (i)   lapply becomes mclapply, and
##   (ii)  mc.cores (the number of processors to use in parallel)
##         is set to 4.
system.time( par.output 

If we run it in parallel with mclapply() on Windows, it will (i) take about 12 seconds, and (ii) produce a lot more console output:

source("http://mcmcinirt.stat.cmu.edu/setup/post-10-mclapply-hack.R")
##
##  *** Microsoft Windows detected ***
##
##  For technical reasons, the MS Windows version of mclapply()
##  is implemented as a serial function instead of a parallel
##  function.
##
##  As a quick hack, we replace this serial version of mclapply()
##  with a wrapper to parLapply() for this R session. Please see
##
##    http://www.stat.cmu.edu/~nmv/2014/07/14/implementing-mclapply-on-windows
##
##  for details.

system.time( par.output 

where the ... snip ... lines represent the three additional times that the message is printed to the console by the other cores. The “following object is masked” warning can be safely ignored. In this case, it is warning you that mclapply() has been replaced with something else, i.e. the Windows specific implementation of mclapply()!

# Implementing a parallel sampler

Conceptually, running chains in parallel is as simple as

    chain.list <- mclapply(1:num.chains, run.chain.2pl, ... )


where num.chains is the total number of chains that we wish to run in parallel and ... represents the additional arguments to run.chain.2pl.

The practice, however, is more complicated because each chain will need its own set of initial values, tuning parameters, and other arguments passed to run.chain.2pl. Otherwise, we would simply be duplicating work by running the exact same chain on separate cores.

Our full implementation is as follows:

## run.chain.2pl.list
##
## A wrapper to call run.chain.2pl with mclapply() in order to
## run MCMC chains in parallel. Returns a coda::mcmc.list.
##
## Inputs:
##   seq.of.random.seeds   A sequence of random seeds, one for
##                         each chain.
##   init.list             mcmc.list of initial values
##   MH.th.list            tuning parameters for theta.abl
##   MH.a.list             tuning parameters for a.disc
##   MH.b.list             tuning parameters for b.diff
##   ...                   the remaining arguments for run.chain.2pl
run.chain.2pl.list 

The la.get.all() helper function extracts the relevant parameters by name from a coda mcmc.list. This allows us to use the output from a previous chain to initialize a new chain, in effect, continuing where the old chain left off. The function is implemented in two steps: first, a get.all function that operates on a single chain, and second, the definition of la.get.all that operates on an mcmc.list:

## get.all
##
## Helper function to extract values by parameter name
get.all 

To initialize a chain without the values from a prior chain we must re-create the data-structure for an mcmc.list. The first step is to define a function init.chain() that create the initialization structure for a single chain.

init.chain

The second step is to manually create a list of these initializations, as we do in the example below.

# An example

The additional complexity to run a set of chains in parallel is mainly in specifying initial values and tuning parameters for the separate chains.

## Load the necessary code from this and previous posts
source("http://mcmcinirt.stat.cmu.edu/setup/post-10.R")

## Define the hyper parameters to values from Post 1
hyperpars 

The resulting set of three chains can be visualized with the same coda functions used for the single chains. The output, however, is slightly different, the trace plots of the three chains are plotted together.

 plot( run.par[,get.2pl.params(1,1,1,1)], density=FALSE )

As can be seen from the plot, the three chains have not converged to the same region of the parameter space in only 100 iterations. In the next post we examine how to formally detect when the chains have, in fact, converged to the same region of the parameter space.