MCMC is by its very nature a serial algorithm: each iteration depends on the results of the last iteration. It is, therefore, rather difficult to parallelize MCMC code so that a single chain runs more quickly by splitting the work across multiple processors.
A more straightforward way to parallelize MCMC code is to run multiple chains on multiple processors. Since the chains are independent of each other, it is mainly an issue of bookkeeping to distribute the work among several processing units.
In the next post we will see why running multiple chains can be very useful; for now, we simply focus on the how. Specifically, this post explains how to exploit the multicore nature of modern computers to run several chains in roughly the same amount of time as it takes to run one chain.
Running code in parallel
An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because mclapply() relies on forking and Windows does not support forking.
For the purposes of this online supplement, I have re-implemented mclapply() so that it runs on Windows. My personal blog gives the details of how I implemented the fix, as it is a bit out of scope for the supplement. Hopefully, by modifying mclapply() so that it works across Windows, Mac, and Linux, we are able to focus on the statistical ideas instead of getting lost in computational details.
A cross-platform mclapply() example
The following example illustrates how mclapply() is used. Consider a function, wait.then.square(), that simulates an expensive computation.
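A minimal version consistent with the timings reported below simply waits for ten seconds and then returns the square of its argument:

wait.then.square <- function(x) {
    Sys.sleep(10)  ## wait for ten seconds ...
    x^2            ## ... and then return the square of x
}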
Applying it over the integers from 1 to 4 with lapply() will take about 40 seconds:
## Run in serial
system.time(
    serial.output <- lapply(1:4, wait.then.square)
)
If we run it in parallel with mclapply() on Mac or Linux, it will take about 10 seconds:
## The following script does two things:
##   (i)  it loads the parallel R package, and
##   (ii) if the current R session is running on a Windows
##        machine, it implements a custom version of
##        parallel::mclapply().
source("http://mcmcinirt.stat.cmu.edu/setup/post-10-mclapply-hack.R")

## Note two changes to the wait.then.square call:
##   (i)  lapply becomes mclapply, and
##   (ii) mc.cores (the number of processors to use in parallel)
##        is set to 4.
system.time(
    par.output <- mclapply(1:4, wait.then.square, mc.cores = 4)
)
If we run it in parallel with mclapply() on Windows, it will (i) take about 12 seconds, and (ii) produce a lot more console output:
source("http://mcmcinirt.stat.cmu.edu/setup/post-10-mclapply-hack.R") ## ## *** Microsoft Windows detected *** ## ## For technical reasons, the MS Windows version of mclapply() ## is implemented as a serial function instead of a parallel ## function. ## ## As a quick hack, we replace this serial version of mclapply() ## with a wrapper to parLapply() for this R session. Please see ## ## http://www.stat.cmu.edu/~nmv/2014/07/14/implementing-mclapply-on-windows ## ## for details. system.time( par.output
The ... snip ... lines represent the three additional times that the message is printed to the console by the other cores. The "following object is masked" warning can be safely ignored. In this case, it is warning you that mclapply() has been replaced with something else, i.e., the Windows-specific implementation of mclapply().
Implementing a parallel sampler
Conceptually, running chains in parallel is as simple as
chain.list <- mclapply(1:num.chains, run.chain.2pl, ... )
where num.chains is the total number of chains that we wish to run in parallel and ... represents the additional arguments to run.chain.2pl. In practice, however, things are more complicated because each chain needs its own set of initial values, tuning parameters, and other arguments passed to run.chain.2pl. Otherwise, we would simply be duplicating work by running the exact same chain on separate cores.
Our full implementation is a wrapper, run.chain.2pl.list(), whose interface is documented as follows:
## run.chain.2pl.list
##
## A wrapper to call run.chain.2pl with mclapply() in order to
## run MCMC chains in parallel. Returns a coda::mcmc.list.
##
## Inputs:
##   seq.of.random.seeds  A sequence of random seeds, one for
##                        each chain.
##   init.list            mcmc.list of initial values
##   MH.th.list           tuning parameters for theta.abl
##   MH.a.list            tuning parameters for a.disc
##   MH.b.list            tuning parameters for b.diff
##   ...                  the remaining arguments for run.chain.2pl
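## (The body below is a minimal sketch, not the original
## implementation: the run.chain.2pl argument names and the
## parameter-name patterns passed to la.get.all() are assumptions.)
run.chain.2pl.list <- function(seq.of.random.seeds, init.list,
                               MH.th.list, MH.a.list, MH.b.list,
                               ...) {
    ## Extract per-chain initial values by name from the
    ## mcmc.list, using the la.get.all() helper defined below
    th.inits <- la.get.all(init.list, "theta.abl")
    a.inits  <- la.get.all(init.list, "a.disc")
    b.inits  <- la.get.all(init.list, "b.diff")

    chain.list <- mclapply(
        seq_along(seq.of.random.seeds),
        function(ii) {
            ## Give each chain its own seed, initial values,
            ## and tuning parameters
            set.seed(seq.of.random.seeds[[ii]])
            run.chain.2pl(th.init = th.inits[[ii]],
                          a.init  = a.inits[[ii]],
                          b.init  = b.inits[[ii]],
                          MH.th   = MH.th.list[[ii]],
                          MH.a    = MH.a.list[[ii]],
                          MH.b    = MH.b.list[[ii]],
                          ...)
        },
        mc.cores = length(seq.of.random.seeds))

    ## Convert each chain to a coda::mcmc object and bundle the
    ## set as a coda::mcmc.list
    coda::mcmc.list(lapply(chain.list, coda::mcmc))
}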
The la.get.all() helper function extracts the relevant parameters by name from an mcmc.list. This allows us to use the output from a previous chain to initialize a new chain, in effect continuing where the old chain left off. The function is implemented in two steps: first, a get.all function that operates on a single chain, and second, the definition of la.get.all that operates on an mcmc.list.
## get.all
##
## Helper function to extract values by parameter name
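## (A minimal sketch, not the original code: it assumes parameter
## names can be matched with grep() against the chain's column
## names.)
get.all <- function(chain, name) {
    chain[, grep(name, coda::varnames(chain)), drop = FALSE]
}

## la.get.all
##
## Apply get.all() to each chain of an mcmc.list
la.get.all <- function(chain.list, name) {
    lapply(chain.list, get.all, name = name)
}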
To initialize a chain without the values from a prior chain, we must re-create the data structure for an mcmc.list. The first step is to define a function init.chain() that creates the initialization structure for a single chain. The second step is to manually create a list of these initializations, as we do in the example below.
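A minimal sketch of these two steps, for a hypothetical model with 5 persons and 3 items (the argument names and the dispersed starting values are illustrative):

## Build the initialization structure for a single chain as a
## one-row coda::mcmc object. In the real code the columns would
## be named (e.g., via get.2pl.params()) so that la.get.all() can
## extract them by name.
init.chain <- function(theta.abl, a.disc, b.diff) {
    coda::mcmc(matrix(c(theta.abl, a.disc, b.diff), nrow = 1))
}

## One initialization per chain, with dispersed starting values
## so the three chains begin in different places
init.list <- coda::mcmc.list(
    init.chain(theta.abl = rep(-2, 5), a.disc = rep(0.5, 3), b.diff = rep(-1, 3)),
    init.chain(theta.abl = rep( 0, 5), a.disc = rep(1.0, 3), b.diff = rep( 0, 3)),
    init.chain(theta.abl = rep( 2, 5), a.disc = rep(1.5, 3), b.diff = rep( 1, 3)))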
Most of the additional complexity in running a set of chains in parallel lies in specifying the initial values and tuning parameters for the separate chains.
## Load the necessary code from this and previous posts
source("http://mcmcinirt.stat.cmu.edu/setup/post-10.R")

## Set the hyperparameters to the values from Post 1
hyperpars
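## (Hypothetical continuation: the hyperparameter values and the
## per-chain tuning lists MH.th.list, MH.a.list, and MH.b.list are
## built one entry per chain, as init.list was above, and hyperpars
## is assumed to be passed through ... to run.chain.2pl. The call
## below is a sketch of how the three chains might be launched.)
run.par <- run.chain.2pl.list(
    seq.of.random.seeds = list(1, 2, 3),
    init.list  = init.list,
    MH.th.list = MH.th.list,
    MH.a.list  = MH.a.list,
    MH.b.list  = MH.b.list,
    hyperpars  = hyperpars)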
The resulting set of three chains can be visualized with the same coda functions used for the single chains. The output, however, is slightly different: the trace plots of the three chains are plotted together.
plot( run.par[,get.2pl.params(1,1,1,1)], density=FALSE )
As can be seen from the plot, the three chains have not converged to the same region of the parameter space in only 100 iterations. In the next post we examine how to formally detect when the chains have, in fact, converged to the same region of the parameter space.