Parallel JAGS RNGs

July 23, 2011
By

(This article was first published on Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers)

As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core.

But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1], the Wichmann-Hill RNG.

JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name="lecuyer::LEcuyer"
all to no avail. It ought to be .RNG.name="lecuyer::Lecuyer" to be consistent with the other .RNG.name values! I looked around in the source to find where it checks its name from the inits, to discover that in fact it is

.Rng.name="lecuyer::RngStream"

So here’s how I set up 4 chains now:

library(doMC); registerDoMC()
library(rjags); load.module("glm"); load.module("lecuyer")
library(random)
jinits <- function() {
   ### all the other params ###
  .Rng.name="lecuyer::RngStream",
  .Rng.seed=randomNumbers(n = 1, min = 1, max = 1e+06,col=1)
}
jags.parsamples <- foreach(i=1:getDoParWorkers()) %dopar% {
  model.jags <- jags.model(model, forJAGS,
                           inits=jinits,
                           n.chain=1, n.adapt=1000)
  result <- coda.samples(model.jags,params,1000)
  return(result)
}

I would just as soon initialize them to the same state and use sequential substreams, but I think there is no way to do this. Four long separately-seeded streams should be more than fine; a quick look suggests that if you did n.chain>1 (on each core) you’d get sequential substreams.

I should also probably write a better .combine so that it’s an mcmc.list and not just a list, but whatever. This works, almost 4 times (yeah yeah overhead blah blah) faster than the usual n.chain=4 would be!

To leave a comment for the author, please follow the link and comment on his blog: Statistical Modeling, Causal Inference, and Social Science.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , ,

Comments are closed.

Top 3 Posts from the past 2 days

Top 9 articles of the week

  1. Prediction model for the FIFA World Cup 2014
  2. Who will win the World Cup and which prediction model?
  3. Using apply, sapply, lapply in R
  4. Installing R packages
  5. New Version of RStudio: R Markdown v2 and More
  6. A suggestion to Windows-based users of R: It may be time to relocate
  7. Quickly export multiple R objects to an Excel Workbook
  8. R Markdown v2
  9. Basics of Histograms

Sponsors