Parallel JAGS RNGs

malecki

11 years ago

[This article was first published on Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers.]

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core.

But we all have multiple cores now, or we're computing on a cluster or in the cloud! So what we'd like from rjags is to run the chains through the foreach package, with each JAGS chain using a parallel-safe RNG. With n.chains=1, though, the default is that every parallel instance uses .RNG.name[1], the base::Wichmann-Hill RNG.

JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and which doesn't have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name="lecuyer::LEcuyer", all to no avail. It ought to be .RNG.name="lecuyer::Lecuyer", to be consistent with the other .RNG.name values! I dug through the source to find where the module checks the RNG name given in the inits, and discovered that it is in fact

.RNG.name="lecuyer::RngStream"

So here’s how I set up 4 chains now:

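library(doMC); registerDoMC()
library(rjags); load.module("glm"); load.module("lecuyer")
library(random)
# inits must be a function returning a named list
jinits <- function() {
  list(
    ### all the other params ###
    .RNG.name = "lecuyer::RngStream",
    # randomNumbers() returns a 1x1 matrix; JAGS wants a single integer seed
    .RNG.seed = as.integer(randomNumbers(n = 1, min = 1, max = 1e+06, col = 1))
  )
}
# model, forJAGS and params are your model file, data list and monitored nodes
jags.parsamples <- foreach(i = 1:getDoParWorkers()) %dopar% {
  model.jags <- jags.model(model, forJAGS,
                           inits = jinits,
                           n.chains = 1, n.adapt = 1000)
  result <- coda.samples(model.jags, params, n.iter = 1000)
  return(result)
}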
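Querying random.org needs a network connection from every worker, and four independent draws between 1 and 1e6 could, in principle, repeat a seed. A minimal alternative sketch, assuming you are happy to draw the seeds locally in the master R session: sample distinct seeds once with sample.int() (without replacement) and give worker i its own inits list. make_worker_inits is a hypothetical helper name, not part of rjags.

```r
# Hypothetical helper (not part of rjags): one inits list per worker,
# all using the lecuyer RNG but with distinct seeds drawn locally.
make_worker_inits <- function(n_workers, max_seed = 1e6) {
  # sampling without replacement guarantees no two workers share a seed
  seeds <- sample.int(max_seed, n_workers)
  lapply(seeds, function(s) list(
    ## ... all the other params ...
    .RNG.name = "lecuyer::RngStream",
    .RNG.seed = s
  ))
}

set.seed(2011)  # one master seed makes the whole set of chains reproducible
worker_inits <- make_worker_inits(4)
```

Inside the same foreach loop, worker i would then call jags.model(model, forJAGS, inits = worker_inits[[i]], n.chains = 1, n.adapt = 1000). With doMC the workers are forked, so worker_inits is visible to them without being exported.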

I would just as soon initialize them all to the same state and use sequential substreams, but as far as I can tell there is no way to do this. Four long, separately seeded streams should be more than fine; a quick look suggests that if you used n.chains>1 (on each core), you'd get sequential substreams.

I should also probably write a better .combine so that the result is an mcmc.list and not just a list, but whatever. This works, and it runs almost 4 times faster (yeah yeah, overhead, blah blah) than the usual n.chains=4 would!
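One possible .combine, sketched under the assumption that each worker returns the mcmc.list produced by coda.samples(): strip the mcmc.list wrapper from each result and rebuild a single mcmc.list with coda's mcmc.list(). The chains below are fake random draws standing in for jags.parsamples, and combine_chains is a hypothetical name.

```r
library(coda)

# Merge any number of mcmc.list objects into a single mcmc.list,
# keeping every chain (works whether each worker ran 1 chain or more).
combine_chains <- function(...) {
  # unclass() turns each mcmc.list into a plain list of mcmc chains;
  # unlist(recursive = FALSE) concatenates those lists one level deep.
  chains <- unlist(lapply(list(...), unclass), recursive = FALSE)
  do.call(mcmc.list, chains)
}

# Stand-in for jags.parsamples: four single-chain mcmc.list objects,
# each with 100 iterations of two made-up monitored nodes.
set.seed(1)
fake_worker_result <- function() {
  draws <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("mu", "tau")))
  mcmc.list(mcmc(draws))
}
jags.parsamples <- lapply(1:4, function(i) fake_worker_result())

samples <- do.call(combine_chains, jags.parsamples)
```

In the loop itself this could be plugged in as foreach(i = 1:getDoParWorkers(), .combine = combine_chains, .multicombine = TRUE) %dopar% { ... }; .multicombine = TRUE lets foreach hand several worker results to combine_chains in one call. The result can then go straight to gelman.diag() or summary().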
