Parallel JAGS RNGs

July 23, 2011
By

(This article was first published on Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers)

As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core.

But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1], the Wichmann-Hill RNG.

JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name="lecuyer::LEcuyer"
all to no avail. It ought to be .RNG.name="lecuyer::Lecuyer" to be consistent with the other .RNG.name values! I looked around in the source to find where it checks its name from the inits, to discover that in fact it is

.Rng.name="lecuyer::RngStream"

So here’s how I set up 4 chains now:

library(doMC); registerDoMC()
library(rjags); load.module("glm"); load.module("lecuyer")
library(random)
jinits <- function() {
   ### all the other params ###
  .Rng.name="lecuyer::RngStream",
  .Rng.seed=randomNumbers(n = 1, min = 1, max = 1e+06,col=1)
}
jags.parsamples <- foreach(i=1:getDoParWorkers()) %dopar% {
  model.jags <- jags.model(model, forJAGS,
                           inits=jinits,
                           n.chain=1, n.adapt=1000)
  result <- coda.samples(model.jags,params,1000)
  return(result)
}

I would just as soon initialize them to the same state and use sequential substreams, but I think there is no way to do this. Four long separately-seeded streams should be more than fine; a quick look suggests that if you did n.chain>1 (on each core) you’d get sequential substreams.

I should also probably write a better .combine so that it’s an mcmc.list and not just a list, but whatever. This works, almost 4 times (yeah yeah overhead blah blah) faster than the usual n.chain=4 would be!

To leave a comment for the author, please follow the link and comment on his blog: Statistical Modeling, Causal Inference, and Social Science.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , ,

Comments are closed.

Top 3 Posts from the past 2 days

Top 9 articles of the week

  1. Installing R packages
  2. Public Universities Should Use Open Source Software
  3. R 3.1.0 is released!
  4. Using apply, sapply, lapply in R
  5. Basics of Histograms
  6. Regular expressions in R vs RStudio
  7. Box-plot with R – Tutorial
  8. Quantitative Finance Applications in R - 5: an Introduction to Monte Carlo Simulation
  9. Visualize violent crime rates in US with choroplethr package

Sponsors