Parallel processing in R for Windows

March 4, 2011
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The doSMP package (and its companion package, revoIPC), previously bundled only with Revolution R, is now available on CRAN for use with open-source R under the GPL2 license.

In short, doSMP makes it easy to do SMP parallel processing on a Windows box with multiple processors. (It works on Mac and Linux too, but it's been relatively easy to do parallel processing on those systems for a while with the doMC/multicore package combo. Windows, not so much.) Basically, you tell it how many processors you have, write a loop using the foreach function, and the iterations of the loop run in parallel, using multiple processors. For embarrassingly parallel problems like simulations and optimizations, if you have 2 processors you can get close to halving the processing time; reduce it to near 25% with 4 processors, and so on. (Whether these are true, independent CPUs or cores within a processor matters a little, but not much.)
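To make that workflow concrete, here is a minimal sketch of the three steps: start some workers, register them, and write a foreach loop. The worker count of 2 and the toy loop body are assumptions for illustration, and the sketch falls back to foreach's sequential backend (registerDoSEQ) when doSMP isn't installed, so it should still run, just not in parallel.

```r
library(foreach)

# Assumption: doSMP may or may not be installed on this machine.
# If it is missing, fall back to the sequential backend so the loop still runs.
have_smp <- requireNamespace("doSMP", quietly = TRUE)
if (have_smp) {
  library(doSMP)
  w <- startWorkers(workerCount = 2)  # tell it how many processors to use
  registerDoSMP(w)                    # make doSMP the backend for %dopar%
} else {
  registerDoSEQ()
}

# Each iteration is independent of the others, so the backend is free
# to hand them out to different workers. .combine = c collects the
# per-iteration results into one vector.
squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
print(squares)

# Shut the workers down once the parallel work is done.
if (have_smp) stopWorkers(w)
```

The loop body itself doesn't change between backends; only the registration step does, which is what lets the same foreach code run under doMC, doSNOW or doSMP.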

You can see some examples in the doSMP vignette, from which I adapted the following example. Suppose you want to bootstrap parameter estimates from a logistic regression using 10,000 samples:

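library(doSMP)
w <- startWorkers(workerCount = 4)  # assumes a 4-core machine; adjust to suit
registerDoSMP(w)                    # make doSMP the backend for %dopar%

# Keep versicolor and virginica only; columns are Sepal.Length and Species
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
# Divide the trials evenly across the registered workers
chunkSize <- ceiling(trials / getDoParWorkers())
smpopts <- list(chunkSize = chunkSize)
r <- foreach(icount(trials), .combine = cbind, .options.smp = smpopts) %dopar% {
  ind <- sample(100, 100, replace = TRUE)  # bootstrap resample of the 100 rows
  result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
  coefficients(result1)
}

stopWorkers(w)                      # shut the workers down when finished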


Note the use of foreach to run the bootstrap models in parallel. On a 4-core machine, you could reduce processing time from 104 seconds to 57 seconds compared to using a regular for loop. That's a speedup of about 1.8x, well short of a fourfold reduction, but a significant reduction in time nonetheless. (Tip: if you're using Revolution R, you might want to try turning off MKL multithreading when using doSMP/foreach, to avoid contention between the small-grain threading of MKL and the large-grain parallelism of foreach.)
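If you want to check the speedup on your own machine, one way is to time the same loop with %do% (always sequential) and with %dopar% (whatever backend is registered). The sketch below is only an illustration: it registers foreach's sequential backend so it runs anywhere, and it uses 200 trials rather than 10,000 to keep it quick, so by itself it won't show any speedup. Register a real parallel backend such as doSMP first to see a difference.

```r
library(foreach)
library(iterators)  # provides icount()

# Assumption: no parallel backend is set up here, so register the
# sequential one. With doSMP you would register its workers instead.
registerDoSEQ()

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 200  # far fewer than in the article, to keep the sketch quick

# Baseline: %do% always runs the iterations one after another.
t_seq <- system.time(
  r_seq <- foreach(icount(trials), .combine = cbind) %do% {
    ind <- sample(100, 100, replace = TRUE)
    coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
  }
)[["elapsed"]]

# Same loop through %dopar%, which uses the registered backend.
t_par <- system.time(
  r_par <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
  }
)[["elapsed"]]

cat("sequential:", t_seq, "s; %dopar%:", t_par, "s\n")

# Speedup implied by the timings quoted in the article
reported_speedup <- 104 / 57  # about 1.8
```

Each run returns a 2-by-trials matrix of bootstrapped coefficients (intercept and slope), one column per resample.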

I've written about foreach several times before, using other parallel backends like doMC and doSNOW. Now you can use those same examples on Windows with open-source R and the doSMP package.

doSMP package: Getting Started with doSMP and foreach
