Performance with foreach, doSNOW, and snowfall

[This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers.]

Is it just me, or does the performance of the foreach package with a doSNOW backend operating on a socket grid suck?

Here at work, I am helping to set up a cluster of Windows machines for distributed R processing.  We have lots of researchers running code that takes hours to complete and is essentially a large for loop with lots of analysis in between.  These guys and gals are not hard-core programmers, so there is lots of interest in foreach (as opposed to something like Rmpi).

I have successfully set up a proof-of-concept (POC) grid across multiple machines using sockets and public-key authentication.  Assuming we go with this approach, I'll post a how-to, as there is not much on the web about getting it working on Windows.

In the meantime, I am testing performance.  There is something going on with foreach that I do not understand.  Performance numbers are really bad.

Can anyone explain what is going on here?
> require(doSNOW)
Loading required package: doSNOW
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
Loading required package: iterators
Loading required package: snow
> require(snowfall)
Loading required package: snowfall
> sfInit(parallel=TRUE,socketHosts=rep("localhost",3))
R Version:  R version 2.15.0 (2012-03-30)
snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 3 CPUs.
> cl = sfGetCluster()
> f = function(x) {
+    sum = 0
+    for (i in seq(1,x)) sum = sum + i
+    return(sum)
+ }
> registerDoSNOW(cl)
> out = vector("logical",length=10000)
> system.time( (for (i in seq(1,10000)) out[i]=f(i) ))
   user  system elapsed
  25.99    0.00   25.99
> system.time( (out = lapply(seq(1,10000),f) ))
   user  system elapsed
  26.55    0.00   26.55
> system.time( (out = parLapply(cl,seq(1,10000),f) ))
   user  system elapsed
   0.02    0.00   15.85
> system.time( (out = foreach(i=seq(1,10000)) %dopar% f(i) ))
   user  system elapsed
   6.64    0.42   98.31
> getDoParWorkers()
[1] 3
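EDIT: HA!  Figured it out.  foreach is not very efficient at communicating tasks compared to the par*apply() functions: with this setup, every iteration is sent to a worker as its own task, while parLapply() splits the input into one chunk per worker.  With 10,000 tiny tasks, the time spent communicating overwhelmed the actual processing time.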
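A minimal sketch of that difference, using the base parallel package (shipped with R since 2.14.0) rather than snowfall so it is self-contained. The two-worker local cluster and n = 1000 are assumptions chosen to keep it quick. clusterApply() with one element per task mimics the one-task-per-iteration pattern, while clusterSplit() followed by clusterApply() reproduces the one-chunk-per-worker pattern that parLapply() relies on.

```r
library(parallel)

# Same toy workload as in the post: loop-sum of 1..x (cost grows with x).
f <- function(x) {
  s <- 0
  for (i in seq_len(x)) s <- s + i
  s
}

n  <- 1000            # assumption: smaller than the post's 10000
cl <- makeCluster(2)  # assumption: 2 local PSOCK workers

# One task per element: every f(i) is its own round trip to a worker.
t_per_task <- system.time(
  per_task <- unlist(clusterApply(cl, seq_len(n), f))
)

# One task per worker: clusterSplit() cuts 1..n into length(cl) contiguous
# chunks, and each worker evaluates its whole chunk with vapply() in one trip.
# f is passed as an extra argument, so nothing has to be exported first.
t_chunked <- system.time({
  chunks  <- clusterSplit(cl, seq_len(n))
  chunked <- unlist(clusterApply(cl, chunks,
                                 function(idx, g) vapply(idx, g, numeric(1)), f))
})

stopCluster(cl)

# Elapsed seconds for each strategy; both computed identical results.
print(c(per_task = t_per_task[["elapsed"]], chunked = t_chunked[["elapsed"]]))
```

Both versions do the same arithmetic, so any gap in elapsed time is communication overhead; the exact numbers will depend on the machines and the network.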

When I change the code to send ten chunks of 1,000 iterations each instead of 10,000 single iterations, it runs fast (about the same as parLapply()):

> system.time( (out = foreach(i=seq(0,9),.combine='c') %dopar% {
+    apply(as.array(seq(i*1000+1,(i+1)*1000)),1,f)
+ }))
   user  system elapsed
   0.00    0.00   14.03
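The chunk count of 10 above is hard-coded. Below is a sketch of the same idea that instead makes one chunk per registered worker, assuming the doSNOW setup from the session above (here a fresh three-worker snow socket cluster on localhost). The split()/cut() chunking is just one way to do it, not a foreach feature.

```r
library(doSNOW)  # also attaches foreach and snow

# Same toy workload as in the post.
f <- function(x) {
  s <- 0
  for (i in seq_len(x)) s <- s + i
  s
}

cl <- makeCluster(3, type = "SOCK")  # assumption: 3 local socket workers
registerDoSNOW(cl)

n  <- 10000
nw <- getDoParWorkers()
# Cut 1..n into nw contiguous chunks of (nearly) equal length.
chunks <- split(seq_len(n), cut(seq_len(n), nw, labels = FALSE))

# One foreach task per chunk instead of one per element. f is sent to the
# workers by foreach's automatic export, since the loop body references it.
out <- foreach(idx = chunks, .combine = "c") %dopar% {
  vapply(idx, f, numeric(1))
}

stopCluster(cl)
```

Because f(i) does work proportional to i, equal-length contiguous chunks do not carry equal amounts of work. Using a few more chunks than workers (as the ten chunks above do) may help even out the load.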
