(This article was first published on **Adventures in Statistical Computing**, and kindly contributed to R-bloggers)

Is it just me, or does the performance of the foreach package with a doSNOW backend operating on a socket grid suck?

Here at work, I am helping to set up a cluster of Windows machines for distributed R processing. We have lots of researchers running code that takes hours to complete and consists essentially of large for loops with lots of analysis in between. These guys and gals are not hard-core programmers, so there is lots of interest in foreach (as opposed to something like Rmpi).

I have successfully set up a POC grid between multiple machines using sockets and public key authentication. Assuming we use this, I'll post a how-to, as there is not much on the web about getting it working on Windows.

In the meantime, I am testing performance. There is something going on with foreach that I do not understand. Performance numbers are really bad.

Can anyone explain what is going on here?

```r
> require(doSNOW)
Loading required package: doSNOW
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: snow
> require(snowfall)
Loading required package: snowfall
>
> sfInit(parallel=TRUE,socketHosts=rep("localhost",3))
R Version: R version 2.15.0 (2012-03-30)
snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 3 CPUs.
> cl = sfGetCluster()
>
> f = function(x) {
+   sum = 0
+   for (i in seq(1,x)) sum = sum + i
+   return(sum)
+ }
>
> registerDoSNOW(cl)
>
> out = vector("logical",length=10000)
> system.time( (for (i in seq(1,10000)) out[i]=f(i) ))
   user  system elapsed
  25.99    0.00   25.99
>
> system.time( (out = lapply(seq(1,10000),f) ))
   user  system elapsed
  26.55    0.00   26.55
>
> system.time( (out = parLapply(cl,seq(1,10000),f) ))
   user  system elapsed
   0.02    0.00   15.85
>
> system.time( (out = foreach(i=seq(1,10000)) %dopar% f(i) ))
   user  system elapsed
   6.64    0.42   98.31
>
> getDoParWorkers()
[1] 3
```

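**EDIT:** HA! Figured it out. foreach is not very efficient at communicating tasks compared to par*apply(). Here foreach sends each of the 10,000 iterations to the workers as its own task, whereas parLapply() splits the input into one chunk per worker. The time spent communicating the tasks overwhelmed the actual processing time.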
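A back-of-the-envelope check using the timings above makes the point concrete. The sketch below assumes the whole gap between the foreach run and the parLapply() run is dispatch overhead, which is a simplification rather than a measurement.

```r
# Elapsed times (seconds) copied from the transcript above.
n_tasks        <- 10000
foreach_time   <- 98.31   # foreach, one task per element
parlapply_time <- 15.85   # parLapply, same work on the same 3 workers
lapply_time    <- 26.55   # sequential lapply, no communication at all

# Assumption: the gap between the two parallel runs is pure dispatch cost.
overhead_ms_per_task <- (foreach_time - parlapply_time) / n_tasks * 1000

# Average useful work per task, taken from the sequential run.
work_ms_per_task <- lapply_time / n_tasks * 1000

print(c(overhead = overhead_ms_per_task, work = work_ms_per_task))
```

Under that assumption the overhead comes to roughly 8 ms per task, against under 3 ms of actual work per task, so each task costs more to hand out than to compute.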

When I change the code to this, it runs fast (about the same as parLapply()):

```r
> system.time( (out = foreach(i=seq(0,9),.combine='c') %dopar% {
+   apply(as.array(seq(i*1000+1,(i+1)*1000)),1,f)
+ }))
   user  system elapsed
   0.00    0.00   14.03
```
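The same idea can be wrapped in a small helper so the chunk boundaries are not hard-coded. This is only a sketch: `chunked_foreach` is a hypothetical name, not part of foreach, and it assumes a backend has already been registered (for example with registerDoSNOW(cl) as above) and that the parallel package, which ships with R, is available for splitIndices().

```r
library(foreach)

# Hypothetical helper (not part of foreach): apply f to every element of xs,
# sending one chunk of elements per foreach task instead of one element each.
chunked_foreach <- function(xs, f, n_chunks = getDoParWorkers()) {
  # Split positions 1..length(xs) into n_chunks contiguous groups.
  chunks <- parallel::splitIndices(length(xs), n_chunks)
  foreach(idx = chunks, .combine = "c") %dopar% {
    # Each task loops over its own chunk locally on the worker.
    # Assumes f returns a single number for each element.
    vapply(xs[idx], f, numeric(1))
  }
}
```

With the f defined earlier, `out <- chunked_foreach(seq(1, 10000), f)` would dispatch one task per registered worker by default. Passing a larger `n_chunks` trades a little more communication for better load balancing when tasks vary in cost, as they do here.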
