June 9, 2010

(This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers)

I take an almost unhealthy pleasure in pushing my computer to its limits. This has become easier with Revolution R and its free license for academic use. One of its best features is a debugger that lets you step through R code interactively, as you can with Python in PyDev. The other useful thing it packages is a simple way to run embarrassingly parallel jobs on a multicore box with the doSMP package.

    library(doSMP)

    # This declares how many processors to use.
    # Since I still wanted to use my laptop during the simulation, I chose cores - 1.
    workers <- startWorkers(7)
    registerDoSMP(workers)

    # Make Revolution R not try to go multi-core since we're already explicitly running in parallel.
    # Tip from: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
    setMKLthreads(1)

    chunkSize <- ceiling(runs / getDoParWorkers())
    smpopts <- list(chunkSize=chunkSize)

    # This just lets me see how long the simulation ran.
    beginTime <- Sys.time()

    # This is the crucial piece. It parallelizes a for loop among the workers and aggregates
    # their results with cbind. Since my function returns c(result1, result2, result3),
    # r becomes a matrix with 3 rows and "runs" columns.
    r <- foreach(icount(runs), .combine=cbind, .options.smp=smpopts) %dopar% {
      # repeatExperiment is just a wrapper function that returns c(result1, result2, result3)
      tmp <- repeatExperiment(N, ratingsPerQuestion, minRatings, trials, cutoff, studentScores)
    }

    # Ask for minutes explicitly; a plain Sys.time() - beginTime picks its own units.
    runTime <- difftime(Sys.time(), beginTime, units="mins")
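To make the `chunkSize` arithmetic concrete, here is a small base-R sketch. The value `runs <- 100` is a made-up number for illustration, and the way the indices are grouped below is only an assumption about how work might be divided; it is not taken from doSMP's internals.

```r
runs    <- 100   # hypothetical number of simulation runs
workers <- 7     # matches startWorkers(7) in the post

# Same formula as in the post, with the worker count written out.
chunkSize <- ceiling(runs / workers)

# Group run indices 1..runs into consecutive chunks of at most chunkSize.
# (Illustrative grouping only; doSMP's actual scheduling is not shown here.)
chunks <- split(seq_len(runs), ceiling(seq_len(runs) / chunkSize))

chunkSize                 # 15
length(chunks)            # 7
unname(lengths(chunks))   # 15 15 15 15 15 15 10
```

Rounding up means there are never more chunks than workers, at the cost of the last chunk being a little shorter than the rest.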

    # So now I can do something like this:
    boxplot(r[1,], r[2,], r[3,],
            main=paste("Distribution of Percent of rmse below ", cutoff,
                       "\n Runs=", runs, " Trials=", trials,
                       " Time=", round(runTime,2), " mins\n",
                       "scale: ", scaleLow, "-", scaleHigh, sep=""),
            names=c("Ave3","Ave5","Ave7"))
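If you want to see the shape that `.combine=cbind` produces without Revolution R, the following is a minimal sketch that needs only the foreach package. `fakeExperiment` is a hypothetical stand-in for `repeatExperiment`, and `%do%` runs the loop sequentially, so no parallel backend is registered here; the point is only the layout of `r`.

```r
library(foreach)

# Hypothetical stand-in for repeatExperiment(): three numbers per run.
fakeExperiment <- function(seed) {
  set.seed(seed)
  x <- runif(3)
  c(ave3 = x[1], ave5 = x[2], ave7 = x[3])
}

runs <- 10

# %do% evaluates sequentially; each iteration's vector becomes one column.
r <- foreach(i = seq_len(runs), .combine = cbind) %do% {
  fakeExperiment(i)
}

dim(r)          # 3 rows (one per statistic), one column per run
length(r[1, ])  # one value per run, which is what boxplot() receives
```

Because each run contributes a column, indexing by row (`r[1,]`, `r[2,]`, `r[3,]`) pulls out one statistic across all runs, which is why the boxplot call takes rows.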

The only drawback is that Revolution R is a bit rough around the edges and crashes much more than it should. Worse, for me at least, the support forum doesn’t show any posts when I’m logged in, and I can’t post anything. Although I’ve filled out (what I think is) the appropriate web form, no one has gotten back to me about fixing my account. I’m going to try Twitter in a bit. Your mileage may vary.

#### Update: 6/9/2010 22:03 EST

Revolution Analytics responded to my support request after I mentioned it on Twitter. Apparently, they had done something to the forums which corrupted my account. Creating a new account fixed the problem, so now I can report the bugs that I find and get some help.

#### Update: 6/11/2010 16:03 EST

It turns out that you get a small speed improvement by calling setMKLthreads(1). Apparently, the libraries Revolution R links against attempt to use multiple cores by default. If you are explicitly programming in parallel, this means that your code is competing with itself for resources. Thanks for the tip!