Pegging your multicore CPU in Revolution R, Good and Bad

June 9, 2010
(This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers)

Seven of eight cores at maximum usage

I take an almost unhealthy pleasure in pushing my computer to its limits. This has become easier with Revolution R and its free license for academic use. One of its best features is a debugger that lets you step through R code interactively, as you can with Python in PyDev. The other useful thing it packages is doSMP, a simple way to run embarrassingly parallel jobs on a multicore box.


library(doSMP)

# This declares how many processors to use.
# Since I still wanted to use my laptop during the simulation, I chose cores - 1.
workers <- startWorkers(7)
registerDoSMP(workers)

# Make Revolution R not try to go multi-core since we're already explicitly running in parallel
# Tip from: http://blog.revolutionanalytics.com/2010/06/performance-benefits-of-multithreaded-r.html
setMKLthreads(1)

chunkSize <- ceiling(runs / getDoParWorkers())
smpopts <- list(chunkSize=chunkSize)

# This just lets me see how long the simulation ran
beginTime <- Sys.time()

# This is the crucial piece. It parallelizes a for loop among the workers and aggregates their results
# with cbind. Since my function returns c(result1, result2, result3), r becomes a matrix with 3 rows and
# "runs" columns.
r <- foreach(icount(runs), .combine=cbind, .options.smp=smpopts) %dopar% {
  # repeatExperiment is just a wrapper function that returns c(result1, result2, result3)
  repeatExperiment(N, ratingsPerQuestion, minRatings, trials, cutoff, studentScores)
}

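# Elapsed time as a plain number of minutes, so it matches the "mins" label in the plot title
runTime <- as.numeric(difftime(Sys.time(), beginTime, units="mins"))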

# So now I can do something like this:
boxplot(r[1,], r[2,], r[3,],
        main=paste("Distribution of Percent of rmse below ", cutoff,
                   "\n Runs=", runs, " Trials=", trials, " Time=", round(runTime,2), " mins\n",
                   "scale: ", scaleLow, "-", scaleHigh,
                   sep=""),
        names=c("Ave3","Ave5","Ave7"))

If you are interested in finding out more about this, their docs are pretty good.
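For anyone who wants to try the loop shape without Revolution R, here is a minimal sketch. It assumes the foreach and doParallel packages are installed. doParallel is a stand-in backend, not the doSMP package used above, and fakeExperiment is a made-up placeholder for repeatExperiment.

```r
library(foreach)
library(doParallel)  # assumption: stand-in backend, not doSMP

# Hypothetical placeholder for repeatExperiment(): returns three numbers per run.
fakeExperiment <- function() {
  x <- runif(10)
  c(mean(x[1:3]), mean(x[1:5]), mean(x[1:7]))
}

runs <- 20
cl <- makeCluster(2)     # two workers, chosen arbitrarily for the sketch
registerDoParallel(cl)

# Each iteration returns a length-3 vector; cbind stacks them as columns.
r <- foreach(i = seq_len(runs), .combine = cbind) %dopar% {
  fakeExperiment()
}

stopCluster(cl)

dim(r)  # 3 rows, one column per run
```

Because each iteration contributes one column, r[1,], r[2,] and r[3,] each hold one result across all runs, which is what the boxplot call relies on.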

The only drawback is that Revolution R is a bit rough around the edges and crashes much more than it should. Worse, for me at least, the support forum doesn’t show any posts when I’m logged in, and I can’t post anything. Although I’ve filled out (what I think is) the appropriate web form, no one has gotten back to me about fixing my account. I’m going to try Twitter in a bit. Your mileage may vary.

Update: 6/9/2010 22:03 EST

Revolution Analytics responded to my support request after I mentioned it on Twitter. Apparently, they had done something to the forums which corrupted my account. Creating a new account fixed the problem, so now I can report the bugs that I find and get some help.

Update: 6/11/2010 16:03 EST

It turns out that you get a small speed improvement by calling setMKLthreads(1). Apparently, the libraries Revolution R links against attempt to use multiple cores by default. If you are explicitly programming in parallel, this means that your code is competing with itself for resources. Thanks for the tip!
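A rough back-of-the-envelope sketch of why the workers compete with each other. The assumption here, which I have not verified against the MKL documentation, is that an unrestricted math library starts one thread per core in every worker.

```r
cores   <- 8   # cores on the laptop in this post
workers <- 7   # as in startWorkers(7)

# Assumption: without a limit, each worker's math library starts one thread per core.
threadsDefault <- workers * cores   # threads competing for the 8 cores
threadsPinned  <- workers * 1       # one thread per worker after setMKLthreads(1)

c(default = threadsDefault, pinned = threadsPinned)
```

Under that assumption the default setup asks for 56 threads on 8 cores, while pinning each worker to one thread asks for 7.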
