R: parallel processing using multicore package

April 14, 2010

(This article was first published on compBiomeBlog, and kindly contributed to R-bloggers)

I have been meaning to add some parallel processing to R, as I have some scripts that are painfully slow and embarrassingly parallel. There seem to be a lot of R packages around for doing parallel computing.

I decided to look at multicore as it seemed easy to implement. The core of the package is the mclapply function, which is the multicore version of lapply. Basically, you install the package,

install.packages("multicore")

load the library,

library(multicore)

then replace any instances of lapply in your code with mclapply, and it will speed up your code! Easy.

Obviously there are more complications than this and there are various options you can use, such as the number of cores to use etc.
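For instance, the number of child processes mclapply forks can be set with the mc.cores argument. A minimal sketch (the default number of cores and the best setting depend on your machine):

```r
library(multicore)

# Square 8 numbers, forking at most 2 worker processes.
# mc.cores caps how many cores mclapply will use.
res <- mclapply(1:8, function(i) i^2, mc.cores = 2)

# mclapply returns a list, just like lapply.
unlist(res)
```

As with lapply, the result is an ordinary list, so mclapply is usually a drop-in replacement.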

To give a quick test:

test <- lapply(1:10,function(x) rnorm(10000))
system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
#   user  system elapsed
#  0.954   0.246   2.795
system.time(x <- mclapply(test,function(x) loess.smooth(x,x)))
#   user  system elapsed
#  0.896   0.898   0.914

So the elapsed time went down from 2.795 to 0.914 seconds, roughly a three-fold speed-up. Not bad.

The package also provides the parallel and collect functions, which let you launch arbitrary expressions as background jobs; collect then gathers the results once they have all finished.
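A quick sketch of that pattern, using the function names from multicore (parallel forks each expression into its own child process, and collect blocks until they are done):

```r
library(multicore)

# Fork two independent computations to run concurrently.
job1 <- parallel(sqrt(1:1e6))
job2 <- parallel(sum(1:1e6))

# collect() waits for both jobs and returns their results
# in a list.
res <- collect(list(job1, job2))
```

This is handy when the parallel tasks are heterogeneous and don't fit the one-function-over-a-list shape that mclapply expects.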

I have only just started using it, but first impressions are good. 
