I have been meaning to look at adding some parallel processing to R as I have some scripts that are painfully slow and embarrassingly parallel. There seem to be a lot of packages around for doing parallel computing, listed here.

I decided to look at multicore as it seemed easy to implement. The core of the package is the mclapply function, which is the multicore version of lapply. Basically you install the package,

install.packages("multicore")

load the library,

library(multicore)

then replace any instances of lapply in your code with mclapply and, as long as each call does a reasonable amount of work, your code will run faster. Easy.

Obviously there are more complications than this, and there are various options you can use, such as mc.cores, which sets the number of cores to use.
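Here is a minimal sketch of setting the number of cores. It assumes a current R installation. The functions from multicore were later folded into the parallel package that ships with R (version 2.14.0 onwards), so the sketch uses parallel::mclapply, whose mc.cores argument plays the same role. slow_square is a hypothetical stand-in for real work. Forking is not supported on Windows, so the sketch drops to one core there.

```r
library(parallel)

# Hypothetical stand-in for an expensive, independent computation
slow_square <- function(i) {
  Sys.sleep(0.1)  # pretend this takes a while
  i^2
}

# Use all but one core; detectCores() can return NA, hence na.rm
n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)

# Forking is not supported on Windows, so fall back to a single core there
if (.Platform$OS.type == "windows") n_cores <- 1L

# Same call as lapply, plus mc.cores to set the number of worker processes
res <- mclapply(1:8, slow_square, mc.cores = n_cores)

unlist(res)
# [1]  1  4  9 16 25 36 49 64
```

Leaving one core free is just a common habit to keep the machine responsive. Any positive value works.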

To give a quick test:

test <- lapply(1:10, function(x) rnorm(10000))

system.time(x <- lapply(test, function(x) loess.smooth(x, x)))
#    user  system elapsed
#   0.954   0.246   2.795

system.time(x <- mclapply(test, function(x) loess.smooth(x, x)))
#    user  system elapsed
#   0.896   0.898   0.914

So the elapsed time went down from 2.795 to 0.914, which is about three times faster. Not bad.
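Before trusting a speed-up like this, it is worth checking that the parallel version returns exactly the same answer as the serial one. Below is a minimal sketch of that check. It uses parallel::mclapply, where the multicore function now lives in current R. The seed, the helper name smooth_one and the choice of two cores are arbitrary assumptions for illustration.

```r
library(parallel)

set.seed(42)  # arbitrary seed so the input data are reproducible
test <- lapply(1:10, function(x) rnorm(10000))

# Hypothetical helper wrapping the same smoothing call as the timing test
smooth_one <- function(x) loess.smooth(x, x)

# mc.cores > 1 needs fork(), so use a single core on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial_res   <- lapply(test, smooth_one)
parallel_res <- mclapply(test, smooth_one, mc.cores = cores)

# Forking changes where the work runs, not the result, so this is TRUE
identical(serial_res, parallel_res)
```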

The package also contains the parallel and collect functions. parallel starts any R expression running in a separate process, and collect retrieves the results once they have all finished.
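Here is a sketch of the same start-then-collect pattern in current R. There, multicore's parallel and collect are available in the parallel package as mcparallel and mccollect. The two expressions and the job names "total" and "avg" are arbitrary examples. Because each job is a forked process, this only works on Unix-like systems.

```r
library(parallel)

# Start two independent jobs; each runs in its own forked R process
job_total <- mcparallel(sum(1:100), name = "total")
job_avg   <- mcparallel(mean(1:10), name = "avg")

# Wait for both jobs to finish, then gather their results in a list
# whose elements are named after the jobs
results <- mccollect(list(job_total, job_avg))

results$total  # 5050
results$avg    # 5.5
```

By default mccollect blocks until every listed job has finished. This matches the "recover the results when they are all finished" behaviour described above.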

I have only just started using it, but first impressions are good.


To **leave a comment** for the author, please follow the link and comment on their blog: **compBiomeBlog**.
