Alright, let’s test some of R’s parallelization features.

__The machine:__

MacBook Air (mid-2013) with 8 GB of RAM and an Intel Core i7-4650U (Haswell) CPU. This CPU is hyper-threaded: as far as I understand it, it has two physical cores, each of which can run two threads, so the operating system sees four logical cores.
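As a quick check of that understanding, the `parallel` package can report both counts. This is a minimal sketch and not part of the original benchmark; the variable names are mine, and `detectCores(logical = FALSE)` may return `NA` on platforms where R cannot determine the physical core count.

```r
library(parallel)

# Physical cores (can be NA where R cannot determine it)
physical.cores <- detectCores(logical = FALSE)

# Logical cores, i.e. hardware threads visible to the operating system
logical.cores <- detectCores(logical = TRUE)

cat("physical:", physical.cores, "logical:", logical.cores, "\n")
```

On a hyper-threaded CPU the logical count is typically twice the physical count.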

__The task:__

Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the results into one vector. The number of cases per draw is varied from half a million to two million in steps of half a million (four levels), and the number of cores used by R is varied from 1 to 4. Every combination is run 5 times, so we get multiple timing estimates for each. Altogether: 5 repetitions x 4 core counts x 4 case counts = 80 runs.
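To make the design concrete, the full grid of runs can be written down with `expand.grid`. This is only an illustrative sketch: the column names are my own, and the size estimate assumes 8 bytes per double and ignores any overhead.

```r
# One row per run: every combination of cores, cases and repetition
design <- expand.grid(
  cores      = 1:4,
  cases      = c(500000, 1000000, 1500000, 2000000),
  repetition = 1:5
)

# Each run draws `cases` values 100 times and concatenates them
design$values.per.run <- 100 * design$cases

# Rough size of the combined vector in GB (8 bytes per double)
design$approx.gb <- design$values.per.run * 8 / 1e9

nrow(design)  # 4 x 4 x 5 = 80 runs
```

The largest setting therefore produces a vector of 200 million values, roughly 1.6 GB, which is worth keeping in mind on a machine with 8 GB of RAM.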

__The results:__

This is quite interesting: there is virtually no performance gain for the 3- and 4-core runs. My guess is that this is because the hyper-threaded CPU does not really have 4 physical cores, so assigning 3 or 4 workers instead of 2 makes little difference for this task. The performance gain from 1 to 2 cores, however, is quite clear.

__Code__ (plotting code not supplied):

    library(doParallel)  # also attaches foreach
    library(parallel)

    result.df <- data.frame()

    for (run in 1:5) {
      cat(run, "\n")
      for (n.cases in c(500000, 1000000, 1500000, 2000000)) {
        cat(n.cases, "\n")
        for (n.cores in 1:4) {
          cluster <- makeCluster(n.cores)
          registerDoParallel(cluster)

          # Time only the parallel loop, not the cluster start-up
          t1 <- Sys.time()
          result.vec <- foreach(j = 1:100, .combine = c) %dopar% {
            rnorm(n.cases, mean = 10, sd = 30)
          }
          difft <- difftime(Sys.time(), t1, units = "secs")

          # Shut the workers down so they do not pile up across runs
          stopCluster(cluster)

          # One named row per run: cores used, cases drawn, elapsed seconds
          result.df <- rbind(result.df,
                             data.frame(cores = n.cores, cases = n.cases,
                                        secs = as.numeric(difft)))
        }
      }
    }
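Since the plotting code is not supplied, here is one possible way to summarise the timings numerically. It is a sketch, not the author's analysis: `speedup.table` is a hypothetical helper, and it assumes the timings sit in a data frame with columns named `cores`, `cases` and `secs`.

```r
# Hypothetical helper: expects columns `cores`, `cases`, `secs`
speedup.table <- function(timings) {
  # Median elapsed time for each (cores, cases) combination
  med <- aggregate(secs ~ cores + cases, data = timings, FUN = median)

  # The single-core median per case count serves as the baseline
  base <- med[med$cores == 1, c("cases", "secs")]
  names(base)[2] <- "secs.1core"

  out <- merge(med, base, by = "cases")
  # Speedup > 1 means faster than the single-core run
  out$speedup <- out$secs.1core / out$secs
  out[order(out$cases, out$cores), ]
}
```

If the data frame does not already carry those column names, set them first with `names(result.df) <- c("cores", "cases", "secs")`, then call `speedup.table(result.df)`. A speedup that rises from 1 to 2 cores and then stays flat would match the pattern described in the results.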

To **leave a comment** for the author, please follow the link and comment on their blog: **Rcrastinate**.
