(This article was first published on **Rcrastinate**, and kindly contributed to R-bloggers)

Alright, let's test some parallelization functionalities in R.

__The machine:__

MacBook Air (mid-2013) with 8 GB of RAM and the i7 CPU (Intel i7 Haswell 4650U). This CPU is hyper-threaded, meaning (at least that's my understanding of it) that it has two physical cores but can run up to four threads.
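To check the physical versus logical core count on your own machine, a minimal sketch using `detectCores()` from the `parallel` package could look like this. Note that the `logical` argument is not honoured on every platform, so both calls may return the same number (or `NA`) on some systems.

```r
library(parallel)

# Logical cores include hyper-threads; physical cores do not.
# On some platforms the distinction is not available and both
# values come back identical or NA.
logical.cores  <- detectCores(logical = TRUE)
physical.cores <- detectCores(logical = FALSE)

cat("Logical cores: ", logical.cores, "\n")
cat("Physical cores:", physical.cores, "\n")
```

On a machine like the one described above, I would expect this to report four logical and two physical cores, but I have not included any output here since it depends on the system.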

__The task:__

Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the results in one vector. The number of cases per draw is varied from half a million to two million (in steps of half a million). The number of cores used by R is also varied (between 1 and 4). All this is done 5 times, so we get multiple timing estimates for each combination. Altogether, 80 runs are made: 5 repetitions x 4 n-cores x 4 n-cases = 80 runs.
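To make the design concrete, here is a small sketch that lays out the grid of runs and performs a shrunken, serial version of one run's workload. The names `design` and `result.vec`, and the reduced `n.cases`, are my own choices for illustration and are not part of the benchmark itself.

```r
# Each row is one timed run: repetition x number of cores x number of cases.
design <- expand.grid(
  cores = 1:4,
  cases = c(500000, 1000000, 1500000, 2000000),
  rep   = 1:5
)

nrow(design)  # 5 * 4 * 4 = 80 runs

# Serial version of a single run's workload, shrunk for illustration:
# 100 draws of n.cases values each, combined into one vector.
n.cases <- 1000  # far smaller than in the benchmark, to keep this quick
result.vec <- unlist(lapply(1:100, function(j) {
  rnorm(n.cases, mean = 10, sd = 30)
}))

length(result.vec)  # 100 * n.cases = 100000 values
```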

__The results:__

This is quite interesting: the 3- and 4-core runs show virtually no performance gain over the 2-core runs. I guess this is because the hyper-threaded CPU does not really have 4 physical cores available, so it makes little difference whether we assign 2, 3 or 4 cores to the task. The performance gain from 1 to 2 cores, however, is quite clear.
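One way to put numbers on this pattern is to compute the median run time per combination and the speedup relative to the 1-core runs. The helper below is a hypothetical sketch: `summarise_speedup` is my own name, and it assumes a data frame of timings with numeric columns `n.cores`, `n.cases` and `secs`.

```r
# Hypothetical helper: median run time per (cores, cases) cell and the
# speedup relative to the 1-core median for the same number of cases.
# Assumes `timings` has numeric columns n.cores, n.cases and secs.
summarise_speedup <- function(timings) {
  med <- aggregate(secs ~ n.cores + n.cases, data = timings, FUN = median)

  # Baseline: the 1-core median for each number of cases
  base <- med[med$n.cores == 1, c("n.cases", "secs")]
  names(base)[2] <- "secs.1core"

  out <- merge(med, base, by = "n.cases")
  out$speedup <- out$secs.1core / out$secs
  out[order(out$n.cases, out$n.cores), ]
}
```

Called on the collected timings, a speedup that rises from 1 to 2 cores and then stays roughly flat for 3 and 4 cores would correspond to the result described above.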

__Code__ (plotting code not supplied):

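library(doParallel)  # also attaches foreach and parallel
library(parallel)

result.df <- data.frame()

for (i in 1:5) {  # 5 repetitions of the whole grid
  cat(i, "\n")
  for (cases in c(500000, 1000000, 1500000, 2000000)) {
    cat(cases, "\n")
    for (cores in c(1, 2, 3, 4)) {
      n.cores <- cores
      n.cases <- cases
      # Start a fresh cluster for this run (setup is not part of the timing)
      cluster <- makeCluster(n.cores)
      registerDoParallel(cluster)
      t1 <- Sys.time()
      # 100 draws of n.cases values each, combined into one vector
      result.vec <- foreach(j = 1:100, .combine = c) %dopar% {
        rnorm(n.cases, mean = 10, sd = 30)
      }
      difft <- difftime(Sys.time(), t1, units = "secs")
      # Shut the workers down so they do not pile up across the 80 runs
      stopCluster(cluster)
      result.df <- rbind(result.df,
                         data.frame(n.cores = n.cores, n.cases = n.cases,
                                    secs = as.numeric(difft)))
    }
  }
}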
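Since the plotting code is not supplied, here is one possible base-R sketch for drawing the timings. It is not the original plot. `plot_timings` is a hypothetical helper, and it assumes a data frame with numeric columns `n.cores`, `n.cases` and `secs`.

```r
# Hypothetical plotting helper; not the author's original plot.
# Assumes `timings` has numeric columns n.cores, n.cases and secs.
plot_timings <- function(timings) {
  med <- aggregate(secs ~ n.cores + n.cases, data = timings, FUN = median)
  case.levels <- sort(unique(med$n.cases))

  # Empty frame first, then one line per number of cases
  plot(NULL,
       xlim = range(med$n.cores), ylim = range(med$secs),
       xlab = "Number of cores", ylab = "Median run time (secs)",
       xaxt = "n")
  axis(1, at = sort(unique(med$n.cores)))

  for (k in seq_along(case.levels)) {
    sub <- med[med$n.cases == case.levels[k], ]
    sub <- sub[order(sub$n.cores), ]
    lines(sub$n.cores, sub$secs, type = "b", col = k, pch = k)
  }

  legend("topright",
         legend = format(case.levels, big.mark = ","),
         col = seq_along(case.levels), pch = seq_along(case.levels),
         lty = 1, title = "Cases per draw")

  invisible(med)  # return the medians that were plotted
}
```

With the timings collected in a data frame of that shape, `plot_timings(result.df)` would draw one line per number of cases, with the number of cores on the x-axis.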
