Hyperthreading FTW? Testing parallelization performance in R.

Posted on March 7, 2014 by Sascha W.

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers.]

Alright, let’s test some of R’s parallelization functionality.

The machine:
MacBook Air (mid-2013) with 8 GB of RAM and an Intel Core i7-4650U (Haswell) CPU. This CPU is hyper-threaded, meaning (at least as I understand it) that it has two physical cores but can run up to four threads at once.
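To check how many physical and logical cores R sees on your own machine, a minimal sketch using `detectCores()` from the base `parallel` package might look like this. The variable names are only illustrative, and on some platforms R cannot tell the two counts apart (or returns `NA`), so treat the output as a hint rather than a definitive answer.

```r
library(parallel)

# Number of physical cores; may be NA, or equal to the logical count,
# on platforms where R cannot distinguish the two.
n.physical <- detectCores(logical = FALSE)

# Number of logical cores (hardware threads).
n.logical <- detectCores(logical = TRUE)

cat("physical cores:", n.physical, "\n")
cat("logical cores: ", n.logical, "\n")

# More logical than physical cores suggests hyper-threading is active.
if (!is.na(n.physical) && !is.na(n.logical) && n.logical > n.physical) {
  cat("More logical than physical cores: hyper-threading appears to be on.\n")
}
```

On a machine like the one described above, one would expect 2 physical and 4 logical cores, but I have not verified what this snippet reports there.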

The task:
Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the results into one vector. The number of cases per draw is varied from half a million to two million, in steps of half a million. The number of cores used by R is also varied (from 1 to 4). All of this is repeated 5 times, so we get multiple estimates of each configuration’s run time. Altogether, 80 runs are made: 5 repetitions x 4 core counts x 4 case counts = 80 runs.
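The run count can be double-checked by writing the design out as a grid. This is just a sketch of the arithmetic, not part of the benchmark itself; the names `design` and `values.per.run` are made up for illustration.

```r
# One row per run: every combination of core count, case count and repetition.
design <- expand.grid(
  cores = 1:4,
  cases = c(500000, 1000000, 1500000, 2000000),
  rep   = 1:5
)

# Total number of runs in the benchmark.
nrow(design)

# Each run makes 100 draws of `cases` values, so this is the length
# of the combined result vector for each run.
design$values.per.run <- 100 * design$cases
range(design$values.per.run)
```

So the smallest runs produce a vector of 50 million values and the largest runs one of 200 million.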

The results:

This is quite interesting: we clearly see that there is virtually no performance gain for the 3- and 4-core runs. I guess this is because the hyper-threaded CPU does not really have 4 physical cores available: the two extra logical cores share the execution resources of the two physical ones. So, for this task, it does not really make a difference whether we assign 2, 3, or 4 cores on this CPU. The performance gain from 1 to 2 cores, however, is quite clear.
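If you want to put numbers on this comparison for your own runs, one possible way is to compute the median run time per configuration and divide the single-core median by it. The helper below is a hypothetical sketch, not the code used for the plot; it assumes a data frame with columns named `n.cores`, `n.cases` and `secs` (rename yours if they differ).

```r
# Hypothetical helper: median time and speedup relative to 1 core.
# Assumes columns n.cores, n.cases and secs (run time in seconds).
speedup_table <- function(df) {
  # Median run time for each (n.cores, n.cases) combination.
  med <- aggregate(secs ~ n.cores + n.cases, data = df, FUN = median)

  # The single-core median for each n.cases is the baseline.
  base <- med[med$n.cores == 1, c("n.cases", "secs")]
  names(base)[2] <- "secs.1core"

  # Attach the baseline and compute speedup = baseline / median time.
  out <- merge(med, base, by = "n.cases")
  out$speedup <- out$secs.1core / out$secs
  out[order(out$n.cases, out$n.cores), ]
}
```

Calling `speedup_table()` on the timing data frame would give one row per configuration. A speedup that stays flat from 2 to 4 cores would match the pattern described above.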

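Code (plotting code not supplied):

library(doParallel)  # also attaches foreach and parallel
library(parallel)

result.df <- data.frame()

for (i in 1:5) {  # 5 repetitions of the whole design
  cat(i, "\n")
  for (cases in c(500000, 1000000, 1500000, 2000000)) {
    cat(cases, "\n")
    for (cores in c(1, 2, 3, 4)) {
      n.cores <- cores
      n.cases <- cases

      # Start a cluster with n.cores workers and register it for %dopar%
      cluster <- makeCluster(n.cores)
      registerDoParallel(cluster)

      t1 <- Sys.time()
      # 100 draws of n.cases values each, concatenated into one vector
      result.vec <- foreach(j = 1:100, .combine = c) %dopar% {
        rnorm(n.cases, mean = 10, sd = 30)
      }
      difft <- difftime(Sys.time(), t1, units = "secs")

      # Shut the workers down so they do not pile up across runs
      stopCluster(cluster)

      # Store one named row per run (time in seconds)
      result.df <- rbind(result.df,
                         data.frame(n.cores = n.cores, n.cases = n.cases,
                                    secs = as.numeric(difft)))
    }
  }
}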
