Hyperthreading FTW? Testing parallelization performance in R.

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers].

Alright, let's test some of R's parallelization functionality.

The machine:
MacBook Air (mid-2013) with 8 GB of RAM and the i7 CPU (Intel Core i7-4650U, Haswell). This CPU is hyper-threaded, meaning (at least that's my understanding of it) that it has two physical cores but can run up to four threads at once, so the operating system sees four logical cores.
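To check the physical-versus-logical distinction on your own machine, something like the following should do. It is a minimal sketch using `detectCores()` from the base `parallel` package; the variable names are only illustrative. Note that `logical = FALSE` is not honoured on every platform and may return `NA`.

```r
library(parallel)

# Physical cores (may be NA where the platform cannot report them)
n.physical <- detectCores(logical = FALSE)
# Logical cores, i.e. what hyper-threading exposes to the operating system
n.logical <- detectCores(logical = TRUE)

cat("physical cores:", n.physical, "\n")
cat("logical cores: ", n.logical, "\n")
```

On the CPU described above, one would expect this to report 2 physical and 4 logical cores.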

The task:
Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the results into one vector. The number of cases per draw is varied from half a million to two million, in steps of half a million. The number of cores used by R is also varied (between 1 and 4). All of this is repeated 5 times, so we get multiple timing estimates for each configuration. Altogether, 80 runs are made: 5 repetitions x 4 n-cores x 4 n-cases = 80 runs.

The results:

This is quite interesting: we clearly see that there is virtually no performance gain for the 3- and 4-core runs compared with the 2-core runs. I guess this is because we do not really have 4 physical cores available on the hyper-threaded CPU: the two extra logical cores share the execution units of the two physical ones. So, for this task, it does not really make a difference whether we assign 2, 3 or 4 cores on this CPU. The performance gain from 1 to 2 cores, however, is quite clear.

Code (plotting code not supplied):
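library(foreach)
library(doParallel)  # assumption: the backend is not named in the original excerpt

result.df <- data.frame()

for (run in 1:5) {
  for (cases in c(500000, 1000000, 1500000, 2000000)) {
    cat(cases, "\n")
    for (cores in c(1, 2, 3, 4)) {
      n.cores <- cores
      n.cases <- cases
      cluster <- makeCluster(n.cores)
      registerDoParallel(cluster)  # without a registered backend, %dopar% runs sequentially
      t1 <- Sys.time()
      result.vec <- foreach(i = 1:100, .combine = c) %dopar% {
        rnorm(n.cases, mean = 10, sd = 30)
      }
      difft <- difftime(Sys.time(), t1, units = "secs")
      stopCluster(cluster)  # shut the workers down before the next configuration
      result.df <- rbind(result.df, c(n.cores, n.cases, as.numeric(difft)))
    }
  }
}
names(result.df) <- c("n.cores", "n.cases", "secs")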
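Since the plotting code is not supplied, here is one hedged way to condense the timings into numbers instead. This is a sketch, not the analysis behind the results above: `summarise_runs` is a hypothetical helper, and it assumes the first three columns of the data frame are, in order, the number of cores, the number of cases and the elapsed seconds. It computes the median time per configuration and the speedup relative to the 1-core runs with the same number of cases.

```r
# Hypothetical helper: median time and speedup vs. 1 core per configuration.
# Assumes columns 1-3 are: number of cores, number of cases, elapsed seconds.
summarise_runs <- function(df) {
  names(df)[1:3] <- c("n.cores", "n.cases", "secs")
  # Median elapsed time for every (n.cores, n.cases) combination
  med <- aggregate(secs ~ n.cores + n.cases, data = df, FUN = median)
  # Baseline: the 1-core median for the same n.cases
  base <- med[med$n.cores == 1, c("n.cases", "secs")]
  names(base)[2] <- "secs.1core"
  out <- merge(med, base, by = "n.cases")
  out$speedup <- out$secs.1core / out$secs
  out[order(out$n.cases, out$n.cores), ]
}

# Usage, once the benchmark loop has filled result.df:
# summarise_runs(result.df)
```

If the interpretation above holds, the speedup column should rise clearly from 1 to 2 cores and then stay roughly flat for 3 and 4.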
