Hyperthreading FTW? Testing parallelization performance in R.

March 7, 2014

(This article was first published on Rcrastinate, and kindly contributed to R-bloggers)

Alright, let's test some of the parallelization functionality in R.

The machine:
MacBook Air (mid-2013) with 8 GB of RAM and the i7 CPU (Intel i7 Haswell 4650U). This CPU is hyper-threaded, meaning (at least that's my understanding of it) that it has two physical cores, each of which can run two threads, so the operating system sees four logical cores.
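If you want to check how your own machine compares, the parallel package can report both counts. This is just a quick sketch: the variable names are mine, and on some platforms the `logical` argument is not honoured, so both calls may return the same number (or NA).

```r
library(parallel)

# Physical cores (may be NA or equal to the logical count on some platforms)
n.physical <- detectCores(logical = FALSE)
# Logical cores, i.e. what the operating system exposes including hyper-threads
n.logical <- detectCores(logical = TRUE)

cat("physical cores:", n.physical, "\n")
cat("logical cores: ", n.logical, "\n")
```

On a hyper-threaded CPU like the one described above, I would expect the second number to be twice the first.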

The task:
Draw a number of cases from a normal distribution with a mean of 10 and a standard deviation of 30. Do this a hundred times and combine the results in one vector. The number of cases is varied from half a million to two million in steps of half a million. The number of cores used by R is also varied (between 1 and 4). All this is done 5 times, so we get multiple timings for each combination. Altogether, 80 runs are made: 5 times x 4 n-cores x 4 n-cases = 80 runs.
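To double-check the run count, the full design can be written out as a grid. This is only an illustration of the arithmetic; the column names (`run`, `cores`, `cases`) are my own and are not used in the benchmark itself.

```r
# One row per timed run: 5 repetitions x 4 core counts x 4 problem sizes
design <- expand.grid(
  run   = 1:5,
  cores = 1:4,
  cases = c(500000, 1000000, 1500000, 2000000)
)

nrow(design)  # 5 * 4 * 4 = 80
```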

The results:

This is quite interesting: We clearly see that there is virtually no performance gain for the 3- and 4-core runs compared with the 2-core runs. I guess this is because we do not really have 4 physical cores available on the hyper-threaded CPU: the two extra "cores" are just second threads sharing the same two physical cores. So, for this task, it does not really make a difference whether we assign 2, 3 or 4 cores on this hyper-threaded CPU. The performance gain from 1 to 2 cores, however, is quite clear.

Code (plotting code not supplied):
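library(doParallel)
library(parallel)
result.df <- data.frame()

for (run in 1:5) {
  cat(run, "\n")
  for (n.cases in c(500000, 1000000, 1500000, 2000000)) {
    cat(n.cases, "\n")
    for (n.cores in 1:4) {
      # Start a fresh cluster with the requested number of workers
      cluster <- makeCluster(n.cores)
      registerDoParallel(cluster)
      t1 <- Sys.time()
      # 100 draws of n.cases values each, concatenated into one vector
      result.vec <- foreach(j = 1:100, .combine = c) %dopar% {
        rnorm(n.cases, mean = 10, sd = 30)
      }
      difft <- difftime(Sys.time(), t1, units = "secs")
      # Shut the workers down so they do not pile up across the 80 runs
      stopCluster(cluster)
      # Store one named row per run: cores used, cases drawn, elapsed seconds
      result.df <- rbind(result.df,
                         data.frame(cores = n.cores, cases = n.cases,
                                    secs = as.numeric(difft)))
    }
  }
}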
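
The collected timings can be boiled down to speedup factors relative to the 1-core runs, which is what the comparison in the results is really about. Here is a rough sketch of how that could be done. The function name `speedup_table` and the column names `cores`, `cases` and `secs` are my assumptions; rename the columns of your timing data frame to match before using it.

```r
# Assumed input: a data frame with numeric columns `cores`, `cases`, `secs`
# (hypothetical column names, one row per timed run).
speedup_table <- function(timings) {
  # Median run time for every combination of problem size and core count
  med <- aggregate(secs ~ cases + cores, data = timings, FUN = median)
  # The single-core median per problem size serves as the baseline
  base <- med[med$cores == 1, c("cases", "secs")]
  names(base)[2] <- "secs_1core"
  out <- merge(med, base, by = "cases")
  # Speedup > 1 means faster than the 1-core run of the same size
  out$speedup <- out$secs_1core / out$secs
  out[order(out$cases, out$cores), ]
}
```

Calling `speedup_table()` on the timing data frame gives one row per combination of cases and cores. A speedup that stays flat from 2 to 4 cores would match the pattern described in the results.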
