Are parallel simulations in the cloud worth it? Benchmarking my MBP vs my Workstation vs Amazon EC2

[This article was first published on R Psychologist - R, and kindly contributed to R-bloggers].

If you tend to run lots of large Monte Carlo simulations, you've probably already discovered the benefits of multi-core CPUs and parallel computation. A simulation that takes 4 weeks without parallelization can be done in roughly 1 week on a quad-core laptop with parallelization. However, for even larger simulations, where parallelization only brings the computation time down from, e.g., 8 months to 2 months, you either start thinking about reducing the scope of your simulation study or about getting more CPU power. I prefer the latter option (yeah, I know it's not always an option…), and often perform larger runs on Amazon EC2 instances and smaller day-to-day runs on my workstation. Recently, I saw Max Kuhn's post "While you wait for that to finish, can I interest you in parallel processing?", in which he compared computers with 4 to 10 cores. Since I already had a bunch of benchmarks done on Amazon's EC2 c5.* instances (2 to 72 cores) for an article I'm writing, I figured it'd be interesting to compare those results to more everyday, PhD-student-level computers. So in this post I compare two different MacBook Pros, a dual-CPU workstation, and different EC2 instances.


The joy of running 72 R sessions in parallel.

The Machines

These are the machines I benchmarked. "Logical cores" is the number of hardware threads (virtual cores); on all of these machines it is twice the number of physical cores because of hyper-threading. I didn't record how many sockets the Amazon machines had.

Machine                     CPU                     Clock    Cores  Logical cores
MacBook Pro 15″ Mid 2012    Intel Core i7-3720QM    2.6 GHz  4      8
MacBook Pro 15″ Mid 2015    Intel Core i7-4980HQ    2.8 GHz  4      8
HP Z620 Workstation         Xeon E5-2670 (x 2)      2.6 GHz  16     32
Amazon EC2 c5.large         Xeon Platinum 8124M     3.0 GHz  1      2
Amazon EC2 c5.xlarge        Xeon Platinum 8124M     3.0 GHz  2      4
Amazon EC2 c5.2xlarge       Xeon Platinum 8124M     3.0 GHz  4      8
Amazon EC2 c5.4xlarge       Xeon Platinum 8124M     3.0 GHz  8      16
Amazon EC2 c5.9xlarge       Xeon Platinum 8124M     3.0 GHz  18     36
Amazon EC2 c5.18xlarge      Xeon Platinum 8124M     3.0 GHz  36     72
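If you want to check the physical and logical core counts on your own machine, base R's parallel package can report both. This is a minimal sketch; note that detectCores() may return NA when R cannot determine the count, and that on some platforms `logical = FALSE` is not honoured, in which case it may simply return the logical count.

```r
library(parallel)

# Number of logical cores (hardware threads) and, where supported,
# physical cores. Either value can be NA if R cannot determine it.
logical_cores  <- detectCores(logical = TRUE)
physical_cores <- detectCores(logical = FALSE)

print(c(physical = physical_cores, logical = logical_cores))
```

On a hyper-threaded machine like the ones in the table, you would expect the logical count to be twice the physical count.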

The Simulation

I used my package powerlmm to simulate three-level longitudinal data (repeated measures nested within subjects nested within clusters), with 6 clusters per treatment arm, 50 subjects per cluster, and 11 measures on each subject. That is 600 subjects and 600 × 11 = 6600 observations in total. This three-level model was fit with a random intercept and slope at the subject level, and a random slope at the cluster level. To make the machines work even harder, I approximated the degrees of freedom with Satterthwaite's method, using lmerTest. The number of simulations was 5000.
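To make the design concrete, here is a minimal base-R sketch that builds the balanced design described above and writes down an lme4-style formula for the model. The time coding (0 to 10), the ID scheme, and the exact formula are my assumptions for illustration; powerlmm generates its own data and formula internally.

```r
# Hypothetical sketch of the balanced design:
# 2 treatment arms x 6 clusters x 50 subjects x 11 time points.
# Time is assumed to be coded 0, 1, ..., 10.
design <- expand.grid(time = 0:10,
                      subject = 1:50,
                      cluster = 1:6,
                      treatment = 0:1)

# Make cluster and subject IDs unique across arms so the nesting is explicit
design$cluster <- design$treatment * 6 + design$cluster            # 1..12
design$subject <- (design$cluster - 1) * 50 + design$subject       # 1..600

nrow(design)                     # 6600 observations
length(unique(design$subject))   # 600 subjects

# An lme4-style formula for this model (an approximation, not powerlmm's own):
# random intercept and slope per subject, random slope only per cluster
f <- y ~ time * treatment + (1 + time | subject) + (0 + time | cluster)
```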

The MacBooks were running macOS, and my HP workstation and Amazon's EC2 instances were running Ubuntu 16.04. R 3.4.3 was installed on all machines.

Results

Unsurprisingly, running on only 1 core was the slowest, taking between 50 and 80 minutes. The fastest run on each machine took:

  • MBP (2012) 17 min.
  • MBP (2015) 13 min.
  • HP Z620 Workstation 5 min.
  • Amazon EC2 c5.18xlarge 2 min.
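To put those numbers in perspective, the ratio of the best times gives how many times faster each machine is than the 2012 MacBook Pro. This is only arithmetic on the four times listed above, not a re-run of the benchmark.

```r
# Fastest run per machine, in minutes (from the list above)
best <- c("MBP (2012)" = 17, "MBP (2015)" = 13,
          "HP Z620" = 5, "EC2 c5.18xlarge" = 2)

# How many times faster than the 2012 MacBook Pro ([[ ]] drops the name,
# so the result keeps the machine names from `best`)
rel <- round(best[["MBP (2012)"]] / best, 1)
print(rel)
# MBP (2012) 1.0, MBP (2015) 1.3, HP Z620 3.4, EC2 c5.18xlarge 8.5
```

So the c5.18xlarge finished about 8.5 times faster than the 2012 laptop, and the Z620 about 3.4 times faster.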

Moreover, running the simulations from a shell tended to be slightly faster than running them in a GUI (RStudio), and FORK and PSOCK clusters were equally fast. Note, however, that the time it takes to spawn the PSOCK workers is not included in the benchmark. The figure below shows the elapsed time for all machines.

[Figure: elapsed time for all machines]
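Since worker start-up is excluded from the timings, it can be worth measuring it separately on your own machine. The sketch below times cluster start-up and a small dummy workload for each cluster type. The helper `time_cluster()` and the `mean(rnorm(1e5))` workload are hypothetical stand-ins, not the simulation used in this post. FORK clusters are only available on Unix-alikes such as macOS and Linux.

```r
library(parallel)

# Hypothetical helper: time cluster start-up separately from the work itself,
# mirroring how the benchmark excludes worker spawn time.
time_cluster <- function(type, cores = 2, n = 20) {
  spawn <- system.time(cl <- makeCluster(cores, type = type))[["elapsed"]]
  on.exit(stopCluster(cl))  # always shut the workers down
  work <- system.time(
    res <- parLapply(cl, seq_len(n), function(i) mean(rnorm(1e5)))
  )[["elapsed"]]
  data.frame(type = type, spawn = spawn, work = work,
             n_results = length(res))
}

# FORK clusters only exist on Unix-alikes (macOS, Linux)
types <- if (.Platform$OS.type == "unix") c("PSOCK", "FORK") else "PSOCK"
timings <- do.call(rbind, lapply(types, time_cluster))
print(timings)
```

In line with run_benchmark() below, you may prefer to avoid FORK clusters when working inside a GUI such as RStudio.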

Speedup was fairly linear up to the number of physical cores on each machine. The figure below shows the relative speedup for each machine.

[Figure: relative speedup for each machine]
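Relative speedup here means the 1-core time divided by the time with p cores, S(p) = T(1) / T(p); dividing by p gives the parallel efficiency, which equals 1 for perfectly linear speedup. Below is a minimal sketch of computing both from the data frame that run_benchmark() saves (columns label, cores, time). The helper `add_speedup()` is hypothetical and the timings in the example are made up for illustration.

```r
# Hypothetical helper: add relative speedup and parallel efficiency to a
# benchmark data frame with columns `label`, `cores` and `time`.
# Assumes every label has a 1-core run (run_benchmark() runs 1..max_cores).
add_speedup <- function(df) {
  df <- df[order(df$label, df$cores), ]
  # Baseline = time on 1 core (first row after sorting) within each label
  base <- ave(df$time, df$label, FUN = function(t) t[1])
  df$speedup <- base / df$time
  df$efficiency <- df$speedup / df$cores
  df
}

# Made-up timings, for illustration only
toy <- data.frame(label = "toy_FORK", cores = 1:4, time = c(60, 31, 21, 16))
out <- add_speedup(toy)
print(out)
```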

So is cloud computing worth it?

Obviously, having remote access to a 72-core 3 GHz machine is an extremely convenient way to significantly decrease your simulation time, and RStudio Server makes remote R work painless. However, my ~5-year-old HP workstation still performs extremely well. So if you have to pay for these things yourself, buying a refurbished workstation is a really good option, especially if you tend to do a lot of simulations. You can probably buy a refurbished Z620 for about 600 EUR on eBay, which is approximately what it costs to run a c5.9xlarge instance for 17 days (on-demand pricing).
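The break-even point is simply the hardware price divided by the daily instance cost. The hourly rate below (1.5 EUR/h) is an assumed figure chosen for illustration, not a quoted AWS price; check current on-demand pricing for your region.

```r
# Days of on-demand use that cost as much as buying the hardware outright.
# hw_price and hourly_rate must be in the same currency.
break_even_days <- function(hw_price, hourly_rate) {
  hw_price / (hourly_rate * 24)
}

# Assumed numbers: 600 EUR workstation, 1.5 EUR per instance-hour
break_even_days(hw_price = 600, hourly_rate = 1.5)  # about 16.7 days
```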

Code

Lastly, here’s the code I used.

Simulation functions

# filename: funcs.R

#' @param cores no. cores to use
#' Rest of the arguments explained in run_benchmark()
benchmark_sim <- function(cores, object, type, label, nsim) {
   message("Benchmarking ", cores, " cores")
   cl <- parallel::makeCluster(cores, 
                               type = type)
   # always shut the workers down, even if simulate() fails
   on.exit(parallel::stopCluster(cl))
   # use the study design passed in as `object` (not a global `p`)
   res <- simulate(object, 
                   cores = cores,
                   nsim = nsim, 
                   satterthwaite = TRUE, 
                   cl = cl)
   # power (Satterthwaite df) for the 4th fixed effect
   pow <- summary(res)$summary$correct$FE$Power_satt[4]
   data.frame(power = pow, 
              time = res$time,
              cores = cores,
              label = paste(label, type, sep = "_"))
}

#' @param object study_parameters()-object
#' @param max_cores benchmark 1, ..., max_cores
#' @param nsim number of simulations
#' @param label a label for the experiment
#' @param type cluster type, "FORK" or "PSOCK"; if NULL, FORK is used in
#'   non-interactive sessions and PSOCK otherwise
run_benchmark <- function(object, max_cores, nsim, label, type = NULL) {
  if(is.null(type)) {
    # only use forking in non-interactive session
    if(interactive()) {
      type <- "PSOCK"
    } else {
      type <- "FORK"
    }
  }

  # run the full simulation once for each number of cores
  res <- lapply(seq_len(max_cores), 
                benchmark_sim, 
                object = object,
                type = type,
                label = label,
                nsim = nsim)
  res <- do.call(rbind, res)

  # save, e.g. "AWS_c5.18xlarge_FORK_shell.rds"
  fname <- paste0(unique(res$label), "_shell.rds")
  saveRDS(res, file = fname)
  invisible(res)
}

Run benchmark

source("funcs.R")
library(powerlmm)
# older versions of Rscript (R < 3.5.0) do not attach methods by default
library(methods)


# setup sim
p <- study_parameters(n1 = 11,
                      n2 = 50,
                      n3 = 6,
                      fixed_intercept = 37,
                      fixed_slope = -0.64,
                      sigma_subject_intercept = 2.8,
                      sigma_subject_slope = 0.4,
                      sigma_cluster_intercept = 0,
                      cor_subject = -0.5,
                      icc_slope = 0.05,
                      sigma_error = 2.6,
                      dropout = dropout_weibull(proportion = 0.3, 
                                                rate = 1/2),
                      cohend = -0.5)

# will save results to wd; change the label to match the machine
run_benchmark(p, 
              max_cores = parallel::detectCores(),  # all logical cores
              nsim = 5000,
              label = "AWS_c5.18xlarge",
              type = "FORK")
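run_benchmark() writes one .rds file per machine and cluster type to the working directory. A small helper like the hypothetical `collect_benchmarks()` below can stack them into one data frame for plotting; it assumes all files share the `_shell.rds` suffix used above. The example files contain made-up values.

```r
# Hypothetical helper: read every "*_shell.rds" file in `dir` and
# row-bind them into a single data frame (NULL if none are found).
collect_benchmarks <- function(dir = ".") {
  files <- list.files(dir, pattern = "_shell\\.rds$", full.names = TRUE)
  if (length(files) == 0) return(NULL)
  do.call(rbind, lapply(files, readRDS))
}

# Example with two made-up result files in a temporary directory
tmp <- file.path(tempdir(), "bench_demo")
dir.create(tmp, showWarnings = FALSE)
saveRDS(data.frame(power = 0.8, time = 60, cores = 1, label = "a_FORK"),
        file.path(tmp, "a_FORK_shell.rds"))
saveRDS(data.frame(power = 0.8, time = 31, cores = 2, label = "b_FORK"),
        file.path(tmp, "b_FORK_shell.rds"))
all_res <- collect_benchmarks(tmp)
nrow(all_res)  # 2
```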
