Parallel processing with short jobs only increases the run time


Parallel processing has become much more important over the years as multi-core processors have become commonplace. From version 2.14 onwards, parallel processing is part of the standard R installation in the form of the parallel package. This package makes running parallel jobs as easy as creating a function that runs a job and calling parSapply on a list of inputs to this function.
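
For example, a minimal sketch of this workflow looks like this (the function and input values here are just illustrative):

library(parallel)

square = function(x) x^2

cl = makeCluster(4)                    # start 4 worker processes
result = parSapply(cl, 1:100, square)  # apply square() to 1:100 in parallel
stopCluster(cl)                        # shut the workers down again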

Of course, parallelisation incurs some overhead: the inputs need to be distributed over the nodes, and the results from each node need to be collected and aggregated into a single object. This overhead is one of the main reasons why, in certain cases, parallel processing takes longer than sequential processing; see for example this StackOverflow question.
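
The overhead is easy to see with a job that does no work at all: the sequential version is essentially free, while the parallel version still has to ship the inputs to the workers and collect the results. A rough illustration (exact timings will differ per machine):

library(parallel)

do_nothing = function(x) x

cl = makeCluster(4)
system.time(parSapply(cl, 1:1e4, do_nothing))  # pays for communication with the workers
system.time(sapply(1:1e4, do_nothing))         # no communication overhead
stopCluster(cl)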

In this post I explore the influence of the time a single job takes on the performance of parallel processing compared to sequential processing. To simulate a job, I simply use the R function Sys.sleep. The task to be solved is simply waiting for one second in total; by cutting this second up into increasingly many pieces, each individual job becomes shorter and shorter. By comparing the run time of calling Sys.sleep sequentially and in parallel, I can investigate the relation between the temporal size of a job and the performance of parallel processing.
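
In essence, a single comparison boils down to the following sketch, a stripped-down version of the full code listed at the end of the post:

library(parallel)

n = 1000   # number of jobs; each job waits 1/n seconds
s = 1 / n

cl = makeCluster(4)
t_par = system.time(parSapply(cl, 1:n, function(i, secs) Sys.sleep(secs), secs = s))
stopCluster(cl)
t_seq = system.time(sapply(1:n, function(i, secs) Sys.sleep(secs), secs = s))

t_seq['elapsed'] / t_par['elapsed']   # > 1 means parallel was faster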

The following figure shows the results of my experiment (the R code is listed at the end of the blog post):

The x-axis shows the run time of an individual job in msec, the y-axis shows the speedup factor of parallel over sequential processing (> 1 means parallel is faster), and the color shows the results for 4 and 8 cores. The dots are individual runs comparing parallel and sequential processing (20 repetitions per setting), and the lines show the median value over those 20 repetitions.

The most striking feature is that shorter jobs decrease the effectiveness of parallel processing: below roughly 0.011 msec per job, parallel processing becomes slower than sequential processing. Below that point, the overhead of parallelisation outweighs the gain. In addition, above 0.011 msec parallel processing might be faster, but it is a far cry from the 4-8 fold increase in performance one would naively expect. Finally, for the job sizes in the figure, increasing the number of cores from 4 to 8 only marginally improves performance.

In conclusion, when individual jobs are short, parallelisation will have at best a small positive impact on performance, and may even decrease it. Keep this in the back of your mind when trying to run your code in parallel.

Source code needed to perform the experiment:

library(parallel)
library(ggplot2)
theme_set(theme_bw())
library(plyr)

# A single job: sleep for 'secs' seconds and return some dummy data,
# so there is also a result to transfer back from the workers.
wait = function(dummy, secs) {
    Sys.sleep(secs)
    return(runif(1000))
}

# Run n jobs of 1/n seconds each, both in parallel (on 'cores' cores)
# and sequentially, and return the elapsed times.
compare_sequential_parallel = function(n, cores, replicate_id) {
    s = 1/n

    cl = makeCluster(cores)
    t_parallel = system.time(bla <- parSapply(cl, 1:n, wait, secs = s))
    stopCluster(cl)
    t_sequential = system.time(bla <- sapply(1:n, wait, secs = s))

    return(c(seq = t_sequential['elapsed'],
             par = t_parallel['elapsed'],
             n = n, cores = cores, replicate_id = replicate_id))
}

# Experimental design: number of jobs (job length = 1/n seconds),
# number of cores, and the number of repetitions per combination.
n_list = seq(50e3, 10e4, length = 9)
cores_list = c(4, 8)
no_replicates = 20

# Run the comparison for every combination of n, cores and replicate.
res = mdply(expand.grid(n = n_list,
                        cores = cores_list,
                        replicate_id = 1:no_replicates),
            compare_sequential_parallel, .progress = 'text')

# Median elapsed times over the 20 repetitions, per (cores, n) combination.
res_mean = ddply(res, .(cores, n), summarize,
                 seq.elapsed = median(seq.elapsed),
                 par.elapsed = median(par.elapsed))
save(res, res_mean, file = 'result.rda')

# Plot the speedup factor (sequential / parallel elapsed time) against
# the run time of an individual job in milliseconds.
gg = ggplot(res, aes(x = 1000/n,
                     y = seq.elapsed / par.elapsed,
                     color = as.factor(cores))) +
    geom_point(alpha = 0.4) + geom_line(data = res_mean) +
    geom_hline(aes(yintercept = 1)) +
    scale_x_continuous('Run time jobs [msecs]') +
    scale_y_continuous('Factor that parallel processing is faster') +
    scale_color_discrete('Number of\ncores')
print(gg)
