Parallel processing with short jobs only increases the run time

December 27, 2013

Parallel processing has become much more important over the years as multi-core processors have become commonplace. From version 2.14 onwards, parallel processing is part of the standard R installation in the form of the parallel package. This package makes running parallel jobs as easy as creating a function that runs a job, and calling parSapply on a list of inputs to this function.
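For example, the basic pattern looks like this (the toy function, inputs, and cluster size here are just illustrative):

library(parallel)

# Start a cluster of 4 worker processes (adjust to your machine)
cl = makeCluster(4)

# Apply a function to each input in parallel, just like sapply
squares = parSapply(cl, 1:10, function(x) x^2)

# Always shut the workers down when you are done
stopCluster(cl)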

Of course, parallelisation incurs some overhead: the inputs need to be distributed over the nodes, and the results from each node need to be collected and aggregated into the final object. This overhead is one of the main reasons why, in certain cases, parallel processing takes longer than sequential processing; see for example this StackOverflow question.

In this post I explore how the run time of a single job influences the total performance of parallel processing compared to sequential processing. To simulate a job, I simply use the R function Sys.sleep. The problem I solve is simply waiting for one second. By cutting this second up into increasingly small pieces, each individual job becomes shorter and shorter. By comparing the run time of calling Sys.sleep sequentially and in parallel, I can investigate the relation between the length of a job and the performance of parallel processing.
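In other words, the setup boils down to this (illustrative value for n; the full code is at the end of the post):

n = 1000            # number of jobs
secs_per_job = 1/n  # each job sleeps for 1/n seconds

# The total sleep time is always 1 second; only the job granularity changes:
# sequential: sapply(1:n, function(i) Sys.sleep(secs_per_job))
# parallel:   parSapply(cl, 1:n, function(i) Sys.sleep(secs_per_job))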

The following figure shows the results of my experiment (the R code is listed at the end of the blog post):

The x-axis shows the run time of an individual job in msecs, the y-axis shows the speedup factor of parallel over sequential processing (> 1 means parallel is faster), and the color distinguishes 4 and 8 cores. The dots are individual runs comparing parallel and sequential processing (20 repetitions per setting), and the lines show the median of those 20 repetitions.

The most striking feature is that shorter jobs decrease the effectiveness of parallel processing: below roughly 0.011 msecs per job, parallel processing becomes slower than sequential processing. At that point, the overhead of parallelisation outweighs the gain. In addition, above 0.011 msecs parallel processing might be faster, but it is a far cry from the 4-8 fold speedup one would naively expect. Finally, for the job sizes in the figure, increasing the number of cores from 4 to 8 only marginally improves performance.

In conclusion, when individual jobs are short, parallelisation will have little positive impact on performance, and may even decrease it. Keep this in the back of your mind when trying to run your code in parallel.
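As an aside that goes beyond the experiment above, a common way to deal with many short jobs is to batch them into fewer, longer jobs, so that each worker receives a substantial chunk of work. A minimal sketch, with arbitrary toy inputs:

library(parallel)

inputs = 1:100000
cores = 4

# Split the inputs into one chunk per core
chunks = split(inputs, cut(seq_along(inputs), cores, labels = FALSE))

cl = makeCluster(cores)
# Each parallel job now processes a whole chunk sequentially
res = parLapply(cl, chunks, function(chunk) sapply(chunk, function(x) x^2))
stopCluster(cl)

res = unlist(res)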

Source code needed to perform the experiment:

library(parallel)
library(ggplot2)
theme_set(theme_bw())
library(plyr)

# Simulate a single job: sleep for secs seconds, and return some data so
# that results also have to be transferred back from the worker nodes
wait = function(dummy, secs) {
    Sys.sleep(secs)
    return(runif(1000))
}

# Run n jobs of 1/n seconds each, both in parallel on the given number of
# cores and sequentially, and return the elapsed times
compare_sequential_parallel = function(n, cores, replicate_id) {
    s = 1/n

    cl = makeCluster(cores)
    t_parallel = system.time(bla <- parSapply(cl, 1:n, wait, secs = s))
    stopCluster(cl)
    t_sequential = system.time(bla <- sapply(1:n, wait, secs = s))

    return(c(seq = t_sequential['elapsed'],
             par = t_parallel['elapsed'],
             n = n, cores = cores, replicate_id = replicate_id))
}

# Cut one second into n pieces, with n ranging from 50,000 to 100,000
# (job lengths of 0.02 down to 0.01 msecs), for 4 and 8 cores,
# repeating each setting 20 times
n_list = seq(50e3, 10e4, length = 9)
cores_list = c(4, 8)
no_replicates = 20
res = mdply(expand.grid(n = n_list,
                        cores = cores_list,
                        replicate_id = 1:no_replicates),
            compare_sequential_parallel, .progress = 'text')

# Median elapsed times over the 20 replicates of each setting
res_mean = ddply(res, .(cores, n), summarize,
                 seq.elapsed = median(seq.elapsed),
                 par.elapsed = median(par.elapsed))
save(res, res_mean, file = 'result.rda')

# Plot job run time in msecs (1000/n) against the speedup factor
# seq.elapsed / par.elapsed; values above 1 mean parallel is faster
gg = ggplot(res, aes(x = 1000/n,
                     y = seq.elapsed / par.elapsed,
                     color = as.factor(cores))) +
    geom_point(alpha = 0.4) + geom_line(data = res_mean) +
    geom_hline(aes(yintercept = 1)) +
    scale_x_continuous('Run time jobs [msecs]') +
    scale_y_continuous('Factor that parallel processing is faster') +
    scale_color_discrete('Number of\ncores')
print(gg)
