A comparison of high-performance computing techniques in R

June 1, 2015

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

When it comes to speeding up "embarrassingly parallel" computations (like for loops with many independent iterations), the R language offers a number of options:

  • An R looping operator, like mapply (which runs in a single thread)
  • A parallelized version of a looping operator, like mcmapply (which can use multiple cores)
  • Explicit parallelization, via the parallel package or the ParallelR suite (which can use multiple cores, or distribute the problem across nodes in a cluster)
  • Translating the loop to C++ using Rcpp (which runs as compiled and optimized machine code)

Data scientist Tony Fischetti tried all of these methods, and more, while attempting to find the distance between every pair of airports (a problem that grows polynomially in time as the number of airports increases, since n airports yield n(n-1)/2 pairs, but which is embarrassingly parallel). Here's a chart comparing the time taken via various methods as the number of airports grows:


The clear winner is Rcpp: the orange line at the bottom of the chart. The line looks flat, but the time does increase as the problem gets larger; it's just much, much faster than all the other methods tested. Ironically, Rcpp doesn't use any parallelization at all and so doesn't benefit from the quad-processor system used for testing, but again: it's just that much faster.

Check out the blog post linked below for a detailed comparison of the methods used, and some good advice for using Rcpp effectively (pro-tip: code the whole loop, not just the body, with Rcpp).
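To make the problem and the pro-tip a little more concrete, here is a minimal sketch in plain C++ of an all-pairs distance computation. It is not Tony's code: the haversine great-circle formula, the 6371 km Earth radius, and the function names (haversine_km, pair_count, mean_pairwise_distance_km) are all assumptions made for illustration, and the Rcpp glue needed to call it from R is left out. The point is only that the entire double loop lives in compiled code, rather than R calling a compiled function once per pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed constants for this sketch: mean Earth radius in km, and pi.
constexpr double kEarthRadiusKm = 6371.0;
constexpr double kPi = 3.14159265358979323846;

// Great-circle distance in km between two points given in decimal degrees,
// using the haversine formula (an assumption; the article does not say
// which distance measure was used).
double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    const double to_rad = kPi / 180.0;
    const double dlat = (lat2 - lat1) * to_rad;
    const double dlon = (lon2 - lon1) * to_rad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(lat1 * to_rad) * std::cos(lat2 * to_rad) *
                     std::sin(dlon / 2) * std::sin(dlon / 2);
    // Clamp to 1 so rounding error near antipodal points cannot yield NaN.
    return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// Number of unordered pairs among n items: n * (n - 1) / 2.
std::size_t pair_count(std::size_t n) {
    return n * (n - 1) / 2;
}

// Mean distance over every unordered pair of points. The whole double loop
// runs here in compiled code. Returns 0 for fewer than two points or for
// latitude/longitude vectors of different lengths.
double mean_pairwise_distance_km(const std::vector<double>& lat,
                                 const std::vector<double>& lon) {
    const std::size_t n = lat.size();
    if (n < 2 || lon.size() != n) {
        return 0.0;
    }
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            total += haversine_km(lat[i], lon[i], lat[j], lon[j]);
        }
    }
    return total / static_cast<double>(pair_count(n));
}
```

Calling mean_pairwise_distance_km with vectors of latitudes and longitudes does n(n-1)/2 distance evaluations in a single call, which is why the work grows quadratically with the number of airports. Each pair is independent of the others, which is what makes the problem embarrassingly parallel.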

On the lambda: Lessons learned in high-performance R
