When it comes to speeding up "embarrassingly parallel" computations (like for loops with many independent iterations), the R language offers a number of options:
- An R looping operator, like mapply (which runs in a single thread)
- A parallelized version of a looping operator, like mcmapply (which can use multiple cores)
- Explicit parallelization, via the parallel package or the ParallelR suite (which can use multiple cores, or distribute the problem across nodes in a cluster)
- Translating the loop to C++ using Rcpp (which runs as compiled and optimized machine code)
Data scientist Tony Fischetti tried all of these methods and more in an attempt to find the distance between every pair of airports. The running time of this problem grows polynomially with the number of airports (n airports yield n(n-1)/2 pairs, so the work grows quadratically), but it is embarrassingly parallel because each pair can be computed independently. Here's a chart comparing the time taken via various methods as the number of airports grows:
The clear winner is Rcpp, the orange line at the bottom of the chart. The line looks flat, but the time does increase as the problem gets larger; it's just much, much faster than all the other methods tested. Ironically, Rcpp doesn't use any parallelization at all and so doesn't benefit from the quad-processor system used for testing, but again: it's just that much faster.
Check out the blog post linked below for a detailed comparison of the methods used, and some good advice for using Rcpp effectively (pro-tip: code the whole loop, not just the body, with Rcpp).
On the lambda: Lessons learned in high-performance R
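To make the pro-tip concrete, here is a minimal sketch of what "coding the whole loop" might look like on the C++ side. This is not Fischetti's code: the function names (haversine_km, all_pair_distances), the use of the haversine formula, and the Earth-radius constant are all assumptions made for illustration. It is written in plain standard C++ with std::vector so it compiles on its own; in an actual Rcpp version you would typically take Rcpp::NumericVector arguments and mark the function with a `// [[Rcpp::export]]` attribute.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: great-circle distance in kilometres between two
// points given in decimal degrees, using the haversine formula.
double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    const double kPi = 3.14159265358979323846;
    const double kEarthRadiusKm = 6371.0;  // assumed mean Earth radius
    const double to_rad = kPi / 180.0;

    const double dlat = (lat2 - lat1) * to_rad;
    const double dlon = (lon2 - lon1) * to_rad;
    const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(lat1 * to_rad) * std::cos(lat2 * to_rad) *
                     std::sin(dlon / 2) * std::sin(dlon / 2);

    // Clamp to 1.0 so rounding error cannot push sqrt(a) outside asin's domain.
    return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// Hypothetical "whole loop" function: both loops over airports run in
// compiled code, so the caller pays the call overhead once instead of once
// per pair. Assumes lat and lon have the same length.
// Returns n*(n-1)/2 distances in the order (0,1), (0,2), ..., (n-2,n-1).
std::vector<double> all_pair_distances(const std::vector<double>& lat,
                                       const std::vector<double>& lon) {
    std::vector<double> out;
    const std::size_t n = lat.size();
    if (n < 2) return out;  // no pairs to compute

    out.reserve(n * (n - 1) / 2);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            out.push_back(haversine_km(lat[i], lon[i], lat[j], lon[j]));
        }
    }
    return out;
}
```

The point of the design is that R would call all_pair_distances once with the full coordinate vectors, rather than calling a compiled distance function from an R-level loop for every pair.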