A common theme over the last few decades was that we could afford to simply
sit back and let computer (hardware) engineers take care of increases in
computing speed thanks to Moore’s law. That same line of thought now
frequently points out that we are getting closer and closer to the physical
limits of what Moore’s law can do for us.
So the new best hope is (and has been) parallel processing. Even our smartphones have
multiple cores, and most if not all retail PCs now possess two, four or more
cores. Real computers, aka somewhat decent servers, can be had with 24, 32
or more cores as well, and all that is before we even consider GPU
coprocessors or other upcoming changes.
And sometimes our tasks are embarrassingly simple, as is the case with many
data-parallel jobs: we can use higher-level operations such as those offered
by the base R package parallel to spawn multiple processing tasks and gather
the results. I covered all this in some detail in previous talks on High
Performance Computing with R (and you can also consult the Task View on High
Performance Computing with R, which I edit).
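In R, this is the job of functions such as mclapply() or parLapply() from parallel. The sketch below shows the same split, spawn and gather pattern outside of R. It uses C++11's std::async only to stay self-contained; this is not how the parallel package works internally. The names simulate_task and parallel_map are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical independent task: its result depends only on its own input,
// so tasks can run in any order on any core.
double simulate_task(int seed) {
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i) acc += static_cast<double>(seed) / i;
    return acc;
}

// Split `inputs` into up to `n_chunks` contiguous slices, run one
// asynchronous task per slice, and gather the results in input order.
template <typename F>
std::vector<double> parallel_map(const std::vector<int>& inputs, F f,
                                 std::size_t n_chunks) {
    std::vector<double> out(inputs.size());
    if (inputs.empty()) return out;
    n_chunks = std::max<std::size_t>(1, std::min(n_chunks, inputs.size()));
    const std::size_t chunk = (inputs.size() + n_chunks - 1) / n_chunks;
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < inputs.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, inputs.size());
        // Each task writes a disjoint slice of `out`, so no locking is needed.
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(inputs[i]);
        }));
    }
    for (auto& t : tasks) t.get();  // wait for every task; rethrows any error
    return out;
}
```

A call such as parallel_map(seeds, simulate_task, 4) returns the same values a serial loop would. Because no task ever reads another task's results, the job is embarrassingly parallel.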
But sometimes we can’t use data-parallel approaches. Hence we have to redo our algorithms. Which is
really hard. R itself has been relying on the (fairly mature) OpenMP
standard for some of its operations. Luke Tierney’s
(awesome) keynote in May at our
(sixth) R/Finance conference mentioned some of the issues related to
OpenMP. One which matters is that OpenMP works really well on Linux, and
either not so well (Windows) or not at all (OS X, due to the usual issue with
the gcc/clang switch enforced by Apple; the good news is that the OpenMP
toolchain is expected to make it to OS X in some more performant form
“soon”). R is still expected to make wider use of OpenMP in future versions.
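Here is a minimal sketch of what an OpenMP-parallel loop with a reduction looks like in C++. It is illustrative, not code taken from R. It assumes OpenMP is enabled with a compiler flag such as -fopenmp. Without that flag, the pragma is ignored and the loop simply runs serially. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sum of squares with an OpenMP reduction: each thread keeps a private
// partial sum, and OpenMP adds the partial sums together when the loop ends.
// Compiled without OpenMP, the pragma is ignored and the loop runs serially.
double sum_of_squares(const std::vector<double>& x) {
    double total = 0.0;
    const long n = static_cast<long>(x.size());  // signed index for older OpenMP
#pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        const double v = x[static_cast<std::size_t>(i)];
        total += v * v;
    }
    return total;
}

// Number of threads OpenMP would use; 1 when OpenMP is not enabled.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

This graceful serial fallback is part of OpenMP's appeal. It also means that a platform without OpenMP support loses speed, not correctness.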
Another tool which has been around for a few years, and which can be
considered to be equally mature, is the Intel Threaded Building Blocks
library, or TBB. JJ recently started to wrap this up for use by R. The first
approach resulted in a (now superseded, see below) package TBB.
But hardware and OS issues bite once again, as the Intel TBB is not really
building that well for the Windows toolchain used by R (and based on MinGW).
(And yes, there are two more options. But Boost Threads requires linking,
which precludes easy use via, e.g., our BH package. And C++11 with its
threads library (based on Boost Threads) is not yet as widely available as R
and Rcpp, which means that it is not a real deployment option yet.)
Now, JJ, being as awesome as he is, went back to the drawing board and integrated a
second threading toolkit: TinyThread++,
a small header-only library without further dependencies. Not as
feature-rich as Intel Threaded Building Blocks,
but at least available everywhere. So a new package
RcppParallel, so far
only on GitHub, wraps around both TinyThread++
and Intel Threaded Building Blocks and
offers a consistent interface available on all platforms used by R.
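With a consistent interface, user code only describes the work to be done on an index range. The library decides how to split that range across threads. The sketch below is modeled loosely on the Worker / parallelFor pattern used by RcppParallel. Every name in it is a hypothetical stand-in: RangeWorker, parallel_for_range and SqrtWorker. The threads come from std::thread, not from TinyThread++ or TBB, purely so the example runs standalone.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical worker interface: process the half-open index range [begin, end).
struct RangeWorker {
    virtual ~RangeWorker() = default;
    virtual void operator()(std::size_t begin, std::size_t end) = 0;
};

// Hypothetical backend: split [begin, end) into roughly equal slices, run each
// slice on its own std::thread, then join them all. Another threading library
// could be swapped in here without changing any worker code.
void parallel_for_range(std::size_t begin, std::size_t end, RangeWorker& worker,
                        std::size_t n_threads = 4) {
    if (end <= begin) return;
    n_threads = std::max<std::size_t>(1, std::min(n_threads, end - begin));
    const std::size_t chunk = (end - begin + n_threads - 1) / n_threads;
    std::vector<std::thread> threads;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        threads.emplace_back([&worker, lo, hi] { worker(lo, hi); });
    }
    for (auto& t : threads) t.join();
}

// Example worker: element-wise square root into a separate output vector.
struct SqrtWorker : RangeWorker {
    const std::vector<double>& input;
    std::vector<double>& output;
    SqrtWorker(const std::vector<double>& in, std::vector<double>& out)
        : input(in), output(out) {}
    void operator()(std::size_t begin, std::size_t end) override {
        for (std::size_t i = begin; i < end; ++i) output[i] = std::sqrt(input[i]);
    }
};
```

Changing the threading backend only touches parallel_for_range. SqrtWorker never needs to know which library runs it.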
Better still, JJ also authored several pieces demonstrating this new package for the Rcpp Gallery:
- A parallel matrix transformation
- A parallel vector summation
- A parallel inner product
- Parallel Distance Matrix Calculation with RcppParallel
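Summation and inner products differ from the element-wise transformation: the threads' results have to be combined. RcppParallel's parallelReduce handles this with a split/join style. Each thread gets its own fresh accumulator, and the partial results are joined at the end. The code below is our own hypothetical re-creation of that idea, not the library's API. SumWorker and parallel_reduce_range are made-up names, and std::thread stands in for the real backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical reduction worker: accumulates a partial sum over [begin, end).
struct SumWorker {
    const std::vector<double>* data;
    double value = 0.0;
    explicit SumWorker(const std::vector<double>& d) : data(&d) {}
    // "Split": a fresh accumulator over the same data, starting at zero.
    SumWorker split() const { return SumWorker(*data); }
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) value += (*data)[i];
    }
    // "Join": fold another thread's partial result into this one.
    void join(const SumWorker& other) { value += other.value; }
};

// Hypothetical driver: one split-off accumulator per slice, each run on its
// own thread, then every partial result joined into `reducer` in slice order.
template <typename Reducer>
void parallel_reduce_range(std::size_t begin, std::size_t end,
                           Reducer& reducer, std::size_t n_threads = 4) {
    if (end <= begin) return;
    n_threads = std::max<std::size_t>(1, std::min(n_threads, end - begin));
    const std::size_t chunk = (end - begin + n_threads - 1) / n_threads;
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    for (std::size_t lo = begin; lo < end; lo += chunk)
        slices.emplace_back(lo, std::min(lo + chunk, end));
    std::vector<Reducer> partials;
    partials.reserve(slices.size());
    for (std::size_t k = 0; k < slices.size(); ++k) partials.push_back(reducer.split());
    std::vector<std::thread> threads;
    for (std::size_t k = 0; k < slices.size(); ++k)
        threads.emplace_back([&partials, &slices, k] {
            partials[k](slices[k].first, slices[k].second);  // private accumulator only
        });
    for (auto& t : threads) t.join();
    for (const auto& p : partials) reducer.join(p);
}
```

An inner product follows the same shape. The worker would hold two vectors and accumulate x[i] * y[i]. Joining in a fixed order keeps the result reproducible for a given thread count. Floating-point sums may still differ slightly from a serial loop.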
All four are interesting and demonstrate different aspects of parallel
computing via RcppParallel.
But the last article is key. Based on a question by Jim Bullard, and then
written with Jim, it shows how a particular matrix distance metric (which is
missing from R) can be implemented in a serial manner both in R and via
Rcpp. The key implementation, however, uses both Rcpp and RcppParallel, and
thereby achieves a truly impressive speed gain, as the gains from using
compiled code (via Rcpp) and from using a parallel algorithm (via
RcppParallel) are multiplicative! Across JJ’s and my four-core machines, the
gain was between 200- and 300-fold, which is rather considerable. For
kicks, I also used a much bigger machine at work, which came in at an even
larger speed gain (but gains become clearly sublinear as the number of cores
increases; there are, however, some tuning parameters).
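The distance-matrix problem parallelizes well because each row can be handled independently. Here is a minimal C++ sketch of that structure. It assumes a row-major input matrix and uses plain Euclidean distance as a stand-in metric; the article's own metric is not reproduced here. distance_matrix and row_distance are hypothetical names, and std::thread replaces RcppParallel's backend.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Euclidean distance between rows a and b of a row-major n x p matrix
// (a stand-in metric chosen for illustration).
double row_distance(const std::vector<double>& m, std::size_t p,
                    std::size_t a, std::size_t b) {
    double s = 0.0;
    for (std::size_t k = 0; k < p; ++k) {
        const double d = m[a * p + k] - m[b * p + k];
        s += d * d;
    }
    return std::sqrt(s);
}

// Full n x n distance matrix (row-major). Rows are dealt out to threads in
// contiguous blocks; the thread owning row i computes every pair (i, j) with
// j < i and writes both symmetric cells, so no two threads touch the same cell.
std::vector<double> distance_matrix(const std::vector<double>& m, std::size_t n,
                                    std::size_t p, std::size_t n_threads = 4) {
    std::vector<double> out(n * n, 0.0);  // the diagonal stays 0
    if (n == 0) return out;
    n_threads = std::max<std::size_t>(1, std::min(n_threads, n));
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> threads;
    for (std::size_t lo = 0; lo < n; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, n);
        threads.emplace_back([&, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i)
                for (std::size_t j = 0; j < i; ++j)
                    out[i * n + j] = out[j * n + i] = row_distance(m, p, i, j);
        });
    }
    for (auto& t : threads) t.join();
    return out;
}
```

The inner loop only visits j < i, so later rows carry more work. Contiguous blocks of rows are therefore not perfectly balanced. Smaller chunks, or a backend such as TBB that balances work dynamically, can help with that. This is one reason chunk sizes are worth tuning as the core count grows.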
So these are exciting times. I am sure there will be lots more to come. For
now, head over to the RcppParallel
package and start playing. Further contributions to the
Rcpp Gallery are not only welcome but