The RcppParallel package includes
high-level functions for doing parallel programming with Rcpp. For example,
the parallelReduce function can be used to aggregate values from a set of
inputs in parallel. This article describes using RcppParallel to parallelize
the inner-product
example previously posted to the Rcpp Gallery.
Serial Version
First, the serial version of the inner product computation. For this we use
a simple call to the STL std::inner_product function:
#include <Rcpp.h>
using namespace Rcpp;
#include <numeric>   // std::inner_product is declared in <numeric>
// [[Rcpp::export]]
double innerProduct(NumericVector x, NumericVector y) {
   return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
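One detail of the call above is easy to miss: std::inner_product accumulates in the type of its last argument, so the 0.0 is what keeps the running sum a double. The sketch below uses plain std::vector instead of NumericVector so it compiles without R; the helper names dotDouble and dotIntInit are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: the init value 0.0 makes the accumulator a double.
double dotDouble(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Hypothetical helper: the init value 0 makes the accumulator an int, so the
// running sum is converted back to int after every step and any fractional
// part is discarded.
double dotIntInit(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    return std::inner_product(x.begin(), x.end(), y.begin(), 0);
}
```

With x = {0.5, 0.25} and y = {1, 1}, dotDouble returns 0.75 while dotIntInit returns 0, which is why the serial and parallel versions in this article both pass 0.0.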
Parallel Version
Now we adapt our code to run in parallel. We’ll use the parallelReduce
function to do this. This function requires a “worker” function object
(defined below as InnerProduct). For details on worker objects see the
parallel-vector-sum
article on the Rcpp Gallery.
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace RcppParallel;
struct InnerProduct : public Worker
{
   // source vectors
   const RVector<double> x;
   const RVector<double> y;

   // product that I have accumulated
   double product;

   // constructors
   InnerProduct(const NumericVector x, const NumericVector y)
      : x(x), y(y), product(0) {}
   InnerProduct(const InnerProduct& innerProduct, Split)
      : x(innerProduct.x), y(innerProduct.y), product(0) {}

   // process just the elements of the range I've been asked to
   void operator()(std::size_t begin, std::size_t end) {
      product += std::inner_product(x.begin() + begin,
                                    x.begin() + end,
                                    y.begin() + begin,
                                    0.0);
   }

   // join my value with that of another InnerProduct
   void join(const InnerProduct& rhs) {
      product += rhs.product;
   }
};
Note that InnerProduct derives from the RcppParallel::Worker class. This
is required for function objects passed to parallelReduce.
Note also that we use the RVector<double> type for accessing the vector.
This is because this code will execute on a background thread where it’s not
safe to call R or Rcpp APIs. The RVector class is included in the
RcppParallel package and provides a lightweight, thread-safe wrapper around R
vectors.
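To make the roles of the split constructor, operator() and join more concrete, here is a minimal sketch of how a reduction could drive such a worker. It is not RcppParallel's implementation: DotWorker and reduceRange are hypothetical names, std::vector stands in for RVector so the code compiles without R, a bool stands in for the Split tag, and the two halves run one after the other where a real scheduler would hand them to different threads.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the InnerProduct worker in this article.
struct DotWorker {
    const std::vector<double>& x;
    const std::vector<double>& y;
    double product;

    DotWorker(const std::vector<double>& x, const std::vector<double>& y)
        : x(x), y(y), product(0) {}

    // "split" copy: shares the inputs but starts with an empty accumulator
    DotWorker(const DotWorker& other, bool /* split tag */)
        : x(other.x), y(other.y), product(0) {}

    // accumulate the partial product for [begin, end)
    void operator()(std::size_t begin, std::size_t end) {
        product += std::inner_product(x.begin() + begin, x.begin() + end,
                                      y.begin() + begin, 0.0);
    }

    // fold another worker's partial product into this one
    void join(const DotWorker& rhs) { product += rhs.product; }
};

// Hypothetical driver: halve the range until it holds at most `grain`
// elements; the right half gets a split copy that is joined back afterwards.
void reduceRange(std::size_t begin, std::size_t end, DotWorker& worker,
                 std::size_t grain) {
    assert(begin <= end);
    if (grain == 0) grain = 1;  // guard against a range that never shrinks
    if (end - begin <= grain) {
        worker(begin, end);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    DotWorker right(worker, true);
    reduceRange(begin, mid, worker, grain);
    reduceRange(mid, end, right, grain);
    worker.join(right);
}
```

Because every split copy starts at zero and only touches its own range, no two workers write to the same accumulator, and join is the only place where partial results meet.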
Now that we’ve defined the function object, implementing the parallel inner
product function is straightforward. Just initialize an instance of
InnerProduct with the input vectors and call parallelReduce:
// [[Rcpp::export]]
double parallelInnerProduct(NumericVector x, NumericVector y) {
   // declare the InnerProduct instance that takes a pointer to the vector data
   InnerProduct innerProduct(x, y);
   // call parallelReduce to start the work
   parallelReduce(0, x.length(), innerProduct);
   // return the computed product
   return innerProduct.product;
}
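Since the parallel version adds up per-range partial products and then joins them, the additions happen in a different order than in the serial version, so the two results can differ in their last few bits. When checking one against the other, a tolerance-based comparison is safer than ==. The sketch below is plain C++ with std::vector; chunkedInnerProduct and nearlyEqual are hypothetical helpers, and the default tolerance is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: computes the inner product as `chunks` partial sums
// that are added together at the end, mimicking the operator()/join pattern.
double chunkedInnerProduct(const std::vector<double>& x,
                           const std::vector<double>& y,
                           std::size_t chunks) {
    assert(x.size() == y.size());
    if (chunks == 0) chunks = 1;
    const std::size_t n = x.size();
    double total = 0.0;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = n * c / chunks;
        const std::size_t end = n * (c + 1) / chunks;
        total += std::inner_product(x.begin() + begin, x.begin() + end,
                                    y.begin() + begin, 0.0);
    }
    return total;
}

// Hypothetical helper: relative comparison of two doubles.
bool nearlyEqual(double a, double b, double relTol = 1e-9) {
    return std::fabs(a - b) <= relTol * std::fmax(std::fabs(a), std::fabs(b));
}
```

In R, the equivalent check would be all.equal(innerProduct(x, y), parallelInnerProduct(x, y)) instead of identical().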
Benchmarks
A comparison of the performance of the two functions shows the parallel version performing about 2.5 times as fast on a machine with 4 cores:
x <- runif(1000000)
y <- runif(1000000)
library(rbenchmark)
res <- benchmark(sum(x*y),
                 innerProduct(x, y),
                 parallelInnerProduct(x, y),
                 order="relative")
res[,1:4]
                        test replications elapsed relative
3 parallelInnerProduct(x, y)          100   0.035    1.000
2         innerProduct(x, y)          100   0.088    2.514
1                 sum(x * y)          100   0.283    8.086
Note that performance gains will typically be 30-50% smaller on Windows systems as a result of less sophisticated thread scheduling (RcppParallel does not currently use TBB on Windows, whereas it does on Mac and Linux).
You can learn more about using RcppParallel at https://github.com/RcppCore/RcppParallel.
