Computing an Inner Product with RcppParallel

July 14, 2014
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

The RcppParallel package includes
high level functions for doing parallel programming with Rcpp. For example,
the parallelReduce function can be used aggreggate values from a set of
inputs in parallel. This article describes using RcppParallel to parallelize
the inner-product
example previously posted to the Rcpp Gallery.

Serial Version

First the serial version of computing the inner product. For this we use
a simple call to the STL std::inner_product function:

#include 
using namespace Rcpp;

#include 

// [[Rcpp::export]]
double innerProduct(NumericVector x, NumericVector y) {
   return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

Parallel Version

Now we adapt our code to run in parallel. We’ll use the parallelReduce
function to do this. This function requires a “worker” function object
(defined below as InnerProduct). For details on worker objects see the
parallel-vector-sum
article on the Rcpp Gallery.

// [[Rcpp::depends(RcppParallel)]]
#include 
using namespace RcppParallel;

struct InnerProduct : public Worker
{   
   // source vectors
   const RVector<double> x;
   const RVector<double> y;
   
   // product that I have accumulated
   double product;
   
   // constructors
   InnerProduct(const NumericVector x, const NumericVector y) 
      : x(x), y(y), product(0) {}
   InnerProduct(const InnerProduct& innerProduct, Split) 
      : x(innerProduct.x), y(innerProduct.y), product(0) {}
   
   // process just the elements of the range I've been asked to
   void operator()(std::size_t begin, std::size_t end) {
      product += std::inner_product(x.begin() + begin, 
                                    x.begin() + end, 
                                    y.begin() + begin, 
                                    0.0);
   }
   
   // join my value with that of another InnerProduct
   void join(const InnerProduct& rhs) { 
     product += rhs.product; 
   }
};

Note that InnerProduct derives from the RcppParallel::Worker class. This
is required for function objects passed to parallelReduce.

Note also that we use the RVector type for accessing the vector.
This is because this code will execute on a background thread where it’s not
safe to call R or Rcpp APIs. The RVector class is included in the
RcppParallel package and provides a lightweight, thread-safe wrapper around R
vectors.

Now that we’ve defined the function object, implementing the parallel inner
product function is straightforward. Just initialize an instance of
InnerProduct with the input vectors and call parallelReduce:

// [[Rcpp::export]]
double parallelInnerProduct(NumericVector x, NumericVector y) {
   
   // declare the InnerProduct instance that takes a pointer to the vector data
   InnerProduct innerProduct(x, y);
   
   // call paralleReduce to start the work
   parallelReduce(0, x.length(), innerProduct);
   
   // return the computed product
   return innerProduct.product;
}

Benchmarks

A comparison of the performance of the two functions shows the parallel
version performing about 2.5 times as fast on a machine with 4 cores:

x <- runif(1000000)
y <- runif(1000000)

library(rbenchmark)
res <- benchmark(sum(x*y),
                 innerProduct(x, y),
                 parallelInnerProduct(x, y),
                 order="relative")
res[,1:4]
                        test replications elapsed relative
3 parallelInnerProduct(x, y)          100   0.035    1.000
2         innerProduct(x, y)          100   0.088    2.514
1                 sum(x * y)          100   0.283    8.086

Note that performance gains will typically be 30-50% less on Windows systems
as a result of less sophisticated thread scheduling (RcppParallel does not
currently use TBB on Windows
whereas it does on the Mac and Linux).

You can learn more about using RcppParallel at
https://github.com/RcppCore/RcppParallel.

To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)