# Transforming a Matrix in Parallel using RcppParallel

**Rcpp Gallery**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The RcppParallel package includes

high level functions for doing parallel programming with Rcpp. For example,

the `parallelFor`

function can be used to convert the work of a standard

serial “for” loop into a parallel one. This article describes using

RcppParallel to transform an R matrix in parallel.

### Serial Version

First a serial version of the matrix transformation. We take the square root

of each item of a matrix and return a new matrix with the tranformed values.

We do this by using `std::transform`

to call the `sqrt`

function on each

element of the matrix:

#includeusing namespace Rcpp; #include #include // [[Rcpp::export]] NumericMatrix matrixSqrt(NumericMatrix orig) { // allocate the matrix we will return NumericMatrix mat(orig.nrow(), orig.ncol()); // transform it std::transform(orig.begin(), orig.end(), mat.begin(), ::sqrt); // return the new matrix return mat; }

### Parallel Version

Now we’ll adapt our code to run in parallel using the `parallelFor`

function.

RcppParallel takes care of dividing up work between threads, our job is to

implement a “Worker” function object that is called by the RcppParallel

scheduler.

The `SquareRoot`

function object below includes pointers to the input matrix

as well as the output matrix. Within it’s `operator()`

method it performs a

`std::transform`

with the `sqrt`

function on the array elements specified by

the `begin`

and `end`

arguments:

// [[Rcpp::depends(RcppParallel)]] #includeusing namespace RcppParallel; struct SquareRoot : public Worker { // source matrix const RMatrix input; // destination matrix RMatrix output; // initialize with source and destination SquareRoot(const NumericMatrix input, NumericMatrix output) : input(input), output(output) {} // take the square root of the range of elements requested void operator()(std::size_t begin, std::size_t end) { std::transform(input.begin() + begin, input.begin() + end, output.begin() + begin, ::sqrt); } };

Note that `SquareRoot`

derives from `RcppParallel::Worker`

. This is required

for function objects passed to `parallelFor`

.

Note also that we use the `RMatrix`

type for accessing the matrix.

This is because this code will execute on a background thread where it’s not

safe to call R or Rcpp APIs. The `RMatrix`

class is included in the

RcppParallel package and provides a lightweight, thread-safe wrapper around R

matrixes.

Here’s the parallel version of our matrix transformation function that makes

uses of the `SquareRoot`

function object. The main difference is that rather

than calling `std::transform`

directly, the `parallelFor`

function is called

with the range to operate on (in this case based on the length of the input

matrix) and an instance of `SquareRoot`

:

// [[Rcpp::export]] NumericMatrix parallelMatrixSqrt(NumericMatrix x) { // allocate the output matrix NumericMatrix output(x.nrow(), x.ncol()); // SquareRoot functor (pass input and output matrixes) SquareRoot squareRoot(x, output); // call parallelFor to do the work parallelFor(0, x.length(), squareRoot); // return the output matrix return output; }

### Benchmarks

A comparison of the performance of the two functions shows the parallel

version performing about 2.5 times as fast on a machine with 4 cores:

# allocate a matrix m <- matrix(as.numeric(c(1:1000000)), nrow = 1000, ncol = 1000) # ensure that serial and parallel versions give the same result stopifnot(identical(matrixSqrt(m), parallelMatrixSqrt(m))) # compare performance of serial and parallel library(rbenchmark) res <- benchmark(matrixSqrt(m), parallelMatrixSqrt(m), order="relative") res[,1:4] test replications elapsed relative 2 parallelMatrixSqrt(m) 100 0.294 1.000 1 matrixSqrt(m) 100 0.755 2.568

Note that performance gains will typically be 30-50% less on Windows systems

as a result of less sophisticated thread scheduling (RcppParallel does not

currently use TBB on Windows

whereas it does on the Mac and Linux).

You can learn more about using RcppParallel at

https://github.com/RcppCore/RcppParallel.

**leave a comment**for the author, please follow the link and comment on their blog:

**Rcpp Gallery**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.