# Using RcppParallel to aggregate to a vector

This article demonstrates using the RcppParallel package to aggregate to an output vector. It extends directly from previous demonstrations of single-valued aggregation by providing the details necessary to enable aggregation to a vector or, by extension, to any arbitrary form.

### The General Problem

Many tasks require aggregation to a vector result, and many such tasks can be made more efficient by performing such aggregation in parallel. The general problem is that the vector in which results are to be aggregated has to be shared among the parallel threads. This is a `parallelReduce` task: we need to split the singular task into effectively independent, parallel tasks, perform our aggregation operation on each of those tasks, yielding as many instances of our aggregate result vector as there are parallel tasks, and then finally join all of those resultant vectors from the parallel tasks into our desired singular result vector.

The general structure of the code demonstrated here extends from the previous Gallery article on parallel vector sums by extending summation to a vector result, and by passing additional variables to the parallel worker. The following code demonstrates aggregation to a vector result that holds the row sums of a matrix. Note at the outset that it is not intended to represent efficient code; rather, it is written to explicitly emphasise the principles of using `RcppParallel` to aggregate over a vector result.
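Before turning to the worker itself, the split, aggregate, and join steps described above can be sketched in plain C++ with no parallel machinery at all. This is an illustrative stand-alone sketch rather than the worker discussed in this article: the function names `partial_row_sums` and `join_into` are hypothetical, and the matrix is assumed to be stored column-major (as R stores matrices), so that element `(i, j)` sits at `i + j * nrow`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Aggregate step for one task: sum rows [begin, end) of a column-major
// matrix into a full-length vector. Rows outside the range stay at zero,
// so every task produces its own complete instance of the result vector.
std::vector<double> partial_row_sums(const std::vector<double> &the_matrix,
                                     std::size_t nrow, std::size_t ncol,
                                     std::size_t begin, std::size_t end)
{
    assert(the_matrix.size() == nrow * ncol);
    assert(begin <= end && end <= nrow);
    std::vector<double> partial(nrow, 0.0);
    for (std::size_t i = begin; i < end; ++i)
        for (std::size_t j = 0; j < ncol; ++j)
            partial[i] += the_matrix[i + j * nrow];
    return partial;
}

// Join step: element-wise addition of one partial result into another.
void join_into(std::vector<double> &lhs, const std::vector<double> &rhs)
{
    assert(lhs.size() == rhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i)
        lhs[i] += rhs[i];
}
```

Because each partial vector is zero outside its own row range, joining by addition yields the same result however the rows are divided among tasks.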

### The parallelReduce Worker

The following code defines our parallel worker, in which the input is presumed for demonstration purposes to be a matrix stored as a single vector, and so has a total length of `nrow * ncol`. The demonstration includes a few notable features:

1. The main `input` simply provides an integer index into the rows of the matrix, with the parallel job splitting the task among elements of that index. This explicit specification of an index vector is not necessary, but serves here to clarify what the worker is actually doing. An alternative would be for `input` to be `the_matrix`, and subsequently call the parallel worker only over `[0 ... nrow]` of that vector, which has a total length of `nrow * ncol`.
2. We are passing two additional variables specifying `nrow` and `ncol`. Although one of these could be inferred at run time, we pass them simply to demonstrate how this is done. Note in particular the form of the second constructor, called for each `Split` job, which accepts as input the variables as defined by the main constructor, and so all variable definitions are of the form `nrow(oneJob.nrow)`. The initial constructor also has input variables explicitly defined with `_in` suffixes, to clarify exactly how such variable passing works.
3. No initial values for the `output` are passed to the constructors. Rather, `output` must be resized to the desired size by each of those constructors, and so each repeats the line `output.resize(nrow, 0.0)`, which also initialises the values. (This is more readily done using a `std::vector` than an `Rcpp` vector, with final conversion to an `Rcpp` vector result achieved through a simple `Rcpp::wrap` call.)
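The three features listed above can be illustrated with a stand-alone sketch that mirrors the shape of such a worker without depending on Rcpp or RcppParallel. Several things here are assumptions made so the sketch compiles on its own: the local `Split` tag stands in for `RcppParallel::Split`, `std::vector` replaces the Rcpp containers, the matrix is taken to be column-major, and `reduce_in_two_halves` is a hypothetical serial driver that only imitates the split, run, and join sequence a parallel reducer would perform across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for RcppParallel::Split, so that the sketch compiles on its own.
struct Split {};

struct OneJob
{
    const std::vector<int> &input;          // row indices 0 .. nrow - 1
    const std::vector<double> &the_matrix;  // column-major, nrow * ncol
    const std::size_t nrow;
    const std::size_t ncol;
    std::vector<double> output;

    // Constructor 1: the main constructor, with `_in` suffixed arguments.
    OneJob(const std::vector<int> &input_in,
           const std::vector<double> &matrix_in,
           const std::size_t nrow_in, const std::size_t ncol_in)
        : input(input_in), the_matrix(matrix_in),
          nrow(nrow_in), ncol(ncol_in), output()
    {
        output.resize(nrow, 0.0);
    }

    // Constructor 2: called for each split job. Variables are copied from
    // the existing job, but `output` starts again from zeros.
    OneJob(const OneJob &oneJob, Split)
        : input(oneJob.input), the_matrix(oneJob.the_matrix),
          nrow(oneJob.nrow), ncol(oneJob.ncol), output()
    {
        output.resize(nrow, 0.0);
    }

    // Aggregate the rows named by input[begin .. end).
    void operator()(std::size_t begin, std::size_t end)
    {
        for (std::size_t k = begin; k < end; ++k) {
            const std::size_t i = static_cast<std::size_t>(input[k]);
            for (std::size_t j = 0; j < ncol; ++j)
                output[i] += the_matrix[i + j * nrow];
        }
    }

    // Join: add another job's partial result into this one.
    void join(const OneJob &rhs)
    {
        assert(rhs.output.size() == output.size());
        for (std::size_t i = 0; i < nrow; ++i)
            output[i] += rhs.output[i];
    }
};

// Hypothetical serial driver imitating one split followed by one join.
std::vector<double> reduce_in_two_halves(const std::vector<int> &input,
                                         const std::vector<double> &the_matrix,
                                         std::size_t nrow, std::size_t ncol)
{
    OneJob left(input, the_matrix, nrow, ncol);
    OneJob right(left, Split());
    const std::size_t mid = input.size() / 2;
    left(0, mid);
    right(mid, input.size());
    left.join(right);
    return left.output;
}
```

In a real worker the struct would derive from `RcppParallel::Worker` and the driver would be replaced by a call to `parallelReduce`; the sketch is only meant to show why both constructors resize `output` and why `join` is an element-wise sum.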

The worker can then be called via `parallelReduce` with the following code, in which `static_cast`s are necessary because `.size()` applied to `Rcpp` objects returns an `R_xlen_t` or `long` value, but we need to pass `unsigned` or `size_t` values to the worker to use as indices into standard C++ vectors. The `output` of `oneJob` is a `std::vector`, which is converted to an `Rcpp::NumericVector` through a simple call to `Rcpp::wrap`.
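The signed-to-unsigned conversion mentioned above can be sketched in isolation. In this sketch `std::ptrdiff_t` is an assumed stand-in for the signed length type returned by `.size()`, and the helpers `to_size` and `infer_ncol` are hypothetical names introduced only for illustration; the second also shows how one dimension could be inferred at run time from the total length rather than passed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Convert a signed length to std::size_t, rejecting negative values
// instead of letting them wrap around to a huge unsigned number.
std::size_t to_size(std::ptrdiff_t n)
{
    if (n < 0)
        throw std::invalid_argument("length must be non-negative");
    return static_cast<std::size_t>(n);
}

// Infer the number of columns from the total length of the flattened
// matrix and the number of rows.
std::size_t infer_ncol(std::ptrdiff_t matrix_length, std::ptrdiff_t nrow_signed)
{
    const std::size_t len = to_size(matrix_length);
    const std::size_t nrow = to_size(nrow_signed);
    if (nrow == 0 || len % nrow != 0)
        throw std::invalid_argument("length is not a multiple of nrow");
    return len / nrow;
}
```

A bare `static_cast<size_t>(x.size())` is sufficient when the length is known to be non-negative; the explicit check is a defensive choice for this sketch.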

### Demonstration

Finally, the following code demonstrates that this parallel worker correctly returns the row sums of the input matrix.

[1] TRUE

You can learn more about using RcppParallel at https://rcppcore.github.com/RcppParallel.

