# Using RcppNT2 to Compute the Sum

January 31, 2016
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

## Introduction

The Numerical Template Toolbox (NT2)
collection of header-only C++ libraries that make it
possible to explicitly request the use of SIMD instructions
when possible, while falling back to regular scalar
operations when not. NT2 itself is powered
by Boost, alongside two proposed
Boost libraries – `Boost.Dispatch`, which provides a
mechanism for efficient tag-based dispatch for functions,
and `Boost.SIMD`, which provides a framework for the
implementation of algorithms that take advantage of SIMD
instructions.
`RcppNT2` wraps
and exposes these libraries for use with `R`.

RcppNT2 introduction article
to get acquainted with the RcppNT2 package.

## Computing the Sum

First, let’s review how we might use `std::accumulate()`
to sum a vector of numbers. We explicitly pass in the
`std::plus()` functor, just to make it clear that
the `std::accumulate()` algorithm expects a binary
functor when accumulating values.

Now, let’s rewrite this to take advantage of RcppNT2.
There are two main steps required to take advantage of
RcppNT2 at a high level:

1. Write a functor, with a templated call operator,
with the implementation written in a
`Boost.SIMD`-aware’ way;

2. Provide the functor as an argument to the
appropriate SIMD algorithm.

Let’s follow these steps in implementing our SIMD sum.

As you can see, it’s quite simple to take advantage of
`Boost.SIMD`. For very simple operations such as this,
RcppNT2 provides a number of pre-defined functors,
which can be accessed in the `RcppParallel::functor`
namespace. The following is an equivalent way of defining
the above function:

Behind the scenes of `simdReduce()`, `Boost.SIMD` will
apply your templated functor to ‘packs’ of values when
appropriate, and scalar values when not. In other words,
there are effectively two kinds of template
specializations being generated behind the scenes: one
with `T = double`, and one with ```T = boost::simd::pack```. The use of the packed
representation is what allows `Boost.SIMD` to ensure
vectorized instructions are used and generated.
`Boost.SIMD` provides a host of functions and operator
overloads that ensure that optimized instructions are
used when possible over a packed object, while falling
back to ‘default’ operations for scalar values when not.

Now, let’s compare the performance of these two
implementations.

```             expr     min       lq     mean   median       uq      max
vectorSum(v) 870.894 887.1535 978.7471 987.1810 989.1060 1792.215
vectorSumSimd(v) 270.985 283.2315 297.8062 287.6565 298.7115  608.373
```

Perhaps surprisingly, the RcppNT2 solution is much
faster – the gains are similar to what we might have
seen when computing the sum in parallel. However, we’re
still just using a single core; we’re just taking
advantage of vectorized instructions provided by the CPU.
In this particular case, on Intel CPUs, `Boost.SIMD` will
ensure that we are using the `addpd` instruction, which
is documented in the Intel Software Developer’s Manual
[PDF].

Note that, for the naive serial sum, the compiler would
likely generate similarly efficient code when the
`-ffast-math` optimization flag is set. By default, the
compiler is somewhat ‘pessimistic’ about the set of
optimizations it can perform around floating point
arithmetic. This is because it must respect the
IEEE floating point standard,
and this means respecting the fact that, for example,
floating point operations are not assocative:

```[1] 1.110223e-16
```

Surprisingly, the above computation does not evaluate to
zero!

In practice, you’re likely safe to take advantage of the
`-ffast-math` optimizations, or `Boost.SIMD`, in your own
work. However, be sure to test and verify!

This article provides just a taste of how RcppNT2 can be used.
If you’re interested in learning more, please check out the
RcppNT2 website.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...