Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Introduction

The Numerical Template Toolbox (NT2)
collection of header-only C++ libraries that make it
possible to explicitly request the use of SIMD instructions
when possible, while falling back to regular scalar
operations when not. NT2 itself is powered
by Boost, alongside two proposed
Boost libraries – `Boost.Dispatch`, which provides a
mechanism for efficient tag-based dispatch for functions,
and `Boost.SIMD`, which provides a framework for the
implementation of algorithms that take advantage of SIMD
instructions.
`RcppNT2` wraps
and exposes these libraries for use with `R`.

RcppNT2 introduction article
to get acquainted with the RcppNT2 package.

## Computing the Variance

As you may or may not know, there are a number of algorithms
for computing the variance,
each making different tradeoffs in algorithmic complexity
versus numerical stability. We’ll focus on implementing a
two-pass algorithm, whereby we compute the mean in a first
pass, and later the sum of squares in a second pass.

First, let’s look at the R code one might write
to compute the variance.

``` 0.833939 0.833939
```

We can decompose the operation into a few distinct steps:

1. Compute the mean for our vector ‘x’,
2. Compute the squared deviations from the mean,
3. Sum the deviations about the mean,
4. Divide the summation by the length minus one.

Naively, we could imagine writing a ‘simdTransform()’ for
step 2, and an ‘simdReduce()’ for step 3. However, this
is a bit inefficient as the transform would require
allocating a whole new vector, with the same length as
our initial vector. When neither ‘simdTransform()’ nor
‘simdReduce()’ seem to be a good fit, we can fall back to
‘simdFor()’. We can pass a stateful functor to handle
accumulation of the transformed results.

Let’s write a class that encapsulates this ‘sum of
squares’ operation.

Now that we have our accumulator class defined, we can
use it to compute the variance. We’ll call our function
‘simdVar()’, and export it using Rcpp attributes in the
usual way.

Let’s confirm that this works as expected…

``` 9.166667 9.166667
```

And let’s benchmark, to see how performance compares.

```           expr    min      lq     mean median      uq    max
var(small) 11.784 14.3395 16.37862 15.096 15.7225 40.346
simdVar(small)  1.506  1.7045  2.06541  1.947  2.1055 10.935
```
```           expr      min       lq      mean    median       uq      max
var(large) 3046.597 3194.231 3278.7417 3301.6205 3323.581 3809.090
simdVar(large)  579.784  594.887  607.0411  607.9845  614.386  712.038
```

As we can see, taking advantage of SIMD instructions
has improved the runtime quite drastically.

However, we should note that this is not an entirely fair
comparison with `R`s implementation of the variance. In
particular, we do not have a nice mechanism for handling
missing values; if your data does have any `NA` or `NaN`
values, they will simply be propagated (and not
necessarily in the same way that `R` propagates
missingness). If you’re interested, a separate example
is provided as part of the RcppNT2 package, and you can
browse the standalone source file
here.

This article provides just a taste of how RcppNT2 can be used.
If you’re interested in learning more, please check out the
RcppNT2 website. 