Modern CPUs are built with extended instruction sets that optimize certain operations. One class of these, called Single Instruction / Multiple Data (SIMD) instructions, allows for vectorized operations, where a single instruction applies the same operation to several values at once. Although modern compilers will use these instructions when possible, they are often unable to reason about whether or not a particular block of code can be executed using SIMD instructions.
The Numerical Template Toolbox (NT2) is a collection of header-only C++ libraries that make it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. NT2 itself is powered by Boost, alongside two proposed Boost libraries: Boost.Dispatch, which provides a mechanism for efficient tag-based dispatch for functions, and Boost.SIMD, which provides a framework for the implementation of algorithms that take advantage of SIMD instructions. RcppNT2 wraps and exposes these libraries for use with R.
The primary abstraction that Boost.SIMD uses under the hood is the boost::simd::pack<> data structure. This item represents a small, contiguous pack of numeric values (e.g. doubles), and comes with a host of functions that facilitate the use of SIMD operations on those values when possible. Although you don’t need to know the details to use the high-level functionality provided by RcppNT2, the abstraction is useful for understanding what happens behind the scenes.
Here’s a quick example of how we might compute the sum of elements in a vector, using NT2.
Behind the scenes, simdReduce() takes care of iteration over the provided sequence, and ensures that we use optimized SIMD instructions over packs of numbers when possible, and scalar instructions when not. By passing a templated functor, simdReduce() can automatically choose the correct template specialization depending on whether it’s working with a pack or not. In other words, two template specializations will be generated in this case: one with T = double, and another with T = boost::simd::pack<double>.
Let’s confirm that this produces the correct output, and run a small benchmark.
               expr     min       lq      mean    median       uq      max
          sum(data) 894.451 943.4145 1033.5598 1020.5000 1071.327 1429.533
     simd_sum(data) 280.585 293.6315  316.6797  307.8795  314.429  574.050
We get a noticeable gain by taking advantage of SIMD instructions here: the median time drops from about 1020 to about 308, roughly a 3.3x speedup. However, it’s worth noting that we don’t handle NA and NaN with the same granularity as R.
This article provides just a taste of how RcppNT2 can be used. If you’re interested in learning more, please check out the RcppNT2 website.