Modernizing sugar in Rcpp11

May 26, 2014
By

(This article was first published on R Enthusiast and R/C++ hero, and kindly contributed to R-bloggers)

I'm in the process of modernizing the implementation of sugar in Rcpp11. Previous work already improved performance of sugar by allowing sugar classes themselves to implement how to apply themselves into their target vector. For example the sugar class SeqLen leverages std::iota instead of a manual for loop.

template <typename Target>  
inline void apply( Target& target ) const {  
    std::iota( target.begin(), target.end(), 1 ) ;     
}

sugar is based on the expression templates technique, a very popular pattern in C++ libraries that reduces the use of temporaries. Consider the following R code:

y <- exp(abs(x))  

To calculate y in R, we must first create a temporary vector to hold the result of abs(x) and then create a new vector to hold the result of exp(abs(x)). Then the vector we allocated to hold abs(x) is no longer referenced so becomes candidate for garbage collection, etc ... we just allocated something to throw it away right after.

If we were to implement this manually in C++, we would typically write a for loop.

int n = x.size() ;  
NumericVector y(n) ;  
for( int i=0; i<n; i++){  
    y[i] = exp( abs( x[i] ) ) ;
}

There are other ways to do it as well. For example using std::transform

NumericVector y(n) ;  
std::transform( x.begin(), x.end(), y.begin(), [](double a){  
    return exp(abs(a)) ;
}) ;

or using Rcpp::transform :

NumericVector y = transform( x.begin(), x.end(), []( double a){  
    return exp(abs(a));
}) ;

But sugar definitely gives us the most expressive and closest to R solution:

NumericVector y = exp(abs(x)) ;  

Here exp and abs operate on the entire vector, not just the scalar double. As with other expression templates libraries, sugar delays the actual work as much as possible. The expression exp(abs(x)) itself does not create a vector but creates an object that can be assigned to a NumericVector :

auto y = exp(abs(x)) ;  
Rprintf( "type(y) = %s", DEMANGLE(y) ) ;  
// type(y) = Rcpp::sugar::Sapply<14, true, Rcpp::sugar::Sapply<14, true, Rcpp::Vector<14, Rcpp::PreserveStorage>, double (*)(double)>, double (*)(double)>

The expression exp(abs(x)) has created the same object as the expression sapply(sapply(x,::abs),::exp). The first benefit from this is that the code for abs and for exp is exactly the same. After all, we just vectorize a function that operate on scalar values.

The second benefit is that we can identify that we want function composition here. We don't want to first sapply abs over x and then sapply exp over the previous result, what we really want is sapply the composition of the abs and exp scalar functions. That's exactly how it is implemented in Rcpp11. We can even retrieve that function:

auto fun = y.fun ;  
Rprintf( "type(fun) = %s\n", DEMANGLE(decltype(fun)) ) ;  
// type(fun) = Rcpp::functional::Compose<int (*)(int), double (*)(double)>

And the Compose class looks like this:

template <typename F1, typename F2>  
class Compose : public Functoid<Compose<F1,F2>> {  
private:  
    F1 f1 ;
    F2 f2 ;

public:  
    Compose( F1 f1_, F2 f2_ ) : f1(f1_), f2(f2_){}

    template <typename... Args>
    inline auto operator()( Args&&... args ) const -> decltype( f2( f1( std::forward<Args>(args)... ) ) ) {
        return f2( f1( std::forward<Args>(args)... ) );
    }

} ;

And we can just as easily compose lambda functions:

#include <Rcpp.h>
using namespace Rcpp ;

// [[export]]
NumericVector test(NumericVector x){  
  auto square = []( double a ){ return a*a ; } ;
  auto twice  = []( double a ){ return a*2 ; } ; 

  auto y = sapply( sapply(x, square), twice ) ;
  Rprintf( "type(y) = %s\n", DEMANGLE(decltype(y)) ) ;

  auto fun = y.fun ;
  Rprintf( "type(fun) = %s\n", DEMANGLE(decltype(fun)) ) ;

  double val = fun(3.0);
  Rprintf( "val = %5.3f\n", val ) ;

  NumericVector res = y ;
  return res ;
}

/*** R
    test(1:10)
*/

which gives:

$ Rcpp11Script /tmp/exp.cpp

> test(1:10)
type(y) = Rcpp::sugar::Sapply<14, true, Rcpp::sugar::Sapply<14, true, Rcpp::Vector<14, Rcpp::PreserveStorage>, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::$_0>, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::$_1>  
type(fun) = Rcpp::functional::Compose<test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::$_0, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::$_1>  
val = 18.000  
 [1]   2   8  18  32  50  72  98 128 162 200

It becomes even more interesting for how missing values should be treated. Let's now consider that the expression is used on an integer vector.

auto square = []( int a ){ return a*a ; } ;  
auto twice  = []( int a ){ return a*2 ; } ;  
auto y = sapply( sapply(x, square), twice ) ;  

In an iteration based implementation of sugar (like e.g. the implementation in Rcpp), to be correct we would have no choice but to check for NA twice because the two functions operate somewhat independently from one another. So the iteration based implementation in Rcpp would lead to code equivalent to this:

for( int i=0; i<n; i++){  
    int x_i = x[i] ;
    int tmp_i = (x_i == NA_INTEGER) ? NA_INTEGER : square(i) ;
    y[i] = (tmp_i == NA_INTEGER ) ? NA_INTEGER : twice(i) ;
}

With the new composition based approach, we only have to check for NA once, which leads to code much closer to what we would intuitively write manually:

for( int i=0; i<n; i++){  
    int x_i = x[i] ;
    y[i] = x_i == NA_INTEGER ? NA_INTEGER : twice(square(x_i)) ;
}

The composition based approach is still a work in progress, but I believe it will be yet another way to achieve performance improvements for the modernized version of sugar.

We can also generalize the composition approach to several input vectors, via mapply. Consider the expression : x + exp(y) + abs(sin(z)). The challenge is to identify actual vectors we want to iterate over: x, y and z and generate the appropriate function composition. Should be fun.

To leave a comment for the author, please follow the link and comment on his blog: R Enthusiast and R/C++ hero.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.