sapply with variadic trailing arguments (…)

May 27, 2014
By

(This article was first published on R Enthusiast and R/C++ hero, and kindly contributed to R-bloggers)

Motivation

In R, we can pass further arguments to sapply. The arguments are then passed the function to be applied over.

x <- seq(-3, 3, by=.2 )  
sapply( x, dnorm, 0, 4, FALSE )  

Conceptually this does something like:

sapply( x, function(.){  
    dnorm(.,0,4,FALSE)
} )

Implementation in Rcpp11

sapply has been part of sugar for a long time, and is now very central to the modernized version of sugar in the devel version of Rcpp11, but until now we did not have a mechanism similar to R's ellipsis.

variadic templates and std::tuple give us the tools to implement the feature in Rcpp11.

#include <Rcpp11>
using namespace Rcpp11 ;

// [[Rcpp::export]]
NumericVector bazinga(NumericVector x){  
    NumericVector res = sapply( x, ::Rf_dnorm4, 0.0, 4.0, false ) ;
    return res ;
}

/*** R
    x <- seq(-3, 3, by=.2 )
    bazinga(x)
*/

Details

For the details, further arguments are tied together into a functor object SapplyFunctionBinder wrapping both the underlying function to be called ::Rf_dnorm4 and the data as std::tuple<double,double,bool>.

template <int RTYPE, typename Function, typename... Args >  
class SapplyFunctionBinder {  
public:  
    typedef typename Rcpp::traits::storage_type<RTYPE>::type storage_type ;
    typedef typename std::tuple<Args...> Tuple ;
    typedef typename Rcpp::traits::index_sequence<Args...>::type Sequence ;
    typedef typename std::result_of<Function(storage_type,Args...)>::type fun_result_type ;

    SapplyFunctionBinder( Function fun_, Args&&... args) : 
        fun(fun_), tuple(std::forward<Args>(args)...){}

    inline fun_result_type operator()( storage_type x ) const {
        return apply( x, Sequence() ) ;        
    }

private:  
    Function fun ; 
    Tuple tuple ;

    template <int... S>
    inline fun_result_type apply( storage_type x, Rcpp::traits::sequence<S...> ) const {
        return fun( x, std::get<S>(tuple)... );  
    }

} ;

Alternatives

Alternatively, this can be done with lambda functions :

NumericVector res = sapply( x, [](double a){  
    return ::Rf_dnorm4(a, 0.0, 4.0, false ) ;
} ) ;

We could also bind the function with std::bind :

using namespace std::placeholders ;  
NumericVector res = sapply( x, std::bind(::Rf_dnorm4, _1, 0.0, 4.0, false) ) ;  

Comparison. Cost of the abstraction

Let's compare these alternatives through microbenchmark.

#include <Rcpp11>
using namespace Rcpp11 ;

// [[Rcpp::export]]
NumericVector sapply_variadic(NumericVector x){  
    NumericVector res = sapply( x, ::Rf_dnorm4, 0.0, 4.0, false ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_lambda(NumericVector x){  
    NumericVector res = sapply( x, [](double a){
            return ::Rf_dnorm4(a, 0.0, 4.0, false ) ;
    } ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_bind(NumericVector x){  
    using namespace std::placeholders ;
    NumericVector res = sapply( x, std::bind(::Rf_dnorm4, _1, 0.0, 4.0, false) ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_loop(NumericVector x){  
    int n = x.size() ;
    NumericVector res(n); 
    for( int i=0; i<n; i++){
        res[i] = Rf_dnorm4( x[i], 0.0, 4.0, false ) ;    
    }
    return res ;
}

Here are the results.

$ Rcpp11Script /tmp/test.cpp

> x <- seq(-3, 3, length.out = 1e+06)

> require(microbenchmark)
Loading required package: microbenchmark

> microbenchmark(sapply_variadic(x), sapply_lambda(x),
+     sapply_bind(x), sapply_loop(x))
Unit: milliseconds  
               expr      min       lq   median       uq      max neval
 sapply_variadic(x) 20.01696 20.11962 21.36856 22.07377 31.22063   100
   sapply_lambda(x) 20.53550 20.63079 21.83883 22.55680 31.62075   100
     sapply_bind(x) 19.96870 20.56051 21.32460 22.26086 30.66203   100
     sapply_loop(x) 20.81417 20.92458 22.13156 22.84175 31.48991   100

All 4 solutions give pretty identical performance. This is abstraction we did not have to pay for. In comparisons, a direct call to the vectorised R function dnorm

R_direct <- function(x){  
    dnorm(x, 0, 4, FALSE)
}

yields:

> microbenchmark(sapply_variadic(x), sapply_lambda(x),
+     sapply_bind(x), sapply_loop(x), R_direct(x))
Unit: milliseconds  
               expr      min       lq   median       uq      max neval
 sapply_variadic(x) 20.05441 20.12312 21.35391 22.67544 31.34657   100
   sapply_lambda(x) 20.28648 20.39238 21.60423 22.31166 30.66797   100
     sapply_bind(x) 19.94212 20.02965 21.26132 21.92637 30.68198   100
     sapply_loop(x) 20.25010 20.31937 21.57333 22.89537 31.75865   100
        R_direct(x) 33.73723 33.89319 35.06729 36.05020 43.95266   100

I also intended to test the versio using R's sapply.

sapply_R <- function(x){  
    sapply(x, dnorm, 0, 4, FALSE )
}

But ... life's too short and I killed it.

To leave a comment for the author, please follow the link and comment on his blog: R Enthusiast and R/C++ hero.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.