sapply with variadic trailing arguments (…)

[This article was first published on R Enthusiast and R/C++ hero, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Motivation

In R, we can pass further arguments to sapply. The arguments are then passed the function to be applied over.

x <- seq(-3, 3, by=.2 )  
sapply( x, dnorm, 0, 4, FALSE )  

Conceptually this does something like:

sapply( x, function(.){  
    dnorm(.,0,4,FALSE)
} )

Implementation in Rcpp11

sapply has been part of sugar for a long time, and is now very central to the modernized version of sugar in the devel version of Rcpp11, but until now we did not have a mechanism similar to R's ellipsis.

variadic templates and std::tuple give us the tools to implement the feature in Rcpp11.

#include <Rcpp11>
using namespace Rcpp11 ;

// [[Rcpp::export]]
NumericVector bazinga(NumericVector x){  
    NumericVector res = sapply( x, ::Rf_dnorm4, 0.0, 4.0, false ) ;
    return res ;
}

/*** R
    x <- seq(-3, 3, by=.2 )
    bazinga(x)
*/

Details

For the details, further arguments are tied together into a functor object SapplyFunctionBinder wrapping both the underlying function to be called ::Rf_dnorm4 and the data as std::tuple<double,double,bool>.

template <int RTYPE, typename Function, typename... Args >  
class SapplyFunctionBinder {  
public:  
    typedef typename Rcpp::traits::storage_type<RTYPE>::type storage_type ;
    typedef typename std::tuple<Args...> Tuple ;
    typedef typename Rcpp::traits::index_sequence<Args...>::type Sequence ;
    typedef typename std::result_of<Function(storage_type,Args...)>::type fun_result_type ;

    SapplyFunctionBinder( Function fun_, Args&&... args) : 
        fun(fun_), tuple(std::forward<Args>(args)...){}

    inline fun_result_type operator()( storage_type x ) const {
        return apply( x, Sequence() ) ;        
    }

private:  
    Function fun ; 
    Tuple tuple ;

    template <int... S>
    inline fun_result_type apply( storage_type x, Rcpp::traits::sequence<S...> ) const {
        return fun( x, std::get<S>(tuple)... );  
    }

} ;

Alternatives

Alternatively, this can be done with lambda functions :

NumericVector res = sapply( x, [](double a){  
    return ::Rf_dnorm4(a, 0.0, 4.0, false ) ;
} ) ;

We could also bind the function with std::bind :

using namespace std::placeholders ;  
NumericVector res = sapply( x, std::bind(::Rf_dnorm4, _1, 0.0, 4.0, false) ) ;  

Comparison. Cost of the abstraction

Let's compare these alternatives through microbenchmark.

#include <Rcpp11>
using namespace Rcpp11 ;

// [[Rcpp::export]]
NumericVector sapply_variadic(NumericVector x){  
    NumericVector res = sapply( x, ::Rf_dnorm4, 0.0, 4.0, false ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_lambda(NumericVector x){  
    NumericVector res = sapply( x, [](double a){
            return ::Rf_dnorm4(a, 0.0, 4.0, false ) ;
    } ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_bind(NumericVector x){  
    using namespace std::placeholders ;
    NumericVector res = sapply( x, std::bind(::Rf_dnorm4, _1, 0.0, 4.0, false) ) ;
    return res ;
}

// [[Rcpp::export]]
NumericVector sapply_loop(NumericVector x){  
    int n = x.size() ;
    NumericVector res(n); 
    for( int i=0; i<n; i++){
        res[i] = Rf_dnorm4( x[i], 0.0, 4.0, false ) ;    
    }
    return res ;
}

Here are the results.

$ Rcpp11Script /tmp/test.cpp

> x <- seq(-3, 3, length.out = 1e+06)

> require(microbenchmark)
Loading required package: microbenchmark

> microbenchmark(sapply_variadic(x), sapply_lambda(x),
+     sapply_bind(x), sapply_loop(x))
Unit: milliseconds  
               expr      min       lq   median       uq      max neval
 sapply_variadic(x) 20.01696 20.11962 21.36856 22.07377 31.22063   100
   sapply_lambda(x) 20.53550 20.63079 21.83883 22.55680 31.62075   100
     sapply_bind(x) 19.96870 20.56051 21.32460 22.26086 30.66203   100
     sapply_loop(x) 20.81417 20.92458 22.13156 22.84175 31.48991   100
All 4 solutions give pretty identical performance. This is abstraction we did not have to pay for. In comparisons, a direct call to the vectorised R function dnorm

R_direct <- function(x){  
    dnorm(x, 0, 4, FALSE)
}

yields:

> microbenchmark(sapply_variadic(x), sapply_lambda(x),
+     sapply_bind(x), sapply_loop(x), R_direct(x))
Unit: milliseconds  
               expr      min       lq   median       uq      max neval
 sapply_variadic(x) 20.05441 20.12312 21.35391 22.67544 31.34657   100
   sapply_lambda(x) 20.28648 20.39238 21.60423 22.31166 30.66797   100
     sapply_bind(x) 19.94212 20.02965 21.26132 21.92637 30.68198   100
     sapply_loop(x) 20.25010 20.31937 21.57333 22.89537 31.75865   100
        R_direct(x) 33.73723 33.89319 35.06729 36.05020 43.95266   100

I also intended to test the version using R's sapply.

sapply_R <- function(x){  
    sapply(x, dnorm, 0, 4, FALSE )
}

But ... life's too short and I killed it.

To leave a comment for the author, please follow the link and comment on their blog: R Enthusiast and R/C++ hero.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)