R function call overhead in Rcpp(11)

June 24, 2014
(This article was first published on R Enthusiast and R/C++ hero, and kindly contributed to R-bloggers)

Some people have asked for some facts about the differences between Rcpp11 and Rcpp; here is just one nugget. If you've been following the Rcpp world for some time, you might have seen comments like:

Calling an R function from C++ is expensive.

The need

The reason it is expensive is that we have to deal with two concurrent systems for error handling. R uses POSIX longjumps for its error handling, while in C++ we tend to use exceptions. Those two things do not play well together. If we were careless and ended up longjumping out of C++ code, the destructors of C++ objects would not be called. This goes against C++ determinism and RAII idioms, and it matters a great deal. For example, all Rcpp API objects (in both implementations) use the constructor/destructor pair to handle garbage collection: when an object is constructed it is automatically protected, and when its destructor is called the protection is removed. This is the very foundation of Rcpp, so if the destructor is skipped because of some longjump, the protection remains, and we might end up with objects that are needlessly protected forever.
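
The protect-on-construct, unprotect-on-destruct idea can be illustrated without R at all. What follows is a minimal sketch in plain C++, not code from either Rcpp implementation: `ProtectGuard` and `protect_count` are hypothetical stand-ins for an Rcpp API object and for R's protection bookkeeping. It only shows that exception unwinding runs the destructor and releases the protection, which is exactly the step a longjump over the same frame would skip.

```cpp
#include <stdexcept>

// Hypothetical stand-in for R's protection bookkeeping (not an R or Rcpp API).
static int protect_count = 0;

// Hypothetical RAII guard: protects on construction, unprotects on destruction.
struct ProtectGuard {
    ProtectGuard()  { ++protect_count; }
    ~ProtectGuard() { --protect_count; }
    ProtectGuard(const ProtectGuard&) = delete;
    ProtectGuard& operator=(const ProtectGuard&) = delete;
};

// Fails while a guard is alive. The exception unwinds the stack, so the
// guard's destructor runs before the exception leaves this function.
void failing_work() {
    ProtectGuard guard;
    throw std::runtime_error("something went wrong");
}

// Returns the number of protections still held once the failure was handled.
int protections_left_after_failure() {
    try {
        failing_work();
    } catch (const std::runtime_error&) {
        // by the time we get here, `guard` has already been destroyed
    }
    return protect_count;
}
```

With exceptions, `protections_left_after_failure()` reports that nothing is left protected. Had `failing_work()` been left through a longjump instead, the decrement in the destructor would never have happened.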

Rcpp implementation

This was taken care of early in the development of Rcpp. Every potentially dangerous call to an R function, or to some low-level R API that might longjump, is done in a sort of sandbox. We catch the longjump and throw an exception so that the C++ stack is correctly unwound, calling destructors along the way, and eventually we rethrow it as an R error when it is safe to do so. All of this happens automatically. This is magic. However, Rumplestiltskin would argue that all magic comes with a price.

In Rcpp this is handled by the Rcpp::Rcpp_eval function:

inline SEXP Rcpp_eval(SEXP expr_, SEXP env) {  
    Shield<SEXP> expr( expr_) ;

    reset_current_error() ;

    Environment RCPP = Environment::Rcpp_namespace();
    SEXP tryCatchSym               = ::Rf_install("tryCatch");
    SEXP evalqSym                  = ::Rf_install("evalq");
    SEXP conditionMessageSym       = ::Rf_install("conditionMessage");
    SEXP errorRecorderSym          = ::Rf_install(".rcpp_error_recorder");
    SEXP errorSym                  = ::Rf_install("error");

    Shield<SEXP> call( Rf_lang3(
        tryCatchSym,
        Rf_lang3( evalqSym, expr, env ),
        errorRecorderSym
    ) ) ;
    SET_TAG( CDDR(call), errorSym ) ;
    /* call the tryCatch call */
    Shield<SEXP> res(::Rf_eval( call, RCPP ) );

    if( error_occured() ) {
        Shield<SEXP> current_error        ( rcpp_get_current_error() ) ;
        Shield<SEXP> conditionMessageCall (::Rf_lang2(conditionMessageSym, current_error)) ;
        Shield<SEXP> condition_message    (::Rf_eval(conditionMessageCall, R_GlobalEnv)) ;
        std::string message(CHAR(::Rf_asChar(condition_message)));
        throw eval_error(message) ;
    }

    return res ;
}

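The rationale for this code is that we call R's tryCatch: it builds the call tryCatch(evalq(expr, env), error = .rcpp_error_recorder) and evaluates it in the Rcpp namespace. If the recorder was triggered, the condition message is retrieved and thrown as a C++ eval_error exception. Every call therefore makes a round trip through the R interpreter.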

Rcpp11 implementation

The Rcpp implementation has served us for years, and it allows us to have safe, deterministic behavior when calling R functions.

Safe is better than fast

But on the other hand:

Safe and fast is even better

In Rcpp11 we go deeper into the internals of R and completely eliminate the round trip to R. The implementation is arguably more complex, but from the user's perspective we get something more generic than Rcpp_eval, built on lambda functions: try_catch. Rcpp_eval is then implemented on top of try_catch:

inline SEXP Rcpp_eval(SEXP expr, SEXP env ){  
    SEXP res ;
    try_catch( [&](){
        res = Rf_eval(expr, env) ;
    }) ;
    return res ;
}

The rationale here is that whatever is inside the body of the lambda should be C code and when it long jumps, we pick it up and relay it with a C++ exception, ...
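
To make the "pick up the long jump and relay it as an exception" idea concrete, here is a toy model in plain C++. It is a sketch under loud assumptions, not Rcpp11's implementation: the real one hooks into R internals that this article does not show. Here `c_style_error`, `current_target` and `last_message` are hypothetical stand-ins for R's C-level error machinery, and this `try_catch` only mimics the shape of the Rcpp11 function of the same name. As in the article, the lambda body must be C-style code, with no C++ objects whose destructors a long jump could skip.

```cpp
#include <csetjmp>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for R's C-level error state (not R API).
static std::jmp_buf* current_target = nullptr;
static const char* last_message = "";

// C-style error: jumps to the innermost registered target instead of returning.
[[noreturn]] void c_style_error(const char* msg) {
    if (current_target == nullptr) std::abort();  // no handler registered
    last_message = msg;
    std::longjmp(*current_target, 1);
}

// Toy try_catch: runs `code`; if it long jumps, relays the failure as a
// C++ exception thrown from this frame, so callers unwind normally.
template <typename Code>
void try_catch(Code&& code) {
    std::jmp_buf target;
    std::jmp_buf* previous = current_target;
    current_target = &target;
    if (setjmp(target) == 0) {
        try {
            code();
        } catch (...) {
            current_target = previous;  // keep the handler chain consistent
            throw;
        }
        current_target = previous;
    } else {
        // We arrive here through longjmp: restore state, then switch to exceptions.
        current_target = previous;
        throw std::runtime_error(last_message);
    }
}

// Error path: returns the message carried by the relayed exception.
std::string relay_demo() {
    try {
        try_catch([] { c_style_error("boom"); });
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no error";
}

// Normal path: the result travels out through a captured reference,
// the same way res does in the Rcpp_eval listing above.
int normal_demo() {
    int res = 0;
    try_catch([&] { res = 42; });
    return res;
}
```

The point of the design is that the long jump never crosses a frame holding C++ objects: it lands inside `try_catch`, and everything above that frame only ever sees an ordinary exception.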

The internals use some of the forbidden API that only true R wizards are supposed to manipulate.

Is it worth it?

Doing what our try_catch is doing still has a cost. Let's compare it to the cost of doing it the Rcpp way.

Consider this code. It calls the appropriately named nothing function in a loop. The function does nothing, we just want to measure the overhead.

#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
void test(){  
  Function nothing("nothing") ;
  for( int i=0; i<1000000; i++){
    nothing() ;
  }
}

/*** R
  nothing <- function(){}
  system.time(test())
*/

Throwing this file, fun.cpp, at Rcpp and Rcpp11 with my scripts gives:

romain@naxos ~ $ RcppScript /tmp/fun.cpp  
> system.time(test())
   user  system elapsed
 10.819   0.034  10.852

romain@naxos ~ $ Rcpp11Script /tmp/fun.cpp  
> system.time(test())
   user  system elapsed
  0.762   0.003   0.765

Over one million calls, that is about 10.9 µs per call with Rcpp against about 0.77 µs with Rcpp11, roughly a 14-fold reduction in overhead on this machine.

Gotcha

Unfortunately, I have not yet figured out how to make this approach work on Windows, so there we still use the callback-to-R strategy. I'm going to DSC this week, where I'll present Rcpp11 to an audience mainly made up of core R developers, and this is one of the points I will address in my talk.
