Faster (recursive) function calls: Another quick Rcpp case study

September 8, 2011
By

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

There was another question recently on StackOverflow that I had meant to
discuss in a follow-up post here. User deltanovember asked
about slow recursive functions
and used the very classic
Fibonacci number
as an example. To recap, Fibonacci number are defined with two initial
values F(0) = 0 and F(1) = 1; thereafter the Fibonacci number F(n) is defined
as the sum of the two preceding numbers: F(n) = F(n-2) + F(n-1).

This leads to very straightforward implementations using recursion:

## R implementation of recursive Fibonacci sequence
fibR <- function(n) {
    if (n == 0) return(0)
    if (n == 1) return(1)
    return (fibR(n - 1) + fibR(n - 2))
}

Unfortunately, this elegant implementation which remain close to the abtract
formulation of the recurrence algorithm performs very poorly in R as
there is noticeable overhead in function calls which becomes dominant in a
recursion. This lead to the original question on StackOverflow, and the
accepted answer uses a trick presented by Pat Burns in his lovely
R Inferno:
rewrite the solution using a computer science trick called memoization:

fibonacci <- local({
    memo <- c(1, 1, rep(NA, 100))
    f <- function(x) {
        if(x == 0) return(0)
        if(x < 0) return(NA)
        if(x > length(memo))
        stop("'x' too big for implementation")
        if(!is.na(memo[x])) return(memo[x])
        ans <- f(x-2) + f(x-1)
        memo[x] <<- ans
        ans
    }
})

That is a fair answer, and even more was suggested with a link to a terrific
analysis calling the Fibonacci recurrence
the worst algorithm in the world.
That is also fair, but all the basic research into better algorithms
exploiting some structure of the problem to advance performance (and of
course understanding) is overlooking one crucial part: algorithm analysis
is essentially independent of the language
. So whatever improvements we
obtain by thinking really hard about a problem are then
available for other implementations too.

So with a tip of the hat to the old Larry Wall quote about
Lazyness, Impatience and Hubris, I would like to suggest what I consider a
much simpler route to much better performance: recode it in C++ using
both Rcpp (for the R/C++ integration) and
inline for the on-the-fly compilation, linking and loading of C++ code into R.

## inline to compile, load and link the C++ code
require(inline)

## we need a pure C/C++ function as the generated function
## will have a random identifier at the C++ level preventing
## us from direct recursive calls
incltxt <- '
  int fibonacci(const int x) {
     if (x == 0) return(0);
     if (x == 1) return(1);
     return (fibonacci(x - 1)) + fibonacci(x - 2);
  }'

## now use the snippet above as well as one argument conversion
## in as well as out to provide Fibonacci numbers via C++
fibRcpp <- cxxfunction(signature(xs="int"),
                       plugin="Rcpp",
                       incl=incltxt,
                       body
                       ='
   int x = Rcpp::as<int>(xs);
   return Rcpp::wrap( fibonacci(x) );
')

This single R function call cxxfunction() takes the code
embedded in the arguments to the body variable (for the core
function) and the incltxt variable for the helper function we
need to call. This helper function is needed for the recursion as
cxxfunction() will use an randomized internal identifier for the
function called from R preventing us from calling this (unknown) indentifier.
But the rest of the algorithm is simple, and as beautiful as the initial
recurrence. Three lines, three statements, and three cases for F(0), F(1) and
the general case F(n) solved by recursive calls. This also illustrates how
easy it is to get an integer from R to C++ and back: the
as and wrap simply do the right thing
converting to and from the SEXP types used internally by the C
API of R.

A performance comparison of the basic R version fibR, its
byte-compiled variant fibRC and
and the C++ version fibRcpp shown above is very compelling.
We have added a file fibonacci.r to the large and still growing
set of examples included with Rcpp, and we can just
execute that script with Rscript or (as here) r
from the littler package:

edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/Misc$ r fibonacci.r
Loading required package: inline
Loading required package: methods
Loading required package: compiler
        test replications elapsed relative user.self sys.self
3 fibRcpp(N)            1   0.092   1.0000      0.09     0.00
2   fibRC(N)            1  61.480 668.2609     61.47     0.00
1    fibR(N)            1  61.877 672.5761     61.83     0.02
edd@max:~/svn/rcpp/pkg/Rcpp/inst/examples/Misc$ 

So the recursion for the original argument of N=35 takes just over a
minute at about 61.5 and 61.9 seconds, respectively, for the R version and
its byte-compiled variant (as per the column titled elapsed). So
byte-compilation essentially offers no help for the bottleneck of slow function calls.

The C++ versions relying on Rcpp which created in a few lines of code and a
single call to cxxfunction however takes just 92
milliseconds—or a relative gain of well over six-hundred times.

That provides another nice demonstration of what Rcpp can do. Improved
algorithms for well-understood problems are surely one way to accelerate solutions.
But there are (many ?) times when we do not have the luxury of being able to
think through to a new and improved approach. Or worse, such an approach may
even introduce new errors or inaccurracies if we get it wrong on a first try.
With Rcpp, we are able to the express the problem as written in its original statement: a simple
recursion. The gain relative to a slow R implementation is noteworthy—and
could of course be improved further if we really needed to by relying on
better algorithms like memoization. But for day to day tasks, I gladly take
speedups of (up to) a few hundred times thanks to
Rcpp without having to do hard
algorithmic work.

Before closing, a quick reminder that I will be giving two classes on
Rcpp in a few weeks. These will be in New York on September 24, and San Franciso on October 8, see
this blog post as well as
this page at Revolution Analytics (who are a
co-organiser of the classes) for details and registration information.

To leave a comment for the author, please follow the link and comment on his blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.