# The new R compiler package in R 2.13.0: Some first experiments

April 12, 2011
By

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

R 2.13.0 will be released tomorrow. It contains a new package by R Core member Luke Tierney: compiler. What a simple and straightforward name, and something that Luke has been working on for several years. The NEWS file says
    o Package compiler is now provided as a standard package.  See
?compiler::compile for information on how to use the compiler.
This package implements a byte code compiler for R: by default
the compiler is not used in this release.  See the 'R
Installation and Administration Manual' for how to compile the
base and recommended packages.

so the compiler is not yet used for R's own base and recommended packages.

While working on my slides for the upcoming Rcpp workshop preceding R/Finance 2011, I thought of a nice test example to illustrate the compiler. Last summer and fall, Radford Neal had sent a very nice, long and detailed list of possible patches for R performance improvements to the development list. Some patches were integrated and some more discussion ensued. One strand was on the difference in parsing between normal parens and curly braces. In isolation, these seem too large, but (as I recall) this is due some things 'deep down' in the parser.

However, some folks really harped on this topic. And it just doesn't die as a post from last week demonstrated once more. Last year, Christian Robert had whittled it down to a set of functions, and I made a somewhat sarcastic post argueing that I'd rather use Rcpp to get 80-fold speed increase than spend my time argueing over ten percent changes in code that could be made faster so easily.

So let us now examine what the compiler package can do for us. The starting point is the same as last year: five variants of computing 1/(1+x) for a scalar x inside an explicit for loop. Real code would never do it this way as vectorisation comes to the rescue. But for (Christian's) argument's sake, it is useful to highlight differences in the parser. We once again use the nice rbenchmark package to run, time and summarise alternatives:

> ## cf http://dirk.eddelbuettel.com/blog/2010/09/07#straight_curly_or_compiled
> f <- function(n, x=1) for (i in 1:n) x=1/(1+x)
> g <- function(n, x=1) for (i in 1:n) x=(1/(1+x))
> h <- function(n, x=1) for (i in 1:n) x=(1+x)^(-1)
> j <- function(n, x=1) for (i in 1:n) x={1/{1+x}}
> k <- function(n, x=1) for (i in 1:n) x=1/{1+x}
> ## now load some tools
> library(rbenchmark)
> ## now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
test replications elapsed relative
5 k(N, 1)           10   9.764  1.00000
1 f(N, 1)           10   9.998  1.02397
4 j(N, 1)           10  11.019  1.12853
2 g(N, 1)           10  11.822  1.21077
3 h(N, 1)           10  14.560  1.49119

This replicates Christian's result. We find that function k() is the fastest using curlies, and that explicit exponentiation in function h() is the slowest with a relative penalty of 49%, or an absolute difference of almost five seconds between the 9.7 for the winner and 14.6 for the worst variant. On the other hand, function f(), the normal way of writing things, does pretty well.

So what happens when we throw the compiler into the mix? Let's first create compiled variants using the new cmpfun() function and then try again:

> ## R 2.13.0 brings this toy
> library(compiler)
> lf <- cmpfun(f)
> lg <- cmpfun(g)
> lh <- cmpfun(h)
> lj <- cmpfun(j)
> lk <- cmpfun(k)
> # now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
test replications elapsed relative
9  lj(N, 1)           10   2.971  1.00000
10 lk(N, 1)           10   2.980  1.00303
6  lf(N, 1)           10   2.998  1.00909
7  lg(N, 1)           10   3.007  1.01212
8  lh(N, 1)           10   4.024  1.35443
1   f(N, 1)           10   9.479  3.19051
5   k(N, 1)           10   9.526  3.20633
4   j(N, 1)           10  10.775  3.62673
2   g(N, 1)           10  11.299  3.80310
3   h(N, 1)           10  14.049  4.72871

Now things have gotten interesting and substantially faster, for very little cost. Usage is straightforward: take your function and compile it, and reap more than a threefold speed gain. Not bad at all. Also of note, the difference between the different expressions essentially vanishes. The explicit exponentiation is still the loser, but there may be an additional explicit function call involved.

So we do see the new compiler as a potentially very useful addition. I am sure more folks will jump on this and run more tests to find clearer corner cases. To finish, we have to of course once more go back to Rcpp for some real gains, and some comparison between compiled and byte code compiled code.

> ## now with Rcpp and C++
> library(inline)
> ## and define our version in C++
> src <- 'int n = as<int>(ns);
+         double x = as<double>(xs);
+         for (int i=0; i<n; i++) x=1/(1+x);
+         return wrap(x); '
> l <- cxxfunction(signature(ns="integer",
+                            xs="numeric"),
+                  body=src, plugin="Rcpp")
> ## now run the benchmark again
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           l(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
test replications elapsed relative
6   l(N, 1)           10   0.120   1.0000
11 lk(N, 1)           10   2.961  24.6750
7  lf(N, 1)           10   3.128  26.0667
8  lg(N, 1)           10   3.140  26.1667
10 lj(N, 1)           10   3.161  26.3417
9  lh(N, 1)           10   4.212  35.1000
5   k(N, 1)           10   9.500  79.1667
1   f(N, 1)           10   9.621  80.1750
4   j(N, 1)           10  10.868  90.5667
2   g(N, 1)           10  11.409  95.0750
3   h(N, 1)           10  14.077 117.3083

Rcpp still shoots the lights out by a factor of 80 (or even almost 120 to the worst manual implementation) relative to interpreted code. Relative to the compiled byte code, the speed difference is about 25-fold. Now, these are of course entirely unrealistic code examples that are in no way, shape or form representative of real R work. Effective speed gains will be smaller for both the (pretty exciting new) compiler package and also for our C++ integration package Rcpp.

Before I close, two more public service announcements. First, if you use Ubuntu see this post by Michael on r-sig-debian announcing his implementation of a suggestion of mine: we now have R alpha/beta/rc builds via his Launchpad PPA. Last Friday, I had the current R-rc snapshot of R 2.13.0 on my Ubuntu box only about six hours after I (as Debian maintainer for R) uploaded the underlying new R-rc package build to Debian unstable. This will be nice for testing of upcoming releases. Second, as I mentioned, the Rcpp workshop on April 28 preceding R/Finance 2011 on April 29 and 30 still has a few slots available, as has the conference itself.