The new R compiler package in R 2.13.0: Some first experiments

April 12, 2011
(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

R 2.13.0 will be released tomorrow.
It contains a new package by R Core member
Luke Tierney: compiler. What
a simple and straightforward name, and something that Luke has been working
on for several years. The NEWS file says

    o Package compiler is now provided as a standard package.  See
      ?compiler::compile for information on how to use the compiler.
      This package implements a byte code compiler for R: by default
      the compiler is not used in this release.  See the 'R
      Installation and Administration Manual' for how to compile the
      base and recommended packages.

so the compiler is not yet used for R’s own base and recommended packages.

While working on my slides for the upcoming
Rcpp workshop preceding R/Finance 2011,
I thought of a nice test example to illustrate the compiler. Last summer and fall,
Radford Neal had sent a very nice, long and detailed list of possible patches for R
performance improvements to the development list. Some patches were integrated and some more discussion ensued. One strand was on the
difference in parsing between normal parens and curly braces. In isolation, these differences seem surprisingly large,
but (as I recall) this is due to how some things work ‘deep down’ in the parser.
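One way to see why parens and braces carry any call overhead at all (a small illustration of my own, not part of the original discussion): in R, `(` and `{` are themselves primitive functions, so every parenthesised or braced expression involves a (cheap, but real) function invocation that the evaluator has to dispatch:

```r
## In R, parens and braces are functions in their own right:
## each use of ( or { is itself a call.
`(`(3)            # same as (3); returns 3
`{`(1, 2, 3)      # same as {1; 2; 3}; returns the last value, 3
is.function(`(`)  # TRUE
```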

However, some folks really harped on this topic. And it just doesn’t die, as a
post from last week
demonstrated once more.
Last year, Christian Robert had whittled it down
to a set of functions, and I made a somewhat sarcastic post arguing that I’d rather
use Rcpp to get an 80-fold speed increase
than spend my time arguing over ten percent changes in code that could so easily be made faster.

So let us now examine what the compiler package can do for us. The starting point is the same as last year: five variants
of computing 1/(1+x) for a scalar x inside an explicit for loop. Real code would never do it this way as vectorisation comes
to the rescue. But for (Christian’s) argument’s sake, it is useful to highlight differences in the parser. We once again use
the nice rbenchmark package to run, time and summarise alternatives:

> ## cf http://dirk.eddelbuettel.com/blog/2010/09/07#straight_curly_or_compiled
> f <- function(n, x=1) for (i in 1:n) x=1/(1+x)
> g <- function(n, x=1) for (i in 1:n) x=(1/(1+x))
> h <- function(n, x=1) for (i in 1:n) x=(1+x)^(-1)
> j <- function(n, x=1) for (i in 1:n) x={1/{1+x}}
> k <- function(n, x=1) for (i in 1:n) x=1/{1+x}
> ## now load some tools
> library(rbenchmark)
> ## now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
     test replications elapsed relative
5 k(N, 1)           10   9.764  1.00000
1 f(N, 1)           10   9.998  1.02397
4 j(N, 1)           10  11.019  1.12853
2 g(N, 1)           10  11.822  1.21077
3 h(N, 1)           10  14.560  1.49119

This replicates Christian’s result. We find that function k() is the fastest using curlies, and that explicit exponentiation in
function h() is the slowest with a relative penalty of 49%, or an absolute difference of almost five seconds between the 9.7
for the winner and 14.6 for the worst variant. On the other hand, function f(), the normal way of writing things, does pretty
well.

So what happens when we throw the compiler into the mix? Let’s first create compiled variants using the new cmpfun() function and then try again:

> ## R 2.13.0 brings this toy
> library(compiler)
> lf <- cmpfun(f)
> lg <- cmpfun(g)
> lh <- cmpfun(h)
> lj <- cmpfun(j)
> lk <- cmpfun(k)
> ## now run the benchmark
> N <- 1e6
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
       test replications elapsed relative
9  lj(N, 1)           10   2.971  1.00000
10 lk(N, 1)           10   2.980  1.00303
6  lf(N, 1)           10   2.998  1.00909
7  lg(N, 1)           10   3.007  1.01212
8  lh(N, 1)           10   4.024  1.35443
1   f(N, 1)           10   9.479  3.19051
5   k(N, 1)           10   9.526  3.20633
4   j(N, 1)           10  10.775  3.62673
2   g(N, 1)           10  11.299  3.80310
3   h(N, 1)           10  14.049  4.72871

Now things have gotten interesting and substantially faster, for very little cost. Usage is straightforward: take your function, compile
it, and reap a more than threefold speed gain. Not bad at all. Also of note, the differences between the expression variants essentially
vanish. The explicit exponentiation is still the slowest, presumably because an additional function call is involved.
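For the curious, the compiler package also lets one inspect what cmpfun() produced. A small sketch (the byte code listing is verbose, so its output is not reproduced here):

```r
library(compiler)
## same toy function as above
f <- function(n, x=1) for (i in 1:n) x = 1/(1+x)
cf <- cmpfun(f)
## disassemble() prints the byte code the compiler generated for cf
disassemble(cf)
```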

So the new compiler looks like a potentially very useful addition. I am sure more folks will jump on this and run more
tests to map out the corner cases. To finish, we of course have to go back once more to
Rcpp for some real gains, and a comparison between compiled
and byte-compiled code.

> ## now with Rcpp and C++
> library(inline)
> ## and define our version in C++
> src <- 'int n = as<int>(ns);
+         double x = as<double>(xs);
+         for (int i=0; i<n; i++) x=1/(1+x);
+         return wrap(x); '
> l <- cxxfunction(signature(ns="integer",
+                            xs="numeric"),
+                  body=src, plugin="Rcpp")
> ## now run the benchmark again
> benchmark(f(N,1), g(N,1), h(N,1), j(N,1), k(N,1),
+           l(N,1),
+           lf(N,1), lg(N,1), lh(N,1), lj(N,1), lk(N,1),
+           columns=c("test", "replications",
+           "elapsed", "relative"),
+           order="relative", replications=10)
       test replications elapsed relative
6   l(N, 1)           10   0.120   1.0000
11 lk(N, 1)           10   2.961  24.6750
7  lf(N, 1)           10   3.128  26.0667
8  lg(N, 1)           10   3.140  26.1667
10 lj(N, 1)           10   3.161  26.3417
9  lh(N, 1)           10   4.212  35.1000
5   k(N, 1)           10   9.500  79.1667
1   f(N, 1)           10   9.621  80.1750
4   j(N, 1)           10  10.868  90.5667
2   g(N, 1)           10  11.409  95.0750
3   h(N, 1)           10  14.077 117.3083

Rcpp still shoots the lights out by a factor of 80 (or even almost 120 to the
worst manual implementation) relative to interpreted code. Relative to the compiled byte code, the speed difference is about 25-fold. Now,
these are of course entirely unrealistic code examples that are in no way, shape or form representative of real R work. Effective speed
gains will be smaller for both the (pretty exciting new) compiler package and also for our C++ integration package
Rcpp.
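As an aside (and not something benchmarked above), the compiler package also offers enableJIT(), which byte-compiles functions automatically when they are first used, so one need not call cmpfun() on each function by hand. A minimal sketch:

```r
library(compiler)
## level 3 requests JIT compilation of all closures before first use;
## enableJIT() returns the previous JIT level, and enableJIT(0)
## turns just-in-time compilation off again
enableJIT(3)
f <- function(n, x=1) for (i in 1:n) x = 1/(1+x)
f(1e5)       # f gets byte-compiled around this first call
enableJIT(0)
```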

Before I close, two more public service announcements. First, if you use Ubuntu see
this post by Michael on r-sig-debian
announcing his implementation of a suggestion of mine: we now have R alpha/beta/rc builds via his Launchpad PPA. Last Friday, I had the
current R-rc snapshot of R 2.13.0 on my Ubuntu box only about six hours after I (as Debian maintainer for R) uploaded the underlying new
R-rc package build to Debian unstable. This will be nice for testing upcoming releases. Second, as I mentioned, the
Rcpp workshop
on April 28 preceding
R/Finance 2011 on April 29 and 30 still has a few slots
available, as does the conference itself.
