I’ve gotten back to work on speeding up R, starting with improving my suite of speed tests. Among other new features, this suite allows one to easily try out the “byte-code” compiler that is now a standard part of the latest release of R, version 2.13.0. You can get the suite here.
I’ve been running these tests on my new workstation, which has a six-core Intel X5680 processor, running at 3.33GHz. Unfortunately, it’s clear that things run somewhat slower when you use all the cores at once, so for consistency one needs to do the speed tests using just one core. (Or one needs some more elaborate, and unclear, protocol for testing the speed of R in a multicore environment.) I haven’t figured out how to get Red Hat Linux to compile 32-bit applications yet, so all the tests are in a 64-bit environment.
I’ve started by comparing the speed of R-2.13.0 with and without functions being compiled, and by comparing R-2.13.0 (without the compiler) to R-2.11.1, which was the last release before some of my speed improvements were incorporated. A plot of the results is here.
Looking first at the effect of the compiler in R-2.13.0, one can see that for programs that do simple operations in loops, the compiler can speed things up by as much as a factor of five, though the speed-up is often less than a factor of two, and in one strange case (a very simple for loop) the compiler slows things down considerably. As one would expect, there is no speed-up for programs dominated by large operations such as matrix multiplies. There is also little speed-up when operations like matching arguments dominate. There’s a modest speed-up for the vector arithmetic tests, which may be related to storage allocation.
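For readers who want to try this kind of comparison themselves, here is a minimal sketch. It uses cmpfun and enableJIT from the compiler package that ships with R. The function sum_loop and the loop size are made up for illustration and are not taken from the test suite. In later versions of R the just-in-time compiler is on by default, so the sketch turns it off first, on the assumption that you want the plain version to really run in the interpreter.

```r
library(compiler)

# Turn off JIT compilation (on by default in later versions of R) so that
# the plain version is actually interpreted; the previous level is returned.
old_jit <- enableJIT(0)

# Hypothetical test function: a simple operation repeated in a loop.
sum_loop <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i * 2
  s
}

# Byte-compiled copy of the same function.
sum_loop_cmp <- cmpfun(sum_loop)

n <- 1e6
t_interp <- system.time(r_interp <- sum_loop(n))
t_cmp    <- system.time(r_cmp <- sum_loop_cmp(n))

# Elapsed seconds for each version; the numbers depend on machine and R version.
print(c(interpreted = t_interp[["elapsed"]], compiled = t_cmp[["elapsed"]]))

# Restore the previous JIT setting.
enableJIT(old_jit)
```

A single timing like this is noisy, so repeating it several times on an otherwise idle core gives a more trustworthy picture.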
Looking at R-2.13.0 versus R-2.11.1, one can see modest speed-ups for programs doing simple operations, which I believe are due to my improvements to “for” and to construction of argument lists. There are also major improvements to some operations like “transpose”, which are all due to modifications I introduced. The exception is the improvement for matrix multiplies, which I believe is due to recent changes to the BLAS that eliminate some special checks for zero, probably motivated by concern for proper NA/NaN propagation. (My proposed modifications to matrix multiplies can produce a much larger improvement, but were not incorporated.)
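To see why a check for zero in a multiply routine interferes with NA/NaN propagation, consider the toy sketch below. Both functions are hypothetical illustrations written in R, not the actual BLAS code. They compute a dot product with and without skipping terms whose first factor is zero, and they assume x itself contains no NA or NaN values.

```r
# Dot product that skips terms whose first factor is zero.
# This saves work on sparse data, but a NaN in y that is paired
# with a zero in x is never looked at. Assumes x has no NA/NaN.
dot_skip_zero <- function(x, y) {
  s <- 0
  for (i in seq_along(x)) {
    if (x[i] != 0) s <- s + x[i] * y[i]
  }
  s
}

# Plain dot product with no special case for zero.
dot_plain <- function(x, y) {
  s <- 0
  for (i in seq_along(x)) s <- s + x[i] * y[i]
  s
}

x <- c(0, 2)
y <- c(NaN, 3)

dot_skip_zero(x, y)   # 6: the NaN is silently dropped
dot_plain(x, y)       # NaN: 0 * NaN is NaN, and it propagates through the sum
```

Removing the zero check gives the second behaviour, and it also removes a comparison from the inner loop, which is consistent with the speed-up seen in the matrix multiply tests.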
Many of my other speed improvements have also not been incorporated into the released version of R. I’m currently updating them for R-2.13.0, and adding some new speed improvements. I hope to release them soon.
I expect that the speed-ups from these improvements will often be comparable to those obtained from using the compiler. Indeed, in some cases they will be the same improvements, since the compiler includes some optimizations that can just as easily (or more easily) be done in the interpreter. For instance, the interpreter currently allocates new space for TRUE or FALSE for the result of every comparison or logical operation. I came up with a simple modification to just allocate TRUE, FALSE, and logical NA once, and then re-use them as needed. I then noticed that the compiler does something similar.
Other speed-ups will be different, however. It will be interesting to see the combined effect of using both my speed improvements and the compiler.
UPDATE: I’ve released a new version of these speed tests, which fixes some glitches, adds some new tests, and improves the appearance of the plots. You can get the new version (and new plots comparing 2.13.0 with and without compilation and 2.11.1 versus 2.13.0) here.