# The performance gains from switching R’s linear algebra libraries

(This article was first published on On The Lambda, and kindly contributed to R-bloggers)

What is often forgotten in the so-called data analysis "language wars” is that, across most of these languages, many common computations are performed using outsourced dynamically linked math libraries. For example, R; Python's Numpy; Julia; Matlab; and Mathematica all make heavy use of the BLAS linear algebra API. As a result, R can't be properly faulted (or praised) for how slowly (or rapidly) it performs Cholesky decomposition since (a) the R core team wasn't responsible for the algorithm's implementation, and (b) neither are other languages' contributors for theirs.

For code that deals predominately with numerical computing and constructs from linear algebra, language battles become more a petty and vacuous squabble over subjective preferences in syntax rather than substantive discourse that encourages true innovation and improvement. That being said, R is the best, and if you disagree you should feel bad.

Two posts ago, I asserted that, for speed-up purposes, recompilation of R was usually unnecessary and that other lower-hanging fruit should be taken before resorting to recompilations. We've already seen that for certain problems, parallelizing your code is a great and relatively easy-to-implement speed up option. Another great option that's available is the ability to swap out R's linear algebra libraries for faster ones. Since these libraries are linked to at run-time—as opposed to being included statically at compile-time—employing the use of these alternative libraries do not require recompilation.

The topic of swapping BLAS implementations has already been covered well in these blog posts by Nathan VanHoudnos and Zachary Mayer, as well as in this paper by Dirk Eddelbuettel, but I thought I’d throw in my thoughts and results, too.

For my comparisons, I pitted OpenBLAS and Apple’s Accelerate Framework's implementation of BLAS against each other and the BLAS that comes with R by default. I wanted to try others too, but I either had an extraordinarily difficult time compiling them, or was unwilling to shell out money for a propriety library (here's looking at you, Intel). Of particular interest to me was trying out the ATLAS library.

I originally thought that testing ATLAS would be redundant because I was given to understand that Apple Accelerate’s "vecLib" was a hand-tuned version of ATLAS for Apple processors. After looking further into it, I discovered that this is no longer the case. Apple asserts in line 632 of cblas.h that "The Apple BLAS is no longer based on ATLAS". Unfortunately, nothing I tried would get ATLAS to compile. C'est la vie.

The benchmarking script I used to test these implementations can be furnished from http://r.research.att.com/benchmarks/R-benchmark-25.R
It records the time to completion of a large series of various computations including matrix creation, inversion, and multiplication. By default, it performs each test 3 times and takes the trimmed geometric mean of the time to completions.

These are the results of the total time elapsed (for all tests) among the three libraries.

As you can see, both Accelerate and OpenBLAS blew R's default BLAS implementation out of the water, with Accelerate marginally outperforming OpenBLAS. The last line of the output from the UNIX "time" command gives us a clue as to why this might be the case:

R --slave 56.10 user 1.65s system 122% cpu 47.162 total


Judging by above-100-percent CPU usage of both Accelerate and OpenBLAS (and how hot my laptop got), I'd wager that the primary source of the improvement is Accelerate's and OpenBLAS's ability to use multiprocessing. If you have more than one core available, this is something that you might want to look into.