# Faster R through better BLAS

June 15, 2010
(This article was first published on CYBAEA Data and Analysis, and kindly contributed to R-bloggers)

Can we make our analysis using the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection.

But recently David Smith suggested that a big benefit of their (commercial) version of R is that it is linked to a better linear algebra library. So I decided to investigate.

The quick summary is that it only really makes a difference for fairly artificial benchmark tests. For “normal” work you are unlikely to see a difference most of the time.

## The environment

I use R on a 64-bit Fedora 12 Linux system. Fortunately, it is very easy to rebuild R using different libraries on this platform. For the following, I will assume that you have a working rpmbuild environment. The test system has a quad core Intel Xeon E5420 CPU with each core running at 2.50 GHz.
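If you have never built an RPM before, the directory tree that rpmbuild expects can be created by hand (a minimal sketch; the `rpmdev-setuptree` tool from the rpmdevtools package does the same and also writes a starter `~/.rpmmacros`):

```shell
# Create the standard per-user rpmbuild directory layout under $HOME.
mkdir -p ~/rpmbuild/BUILD ~/rpmbuild/RPMS ~/rpmbuild/SOURCES \
         ~/rpmbuild/SPECS ~/rpmbuild/SRPMS
ls ~/rpmbuild
```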

## Benchmarks

Benchmarking R is complex. Very complex. But for this simple test we use two tests from the R Benchmarks page: MASS-ex.R and R-benchmark-25.R. The first is a simple benchmark using the examples from the MASS package, and has the advantage that it reflects real-world problems and real-world analysis, albeit small problems and short analyses. The second is a much more artificial example and primarily tests matrix operations.

We run the MASS benchmark as:

```
/usr/bin/time -p R --vanilla CMD BATCH MASS-ex.R /dev/null
```


The R-benchmark-25 run is simply:

```
Rscript --vanilla R-benchmark-25.R
```


For the MASS benchmark we simply capture the real elapsed time, while R benchmark 2.5 provides more detailed output for the three classes of tests (matrix calculation, matrix functions, and programming) as well as overall summaries. They are all shown in the table below.

## Compiler-optimized R

For the experiments that follow, the first thing to do is to grab copies of the source RPMs for R and for ATLAS:

```
cd ~/rpmbuild/SRPMS
yumdownloader --source R atlas   # yumdownloader is in the yum-utils package
cd ..
```


At the time I did this, I got R-2.11.0-1.fc12.src.rpm and atlas-3.8.3-12.fc12.src.rpm. I crank up the optimization level when building from source, so the first step is to edit `~/.rpmrc` to include the line `optflags: x86_64 -O3 -march=native -m64 -g`. With that in place we can simply do:
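For reference, the relevant `~/.rpmrc` entry is this one-line config fragment (adjust the flags to your own CPU and toolchain; `-march=native` in particular ties the binaries to the build machine):

```
optflags: x86_64 -O3 -march=native -m64 -g
```

The leading `x86_64` token scopes the flags to that architecture; rpmbuild then substitutes them wherever a spec file uses `%{optflags}`.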

```
rpmbuild --rebuild SRPMS/R-2.11.0-1.fc12.src.rpm  #  Change version numbers as needed
su -c 'rpm -Uhv --force RPMS/x86_64/R*2.11.0-1*.rpm RPMS/x86_64/libRmath*2.11.0-1*.rpm'
```


We now have a compiler-optimized version of R and we can re-run our tests. It doesn't make much difference, but that is also good to know.

## ATLAS BLAS libraries

Now let's try linking to the ATLAS BLAS libraries instead. I assume you have them installed (`yum install atlas` if not), so you can just grab a copy of R-atlas.diff to change the spec file like this:

```
rpm -ihv SRPMS/R-2.11.0-1.fc12.src.rpm   # Install to your rpmbuild environment
cd SPECS
wget http://static.cybaea.net/files/R-atlas.diff
patch -o R-atlas.spec R.spec R-atlas.diff
cd ..
rpmbuild -bb SPECS/R-atlas.spec
su -c 'rpm -Uhv --force RPMS/x86_64/R*2.11.0-1*.rpm RPMS/x86_64/libRmath*2.11.0-1*.rpm'
```


You now have a version of R that uses the ATLAS BLAS libraries, so you can re-run the tests. The results are in the table below in the “Optimized R + Standard ATLAS” row.
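To confirm which BLAS shared object the interpreter actually picked up, a quick `ldd` check works. This is a sketch assuming the standard layout where R's main library lives at `$(R RHOME)/lib/libR.so`; it prints a notice if R is not on the PATH:

```shell
# Show which BLAS/ATLAS shared objects R's main library is linked against.
libR="$(R RHOME 2>/dev/null)/lib/libR.so"
if [ -f "$libR" ]; then
  ldd "$libR" | grep -iE 'blas|atlas' || echo "no external BLAS linked"
else
  echo "R not found on this system"
fi
```

An ATLAS-linked build will list something like libatlas or libf77blas here, while a vanilla build shows only R's bundled libRblas.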

As expected, the matrix operations from R-benchmark-25.R run a lot faster: they complete in about 30–40% of the time, with much of the gain coming from multi-threading, which keeps all four CPU cores busy.

However, for the analysis-heavy code in MASS-ex.R there is little difference. If anything, we see a tiny increase in running time.

**Multi-threaded BLAS libraries make no significant difference to real-world analysis problems using R.**

## Other BLAS libraries

For good measure we also try an optimized version of ATLAS, but it does not make much difference on the x86_64 architecture:

```
rpmbuild -D "enable_native_atlas 1" --rebuild SRPMS/atlas-3.8.3-12.fc12.src.rpm
su -c 'rpm -Uhv --force RPMS/x86_64/atlas*3.8.3-12*.rpm'
```


And (only) for completeness, we also try the standard Netlib BLAS and LAPACK libraries (`yum install blas lapack`) by the same method as the ATLAS library above, but with a slightly different change to the spec file: R-blas.diff. It performs a little better than vanilla R.

## Benchmark results

Benchmark results for various optimizations of R and the BLAS library. Each cell shows the time in seconds, with the index relative to the base install in parentheses.

| R version | MASS-ex.R real | R benchmark 2.5 total time | Overall mean | Ⅰ. Matrix calc. | Ⅱ. Matrix functions | Ⅲ. Programming |
|---|---|---|---|---|---|---|
| Base install | 19.00 (1.00) | 78.49 (1.00) | 2.11 (1.00) | 2.32 (1.00) | 3.86 (1.00) | 1.05 (1.00) |
| Optimized R | 18.98 (1.00) | 76.11 (0.97) | 2.02 (0.96) | 2.36 (1.02) | 3.46 (0.90) | 1.02 (0.97) |
| Optimized R + Netlib BLAS | 18.56 (0.98) | 73.22 (0.93) | 1.81 (0.86) | 2.36 (1.02) | 2.41 (0.62) | 1.04 (0.99) |
| Optimized R + Standard ATLAS | 19.43 (1.02) | 16.74 (0.21) | 0.97 (0.46) | 0.90 (0.39) | 1.04 (0.27) | 0.99 (0.95) |
| Optimized R + Optimized ATLAS | 19.31 (1.02) | 16.36 (0.21) | 0.95 (0.45) | 0.84 (0.36) | 1.02 (0.26) | 1.00 (0.95) |
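The index columns above are simply each run's time divided by the corresponding base-install time, so smaller means faster. For example, the standard-ATLAS total of 16.74 seconds against the base install's 78.49 seconds:

```shell
# index = run time / base-install time (smaller is faster)
awk 'BEGIN { printf "index = %.2f\n", 16.74 / 78.49 }'
# prints: index = 0.21
```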
