(This article was first published on PlanetFlux, and kindly contributed to R-bloggers)
I compared the generic BLAS with Intel’s MKL (both sequential and parallel) on a Dell PowerEdge 610 server with dual hyperthreaded 6-core 3.06GHz Xeon X5675 processors. Here are the results from an R benchmarking script. Normal R is the generic BLAS, sMKL is the sequential (single-core) Intel MKL, and pMKL is the parallel Intel MKL using all 24 threads available on this system. Times are in seconds; lower is better.
R Benchmark 2.5
===============
Number of times each test is run__________________________: 3
Columns, left to right: Normal R, sMKL, pMKL
I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500×2500 matrix (sec): 0.592 0.583 0.585
2400×2400 normal distributed random matrix ^1000____ (sec): 0.425 0.411 0.427
Sorting of 7,000,000 random values__________________ (sec): 0.787 0.778 0.777
2800×2800 crossproduct matrix (b = a’ * a)_________ (sec): 11.543 1.875 0.283
Linear regr. over a 3000×3000 matrix (c = a \ b’)___ (sec): 5.367 0.910 0.214
Trimmed geom. mean (2 extremes eliminated): 1.358 0.743 0.414
II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.422 0.451 0.435
Eigenvalues of a 640×640 random matrix______________ (sec): 0.949 0.443 0.414
Determinant of a 2500×2500 random matrix____________ (sec): 4.864 0.967 0.352
Cholesky decomposition of a 3000×3000 matrix________ (sec): 4.131 0.865 0.179
Inverse of a 1600×1600 random matrix________________ (sec): 4.011 0.751 0.277
Trimmed geom. mean (2 extremes eliminated): 2.505 0.667 0.343
III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.787 0.824 0.841
Creation of a 3000×3000 Hilbert matrix (matrix calc) (sec): 0.456 0.465 0.431
Grand common divisors of 400,000 pairs (recursion)__ (sec): 2.196 2.386 1.927
Creation of a 500×500 Toeplitz matrix (loops)_______ (sec): 0.616 0.612 0.596
Escoufier’s method on a 45×45 matrix (mixed)________ (sec): 0.470 0.425 0.447
Trimmed geom. mean (2 extremes eliminated): 0.611 0.617 0.607
--------------------------------------------
Total time for all 15 tests_________________________ (sec): 37.62 12.76 8.18
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.28 0.67 0.44
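The "Trimmed geom. mean" rows can be checked by hand. Here is a minimal sketch, assuming the benchmark drops the fastest and slowest test in each section and takes the geometric mean of the remaining three. The helper name trimmed_geomean is hypothetical and not part of the benchmark script.

```shell
#!/bin/sh
# trimmed_geomean (hypothetical helper): drop the smallest and largest
# value, then take the geometric mean of what is left.
# Expects at least three positive numbers as arguments.
trimmed_geomean() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            # Skip the first and last sorted values (the two extremes).
            for (i = 2; i < NR; i++) s += log(v[i])
            printf "%.3f\n", exp(s / (NR - 2))
        }'
}

# Section I, Normal R column, copied from the output above.
trimmed_geomean 0.592 0.425 0.787 11.543 5.367
```

Because the inputs here are the rounded times printed above, the result can differ from the reported 1.358 in the last digit. The "Overall mean" row also appears to be a geometric mean of the three section means rather than their sum divided by 3: (1.358 × 2.505 × 0.611)^(1/3) is about 1.28, which matches the Normal R column.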
So you can see there are some significant gains, especially for the slowest tasks: the cross-product, linear regression, determinant, Cholesky decomposition, and matrix inverse. For example, the parallel MKL gave a ~40x speedup on the crossproduct calculation (11.543 s down to 0.283 s). Across all 15 tests, the parallel MKL version was ~4.6x faster than the generic BLAS on this system in total time (37.62 s vs. 8.18 s). When running models that take days to fit, that is significant!
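The speedup figures are just ratios of the times in the table. A small sketch of that arithmetic follows; speedup is a hypothetical helper name, and the numbers are copied from the benchmark output.

```shell
#!/bin/sh
# speedup (hypothetical helper): baseline time divided by optimized time,
# rounded to one decimal place.
speedup() {
    awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

# Times in seconds, copied from the table above.
echo "crossproduct, pMKL vs generic BLAS: $(speedup 11.543 0.283)x"
echo "crossproduct, sMKL vs generic BLAS: $(speedup 11.543 1.875)x"
echo "all 15 tests, pMKL vs generic BLAS: $(speedup 37.62 8.18)x"
```

The sequential MKL row shows how much of the gain comes from the optimized kernels alone, before any threading is added.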
Here’s how I installed it on Ubuntu 12.04:
- Download and install the Intel MKL from here.
- Before building R, install some libraries it needs:
    sudo apt-get install libreadline6 libreadline6-dev xserver-xorg xserver-xorg-dev gfortran libxt-dev
- Download the latest R (I used v2.15), available here.
- Compile R from source using something like this (detailed instructions here):
    export MKL_LIB_PATH=/opt/intel/composer_xe_2011_sp1.7.256/mkl/lib/intel64
    export optim_flags="-O3 -funroll-loops -march=native"
    # from http://cran.r-project.org/doc/manuals/R-admin.html#MKL
    MKL=" -L${MKL_LIB_PATH} -Wl,--start-group ${MKL_LIB_PATH}/libmkl_gf_lp64.a \
      ${MKL_LIB_PATH}/libmkl_gnu_thread.a ${MKL_LIB_PATH}/libmkl_core.a \
      -Wl,--end-group -lgomp -lpthread"
    sudo ./configure --enable-R-shlib --with-blas="$MKL" CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags"
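The link line above is the parallel (pMKL) variant. The post does not show how the sequential (sMKL) build was linked. A common approach with MKL is to swap libmkl_gnu_thread.a for libmkl_sequential.a and drop -lgomp, and the sketch below assumes that. mkl_link_flags is a hypothetical helper, not part of R or MKL. It also checks that the three static libraries exist, so a wrong MKL_LIB_PATH fails before configure runs.

```shell
#!/bin/sh
# mkl_link_flags (hypothetical helper): print a static MKL link line
# suitable for --with-blas.
#   $1 = directory holding the MKL static libraries
#   $2 = "parallel" (GNU OpenMP threading, as in the post) or
#        "sequential" (assumed sMKL variant; not shown in the post)
mkl_link_flags() {
    dir="$1"
    mode="${2:-parallel}"
    case "$mode" in
        parallel)   thread_lib="libmkl_gnu_thread.a"; extra="-lgomp -lpthread" ;;
        sequential) thread_lib="libmkl_sequential.a"; extra="-lpthread" ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # Fail early if any of the three archives is missing.
    for lib in libmkl_gf_lp64.a "$thread_lib" libmkl_core.a; do
        if [ ! -f "$dir/$lib" ]; then
            echo "missing: $dir/$lib" >&2
            return 1
        fi
    done
    printf '%s\n' "-L$dir -Wl,--start-group $dir/libmkl_gf_lp64.a $dir/$thread_lib $dir/libmkl_core.a -Wl,--end-group $extra"
}

# Example use before ./configure (path as set in the post):
#   MKL="$(mkl_link_flags "$MKL_LIB_PATH" parallel)" || exit 1
```

The --start-group/--end-group pair is kept in both variants because the three MKL archives reference each other, so the linker has to scan them repeatedly to resolve every symbol.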