Speeding up R with Intel’s Math Kernel Library (MKL)

May 2, 2012

(This article was first published on PlanetFlux, and kindly contributed to R-bloggers)

I compared the generic (reference) BLAS with Intel's MKL, in both sequential and parallel form, on a Dell PowerEdge 610 server with dual hyperthreaded 6-core 3.06 GHz Xeon X5675 processors.  Here are the results from an R benchmarking script ("Normal R" is the generic BLAS, "sMKL" is the sequential, single-core Intel MKL, and "pMKL" is the parallel Intel MKL using all 24 threads available on this system).  Times are in seconds; lower is better.
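The big wins below all come from BLAS-bound linear algebra. If you want a quick spot check on your own machine without running the full benchmark script, timing a large cross-product is a reasonable proxy (the matrix size here just mirrors the benchmark's cross-product row; this snippet is mine, not part of the original script):

```r
# Spot-check BLAS performance: time a 2800x2800 cross-product,
# the same operation as the benchmark's "b = a' * a" row.
set.seed(1)
a <- matrix(rnorm(2800 * 2800), nrow = 2800)
system.time(b <- crossprod(a))  # dispatches to whatever BLAS R is linked against
```

On the machine above this operation took ~11.5 s with the reference BLAS and ~0.3 s with the parallel MKL.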

    R Benchmark 2.5
    Number of times each test is run: 3

                                                                  Normal R   sMKL     pMKL
    I. Matrix calculation
      Creation, transp., deformation of a 2500×2500 matrix (sec):   0.592    0.583    0.585
      2400×2400 normal-distributed random matrix ^1000     (sec):   0.425    0.411    0.427
      Sorting of 7,000,000 random values                   (sec):   0.787    0.778    0.777
      2800×2800 cross-product matrix (b = a' * a)          (sec):  11.543    1.875    0.283
      Linear regr. over a 3000×3000 matrix (c = a \ b')    (sec):   5.367    0.910    0.214
      Trimmed geom. mean (2 extremes eliminated)                :   1.358    0.743    0.414

    II. Matrix functions
      FFT over 2,400,000 random values                     (sec):   0.422    0.451    0.435
      Eigenvalues of a 640×640 random matrix               (sec):   0.949    0.443    0.414
      Determinant of a 2500×2500 random matrix             (sec):   4.864    0.967    0.352
      Cholesky decomposition of a 3000×3000 matrix         (sec):   4.131    0.865    0.179
      Inverse of a 1600×1600 random matrix                 (sec):   4.011    0.751    0.277
      Trimmed geom. mean (2 extremes eliminated)                :   2.505    0.667    0.343

    III. Programming
      3,500,000 Fibonacci numbers calculation (vector calc)(sec):   0.787    0.824    0.841
      Creation of a 3000×3000 Hilbert matrix (matrix calc) (sec):   0.456    0.465    0.431
      Greatest common divisors of 400,000 pairs (recursion)(sec):   2.196    2.386    1.927
      Creation of a 500×500 Toeplitz matrix (loops)        (sec):   0.616    0.612    0.596
      Escoufier's method on a 45×45 matrix (mixed)         (sec):   0.470    0.425    0.447
      Trimmed geom. mean (2 extremes eliminated)                :   0.611    0.617    0.607

    Total time for all 15 tests                            (sec):  37.62    12.76     8.18
    Overall mean (sum of I, II, III trimmed means / 3)     (sec):   1.28     0.67     0.44

So there are some significant gains, especially on the slowest, BLAS-heavy tasks.  For example, the parallel MKL gave a roughly 40× speedup on the cross-product calculation (11.543 s down to 0.283 s).  Across all 15 tests, the parallel MKL build was about 4.6× faster than the generic BLAS on this system (37.62 s vs. 8.18 s).  When running models that take days to fit, that is significant!
Here’s how I installed it on Ubuntu 12.04:
  1. Download and install the Intel MKL from here.
  2. Install the build dependencies: sudo apt-get install libreadline6 libreadline6-dev xserver-xorg xserver-xorg-dev gfortran libxt-dev
  3. Download the latest R source (I used v2.15), available here.
  4. Configure R from source using something like this (detailed instructions here):

    export MKL_LIB_PATH=/opt/intel/composer_xe_2011_sp1.7.256/mkl/lib/intel64
    export optim_flags="-O3 -march=native"
    # linking line adapted from http://cran.r-project.org/doc/manuals/R-admin.html#
    MKL="-L${MKL_LIB_PATH} -Wl,--start-group ${MKL_LIB_PATH}/libmkl_gf_lp64.a ${MKL_LIB_PATH}/libmkl_gnu_thread.a ${MKL_LIB_PATH}/libmkl_core.a -Wl,--end-group -lgomp -lpthread"
    # configure does not need root; only the final "make install" does
    ./configure --enable-R-shlib --with-blas="$MKL" CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags"
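The configure step above only prepares the build.  Assuming it completes without errors, finishing and sanity-checking the install looks something like this (the core count, thread count, and matrix size below are examples of mine, not from the original build):

```shell
# Finish the build and install (configure only generates the Makefiles):
make -j6            # parallel build; adjust the core count to your machine
make check          # optional: run R's own test suite before installing
sudo make install

# The GNU threading layer honors MKL_NUM_THREADS / OMP_NUM_THREADS,
# so the parallel build's thread count can be capped at runtime:
export MKL_NUM_THREADS=12

# Since MKL is linked in statically (.a archives above), ldd on libR.so
# won't list it; the simplest check is that a large matrix product is
# now dramatically faster than with the reference BLAS:
Rscript -e 'a <- matrix(rnorm(2000^2), 2000); print(system.time(crossprod(a)))'
```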





