Edge cases in using the Intel MKL and parallel programming

This article was first published on Revolutions, and kindly contributed to R-bloggers.

by Andrie de Vries

Recently we had a question on the public mailing list for Revolution R Open (RRO), on the topic of “MKL multithreaded library and mclapply do not play well together”.

If you're not familiar with these topics, here is a quick primer:

  • The Intel MKL is a fast, multi-threaded math library. We bundle the MKL with RRO.
  • The primary benefit of the MKL is that matrix algebra operations are much faster than using the math library that is bundled with R, e.g. more than 40x faster for matrix multiply.
  • The function mclapply() in the parallel package is similar to lapply() but runs in parallel on operating systems that support forking (e.g. Linux, but not Windows).
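As a small illustration of the last point, the sketch below runs the same toy function through lapply() and mclapply() and checks that the results agree. The function `square` and the choice of two cores are assumptions made for the example; on Windows, where forking is not available, the sketch falls back to a single core.

```r
library(parallel)

# Toy function used only for this illustration
square <- function(x) x^2

# mclapply() relies on fork, so use a single core on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(1:8, square)                       # ordinary, sequential
forked <- mclapply(1:8, square, mc.cores = cores)   # forked workers

# Both calls return a list with one element per input value
same_result <- isTRUE(all.equal(serial, forked))
```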

 Now, the question was posed as follows:

After some testing, I have discovered that using mclapply on multiple cores with MKLthreads set to greater than 1 results in the threads sleeping and basically never finishing. Obviously, the temporary solution is to set MKLthreads to 1. But it would be nice if these functions worked together, because you cannot always guarantee that a package in R will not use mclapply while calling a MKL threaded math function, and there are situations where I would like to just use MKLthreads > 1 and not worry about it.

Unpacking the question:

  1. The user is correctly using mclapply().
  2. He also knows how to control the number of threads used by the MKL, i.e. by calling setMklthreads() with the desired number.
  3. The problem only occurs when setMklthreads() specifies more than 1 thread, e.g. setMklthreads(4).

To answer the question, I am going to refer to two sources: some information about the MKL benchmarks at MRAN, and a vignette of the doParallel package.

Observation 1: Most of the benefit of the Intel MKL is from vectorised math, not multi-threading

To illustrate this, take a look at some of the performance characteristics we publish at MRAN:

[Figure: Intel MKL performance benchmarks]

From this plot you can see:

  • A big performance boost when using the MKL with just one thread
  • A marginal increase when using 4 threads, most notable in matrix multiply, and no benefit for singular value decomposition

Implication: if you want to set a single value for the number of MKL threads and never worry about code that hangs, use setMklthreads(1).
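If you want to check how much threading matters on your own machine, a rough timing sketch along the following lines can help. It is not the MRAN benchmark code; the matrix size and the variable names are assumptions chosen to keep the run short. Run it once for each thread setting you want to compare and look at the elapsed times.

```r
# Rough timing sketch (not the MRAN benchmark): time two matrix operations
set.seed(1)
n <- 300                                  # assumed size, small enough to run quickly
m <- matrix(rnorm(n * n), nrow = n)

# Elapsed wall-clock seconds for matrix multiply and singular value decomposition
t_mult <- system.time(p <- m %*% m)[["elapsed"]]
t_svd  <- system.time(s <- svd(m))[["elapsed"]]

timings <- c(multiply = t_mult, svd = t_svd)
print(timings)
```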

Observation 2: Parallel programming is hard and full of pitfalls

When you attempt to do parallel programming in R, you must be aware of the potential problems and pitfalls.  These pitfalls extend to much more than this example of using the MKL.

The vignette of the doParallel package makes this explicit warning in paragraph 2, “A word of caution”:

Because the parallel package in multicore mode starts its workers using fork without doing a subsequent exec, it has some limitations. Some operations cannot be performed properly by forked processes. For example, connection objects very likely won’t work. In some cases, this could cause an object to become corrupted, and the R session to crash.

Implication: Unfortunately there are no silver bullets in parallel programming. Take care when setting up your code, in particular if you make use of parallel paradigms that include forking, e.g. mclapply().
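For comparison, the parallel package also offers socket clusters, which start fresh R worker processes instead of forking the current session. The sketch below is a minimal example under that assumption; the worker count and the toy function are illustrative, and whether this sidesteps a particular fork-related problem depends on your setup.

```r
library(parallel)

# makeCluster() defaults to a socket (PSOCK) cluster: workers are new R
# processes, not forked copies of the current session
cl <- makeCluster(2L)

res <- tryCatch(
  parLapply(cl, 1:4, function(i) {
    m <- matrix(rnorm(50 * 50), nrow = 50)
    sum(dim(crossprod(m)))        # crossprod(m) is 50 x 50
  }),
  finally = stopCluster(cl)       # always shut the workers down
)
```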

Some code.

I reproduce the code used in the original question below.  Notice that the last snippet will cause R to become unresponsive.  To avoid this, use setMklthreads(1).
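As a minimal illustration of the workaround (this is a sketch, not the listing from the original question), the helper below forces the math library to one thread before calling mclapply(). The names `safe_mclapply` and `set_threads` are hypothetical; `set_threads` stands in for whatever MKL thread setter your R build provides and is only called if you supply it.

```r
library(parallel)

# Hypothetical helper: limit the math library to one thread, then fork.
# `set_threads` is a placeholder for your build's MKL thread setter;
# if it is NULL, no thread setting is attempted.
safe_mclapply <- function(X, FUN, ..., cores = 2L, set_threads = NULL) {
  if (is.function(set_threads)) set_threads(1L)
  # mclapply() relies on fork, so fall back to one core on Windows
  if (.Platform$OS.type == "windows") cores <- 1L
  mclapply(X, FUN, ..., mc.cores = cores)
}

# Each worker performs a small matrix multiply
res <- safe_mclapply(1:4, function(i) {
  m <- matrix(rnorm(100 * 100), nrow = 100)
  dim(m %*% m)
}, cores = 2L)
```

On RRO you would pass the MKL thread setter discussed above as `set_threads`; without it, the helper behaves like a plain mclapply() call.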

 
