by Andrie de Vries
Recently we had a question on the public mailing list for Revolution R Open (RRO), on the topic of “MKL multithreaded library and mclapply do not play well together”.
If you're not familiar with these topics, here is a quick primer:
- The Intel MKL is a fast, multi-threaded math library. We bundle the MKL with RRO.
- The primary benefit of the MKL is that matrix algebra operations are much faster than using the math library that is bundled with R, e.g. more than 40x faster for matrix multiply.
- The function mclapply() in the parallel package is similar to lapply() but runs in parallel on operating systems that support forking (e.g. Linux, but not Windows).
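As a minimal sketch of how mclapply() mirrors lapply(), the snippet below squares a few numbers both ways and compares the results. The worker count of 2 and the names `n_cores` and `square` are illustrative choices, not part of the original question; on Windows, where forking is unavailable, the sketch falls back to a single core.

```r
library(parallel)

# Forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

square <- function(x) x^2

serial_result   <- lapply(1:8, square)
parallel_result <- mclapply(1:8, square, mc.cores = n_cores)

# Compare the values returned by the two calls, element by element
all.equal(unlist(serial_result), unlist(parallel_result))
```

The call signature is the same apart from mc.cores, which is why mclapply() is often used as a drop-in replacement for lapply().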
Now, the question was posed as follows:
After some testing, I have discovered that using mclapply on multiple cores with MKLthreads set to greater than 1 results in the threads sleeping and basically never finishing. Obviously, the temporary solution is to set MKLthreads to 1. But it would be nice if these functions worked together, because you cannot always guarantee that a package in R will not use mclapply while calling a MKL threaded math function, and there are situations where I would like to just use MKLthreads > 1 and not worry about it.
Unpacking the question:
- The user is correctly using mclapply()
- He also knows how to control the number of threads used by the MKL, i.e. by calling setMKLthreads() with the desired number
- The problem only occurs when setMKLthreads() specifies more than 1 thread, e.g. setMKLthreads(4).
To answer the question, I am going to refer to two sources: the MKL benchmarks published at MRAN, and a vignette of the doParallel package.
Observation 1: Most of the benefit of the Intel MKL is from vectorised math, not multi-threading
To illustrate this, take a look at some of the performance characteristics we publish at MRAN:
From this plot you can see:
- A big performance boost when using the MKL with just one thread
- A marginal increase when using 4 threads, most notable in matrix multiply, and no benefit for singular value decomposition
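To get a rough feel for such numbers on your own machine, a sketch along the following lines times a matrix multiply and a singular value decomposition with base R's system.time(). The matrix size of 300 is an arbitrary choice for illustration, and the timings depend entirely on your hardware and on the math library your R build is linked against. This is not the benchmark script used at MRAN.

```r
set.seed(42)
n <- 300                               # arbitrary size, chosen to run quickly
a <- matrix(rnorm(n * n), nrow = n)

# Elapsed wall-clock time, in seconds, for each operation
time_matmul <- system.time(prod_a <- a %*% a)[["elapsed"]]
time_svd    <- system.time(svd_a  <- svd(a))[["elapsed"]]

c(matrix_multiply = time_matmul, svd = time_svd)
```

Running the same script under different thread settings gives a like-for-like comparison on your own hardware.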
Implication: if you want to set a single value for the number of MKL threads and never worry about code that does not finish, use setMKLthreads(1).
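A minimal sketch of that advice is to pin the MKL to one thread before any forked work starts. It assumes the thread setter is the setMKLthreads() function exported by the RevoUtilsMath package that ships with RRO. The helper `pin_mkl_threads()` is a hypothetical name introduced here for illustration; on an R installation without RevoUtilsMath it does nothing.

```r
library(parallel)

# Hypothetical helper: set MKL threads only when RevoUtilsMath is installed.
# Assumption: setMKLthreads() is exported by RevoUtilsMath, as in RRO.
pin_mkl_threads <- function(n = 1L) {
  if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
    RevoUtilsMath::setMKLthreads(n)
    return(TRUE)
  }
  FALSE  # plain R build: nothing to set
}

pin_mkl_threads(1L)

# Forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each task does some matrix algebra, the kind of work the MKL accelerates
trace_of_crossprod <- function(i) {
  m <- matrix(rnorm(100 * 100), nrow = 100)
  sum(diag(crossprod(m)))
}

results <- mclapply(1:4, trace_of_crossprod, mc.cores = n_cores)
length(results)
```

With this arrangement the parallelism comes from the forked workers alone, and each worker runs its matrix algebra single-threaded.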
Observation 2: Parallel programming is hard and full of pitfalls
When you attempt to do parallel programming in R, you must be aware of the potential problems and pitfalls. These pitfalls extend to much more than this example of using the MKL.
The doParallel vignette has this to say: Because the parallel package in multicore mode starts its workers using fork without doing a subsequent exec, it has some limitations. Some operations cannot be performed properly by forked processes. For example, connection objects very likely won’t work. In some cases, this could cause an object to become corrupted, and the R session to crash.
Implication: Unfortunately there are no silver bullets in parallel programming. Take care when setting up your code, in particular if you make use of parallel paradigms that include forking, e.g. mclapply().
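One parallel paradigm that does not rely on forking is a socket cluster, in which each worker is a freshly started R process rather than a forked copy of the session. The sketch below uses makeCluster() and parLapply() from the parallel package. The worker count of 2 is illustrative, and the sketch does not establish whether this approach resolves any particular problem, including the MKL one.

```r
library(parallel)

cl <- makeCluster(2)          # starts two new R processes (socket cluster)

squares <- tryCatch(
  parLapply(cl, 1:8, function(x) x^2),
  finally = stopCluster(cl)   # always shut the workers down
)

unlist(squares)
```

The trade-off is that socket workers start with an empty workspace, so any data or packages they need must be sent or loaded explicitly.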
I reproduce the code used in the original question below. Notice that the last snippet will cause R to become unresponsive. To avoid this, use setMKLthreads(1).