Before I share my results with you, let me make a couple of comments.
First, parallel processing involves communication overheads: the data and instructions have to be sent to each core or machine, and the results have to be collected again. So not all tasks are well-suited to this type of processing. It’s easy to generate examples that are computationally intensive but still execute faster on a single processor than on a cluster (of cores, or of machines).
Second, even when a task is suitable for parallel processing, don’t expect the elapsed time to fall in proportion to the number of cores you use: doubling the number of cores won’t halve the run time. Remember, there are overheads involved!
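A standard way to put a number on this (it's not something the tests below rely on) is Amdahl's law: if only a fraction f of the run time can be parallelised, the speed-up on p cores can be at most 1/((1 - f) + f/p), and that's before any communication costs are counted. Here's a small R sketch; the value f = 0.9 is just an assumption for illustration.

```r
# Amdahl's law: upper bound on the speed-up from p cores when a fraction f
# of the serial run time can be parallelised. Communication overheads are
# ignored, so actual speed-ups will be lower still.
amdahl_speedup <- function(f, p) {
  1 / ((1 - f) + f / p)
}

cores <- c(1, 2, 4, 8)
# f = 0.9 is an illustrative assumption, not a measured value
round(amdahl_speedup(f = 0.9, p = cores), 2)
# [1] 1.00 1.82 3.08 4.71
```

So even if 90% of the work can be spread across the cores, eight cores give less than a five-fold reduction in elapsed time.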
Several recent posts have illustrated the advantages of parallel processing in R. For example, WenSui Liu posted a piece describing some experiments run under the Ubuntu O/S. Daniel Marcelino also had a post that compared various “parallel” packages in R on a MacBook Pro. Nice choice of machine: it’s running UNIX beneath that pretty cover! And then, just as I was writing this post today, Arthur Charpentier came out with a related post, also based on results from a Mac.
However, none of these posts deal with a Windows environment, or the sorts of Monte Carlo or bootstrap simulations that econometricians use all of the time. So, I felt that there was something more to explore.
The first thing that I discovered, after a lot of digging around, is that although there are a number of R packages to help with parallel processing, your options are limited if you’re running Windows. (Packages that rely on forking processes, such as multicore, don’t work there.) O.K., that’s no surprise, of course! Don’t write comments saying that I should be using a different O/S if I want to engage in fast computing. I know that!
However, let’s stick with Windows. In that case the snowfall package for R (Knaus et al., 2009) seems to be the best choice at present, and that’s what the results below are based on.
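If you haven't used snowfall before, the basic workflow is: start the workers with sfInit(), send them any objects they need with sfExport(), run a parallel version of lapply() with sfLapply(), and shut everything down with sfStop(). The sketch below is my own illustration, not one of the scripts behind the results. It assumes snowfall is installed and two cores are available, and the names offset, cheap_task and inputs are hypothetical. It applies that workflow to a deliberately cheap task, which also shows the overhead problem mentioned in the first comment above.

```r
library(snowfall)

# Start a socket ("SOCK") cluster with two worker processes; socket
# clusters need no extra software and work on Windows.
sfInit(parallel = TRUE, cpus = 2, type = "SOCK")

offset <- 10                               # hypothetical constant used by the task
cheap_task <- function(i) sqrt(i + offset) # almost no computation per call

# Each worker is a separate R session, so it needs its own copy of `offset`.
sfExport(list = "offset")

inputs <- 1:20000
t_serial   <- system.time(res_serial   <- lapply(inputs, cheap_task))[["elapsed"]]
t_parallel <- system.time(res_parallel <- sfLapply(inputs, cheap_task))[["elapsed"]]

sfStop()                                   # always shut the workers down

c(serial = t_serial, parallel = t_parallel)   # elapsed seconds
```

The timings will vary from machine to machine, but with a task this cheap the serial lapply() is likely to win: shipping the inputs to the workers and the results back costs more than the square roots themselves.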
This test involves bootstrapping the sampling distribution of an OLS estimator. Of course, we already know the answer; this is just an illustration of processing times! The R script is on the code page for this blog.
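To give a feel for how such a bootstrap can be parallelised with snowfall, here's a minimal sketch. It is not the script on the code page: the data-generating process (y = 1 + 2x + e, with n = 100) and the choice of B = 999 resamples are assumptions made purely for illustration. The resampling indices are all drawn on the master process, so the bootstrap slopes don't depend on how the work is split across the cores.

```r
library(snowfall)

set.seed(123)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)   # hypothetical DGP: intercept 1, slope 2
B <- 999                    # number of bootstrap resamples (assumed)

# Draw every resampling index vector on the master, one per replication.
idx_list <- replicate(B, sample.int(n, n, replace = TRUE), simplify = FALSE)

# OLS slope for one bootstrap resample of the (x, y) pairs.
boot_slope <- function(idx) {
  fit <- lm.fit(cbind(1, x[idx]), y[idx])
  unname(fit$coefficients[2])
}

sfInit(parallel = TRUE, cpus = 2)
sfExport(list = c("x", "y"))          # the workers need the data
slopes <- unlist(sfLapply(idx_list, boot_slope))
sfStop()

# OLS slope on the full sample, and its bootstrap standard error
c(estimate = unname(coef(lm(y ~ x))[2]), boot_se = sd(slopes))
```

Drawing the indices up front costs a little extra communication, but it means unlist(lapply(idx_list, boot_slope)) on a single core gives exactly the same slopes, which makes serial and parallel timings directly comparable.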
This test involves a Monte Carlo simulation of the power of a paired t-test, using 1,999 replications and sample sizes n = 10, 15, 20, ..., 200 (that is, n = 10 (5) 200). Again, the R script is on the code page for this blog, and it’s a modified version of an example given by Spector (undated).
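The general shape of such a simulation can be sketched as follows. This is an illustration only, not Spector's example or the script on the code page: the true mean difference (0.3), the 5% significance level, and the normally distributed data are all assumptions. Each task handles one sample size and runs all 1,999 replications for it, so there's one task per value of n and relatively little communication.

```r
library(snowfall)

ns    <- seq(10, 200, by = 5)   # n = 10 (5) 200
reps  <- 1999                   # Monte Carlo replications
delta <- 0.3                    # assumed true mean difference
alpha <- 0.05                   # assumed significance level

# Estimated power of the paired t-test at sample size n.
power_at_n <- function(n) {
  set.seed(1000 + n)            # simple per-task seeding (see note below)
  rejections <- replicate(reps, {
    before <- rnorm(n)
    after  <- before + delta + rnorm(n)
    t.test(after, before, paired = TRUE)$p.value < alpha
  })
  mean(rejections)
}

sfInit(parallel = TRUE, cpus = 2)
sfExport(list = c("reps", "delta", "alpha"))
power <- unlist(sfLapply(ns, power_at_n))
sfStop()

head(data.frame(n = ns, power = power))
```

Seeding each task from its own n keeps the results reproducible and independent of the number of cpus, so power_at_n(10) run serially reproduces the first element. Seeds chosen this way aren't guaranteed to give independent streams, though; snowfall's sfClusterSetupRNG() is the more careful option for serious work.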
Knaus, J., C. Porzelius, H. Binder, & G. Schwarzer, 2009. Easier parallel computing in R with snowfall and sfCluster. The R Journal, 1/1, 54-59.
Spector, P., undated. Using the snowfall library in R. Mimeo., Statistical Computing Facility, Department of Statistics, University of California, Berkeley.