
An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not run in parallel on Windows machines because its implementation relies on forking, which Windows does not support.

Previously, I published a hackish solution that implemented a fake mclapply() for Windows users on top of one of the Windows-compatible parallel R strategies. You can find further details here.

Due to positive user feedback, I have wrapped that script into a simple R package: parallelsugar.

## Installation

Step 0: If you do not already have devtools installed, install it using the instructions here. Note that for the purposes of this package, installing Rtools is not necessary.

Step 1: Install parallelsugar directly from my GitHub repository using install_github('nathanvan/parallelsugar'). For the purposes of this package, you may ignore the warning about Rtools. (If you already have Rtools installed, the warning will not appear.)

> library(devtools)
WARNING: Rtools is required to build R packages, but is not currently
installed.
... snip ...
> install_github('nathanvan/parallelsugar')
Installing parallelsugar
... snip ...
* DONE (parallelsugar)
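
If the same script also has to run on Linux or macOS, where the fork-based mclapply() already works, one option is to attach parallelsugar only on Windows. The following is a minimal sketch of that idea and not something the package requires; the `requireNamespace()` guard is an added assumption so that the script still runs (serially) when parallelsugar is missing.

```r
library(parallel)

## Attach parallelsugar only where the fork-based mclapply() is unavailable.
## The requireNamespace() guard is an assumption of this sketch: it skips the
## library() call instead of failing when the package is not installed.
if (.Platform$OS.type == "windows" &&
    requireNamespace("parallelsugar", quietly = TRUE)) {
  library(parallelsugar)
}

## The call itself is identical on every platform
res <- mclapply(1:2, function(xx) xx^2)
```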


## Usage examples

### Basic Usage

On Windows, the following line will take about 40 seconds to run because by default, mclapply from the parallel package is implemented as a serial function on Windows systems.

library(parallel)

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.00    0.00   40.06


If we load parallelsugar, the default parallel::mclapply, which relies on forking, will be masked by parallelsugar::mclapply, which is implemented with socket clusters. The above line of code will then take closer to 10 seconds.

library(parallelsugar)
##
## Attaching package: ‘parallelsugar’
##
## The following object is masked from ‘package:parallel’:
##
##     mclapply

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.04    0.08   12.98
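
For readers unfamiliar with socket clusters, the general technique can be sketched with functions from the parallel package alone. This is not the parallelsugar source: `socket.lapply` and its `n.cores` argument are hypothetical names used only for illustration, and the sketch does not copy global variables or loaded packages to the workers.

```r
library(parallel)

## Hypothetical helper (not part of parallelsugar): a bare-bones lapply
## that runs on a socket (PSOCK) cluster, which also works on Windows.
socket.lapply <- function(X, FUN, ..., n.cores = 2) {
  ## Start n.cores fresh R worker processes connected over sockets
  cl <- makeCluster(n.cores, type = "PSOCK")
  ## Always shut the workers down, even if FUN throws an error
  on.exit(stopCluster(cl))
  ## Split X across the workers and collect the results as a list
  parLapply(cl, X, FUN, ...)
}

## Four 1-second tasks split across 2 workers
timing <- system.time(
  res <- socket.lapply(1:4, function(xx) { Sys.sleep(1); xx^2 })
)
```

Starting the worker processes has a cost of its own, which is consistent with the parallelsugar timing above landing somewhat over 10 seconds rather than exactly on it.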


### Use of global variables and packages

By design, parallelsugar approximates a fork-based cluster: every object that is in scope in the master R process is copied over to the worker processes on the other sockets. This implies that

• you can quickly run out of memory, and
• you can waste a lot of time copying over unnecessary objects that happen to be in scope.

Be warned!

## Load a package
library(Matrix)

## Define a global variable
a.global.variable <- Matrix::Diagonal(3)

## Define a global function
wait.then.square <- function(xx){
  ## Wait for 5 seconds
  Sys.sleep(5)
  ## Square the argument
  xx^2
}

## Check that it works with plain lapply
serial.output <- lapply( 1:4, function(xx) {
  return( wait.then.square(xx) + a.global.variable )
})

## Test with the modified mclapply
par.output <- mclapply( 1:4, function(xx) {
  return( wait.then.square(xx) + a.global.variable )
})

## Are they equal?
all.equal( serial.output, par.output )
## [1] TRUE
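
If copying everything in scope turns out to be too expensive, one alternative is to manage a socket cluster by hand and send over only what the workers need. The sketch below uses only functions from the parallel package and is not how parallelsugar itself works; it assumes the Matrix package is installed, shortens the sleep to 1 second, and `explicit.output` is a name introduced here for illustration.

```r
library(parallel)
library(Matrix)

a.global.variable <- Matrix::Diagonal(3)

wait.then.square <- function(xx) {
  ## Shorter wait than in the example above, to keep the sketch quick
  Sys.sleep(1)
  xx^2
}

## Start two socket workers
cl <- makeCluster(2, type = "PSOCK")

## Load Matrix on each worker so the Matrix methods are available there
invisible(clusterEvalQ(cl, library(Matrix)))

## Copy over only the two objects the workers need, nothing else
clusterExport(cl, c("a.global.variable", "wait.then.square"))

explicit.output <- parLapply(cl, 1:4, function(xx) {
  wait.then.square(xx) + a.global.variable
})

stopCluster(cl)

## Compare against a plain serial run
serial.output <- lapply(1:4, function(xx) {
  wait.then.square(xx) + a.global.variable
})
all.equal(serial.output, explicit.output)
```

The trade-off is convenience: every global object and package the worker function touches has to be listed explicitly, which is exactly the bookkeeping that parallelsugar's copy-everything design avoids.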


## Request for feedback and help

I put this together because it helped to solve a specific problem that I was having. If it solves your problem, please let me know. If it needs to be modified to solve your problem, please either