parallelsugar: An implementation of mclapply for Windows

October 14, 2015

(This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers)

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because the mclapply() implementation relies on forking and Windows does not support forking.
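
As a rough illustration of that platform difference, the sketch below picks a core count that is safe to pass to mclapply() on either kind of system. The variable names on_windows and n_cores are made up for this example, and the choice of two cores is an arbitrary assumption.

```r
library(parallel)

## TRUE on Windows, where forking is unavailable
on_windows <- .Platform$OS.type == "windows"

## Assumption for this sketch: 2 cores where forking works, otherwise 1
n_cores <- if (on_windows) 1L else 2L

## With mc.cores = 1 this simply runs serially
res <- mclapply(1:4, function(xx) xx * 2, mc.cores = n_cores)
```

On Windows, requesting more than one core from parallel::mclapply results in an error, which is why the sketch falls back to a single core there.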

Previously, I published a hackish solution that implemented a fake mclapply() for Windows users on top of one of the Windows-compatible parallel R strategies. You can find further details here.

Due to positive user feedback, I have wrapped that script into a simple R package: parallelsugar.

Installation

Step 0: If you do not already have devtools installed, install it using the instructions here. Note that for the purposes of this package, installing Rtools is not necessary.

Step 1: Install parallelsugar directly from my GitHub repository using install_github('nathanvan/parallelsugar'). For the purposes of this package, you may ignore the warning about Rtools. (If you already have Rtools installed, the warning will not appear.)

> library(devtools)
WARNING: Rtools is required to build R packages, but is not currently
installed.
   ... snip ...
> install_github('nathanvan/parallelsugar')
Downloading github repo nathanvan/parallelsugar@master
Installing parallelsugar
  ... snip ...
* DONE (parallelsugar)

Usage examples

Basic Usage

On Windows, the following line will take about 40 seconds to run because by default, mclapply from the parallel package is implemented as a serial function on Windows systems.

library(parallel) 

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.00    0.00   40.06 

If we load parallelsugar, the default parallel::mclapply, which relies on forking, will be masked by parallelsugar::mclapply, which is implemented with socket clusters. The above line of code will then take closer to 10 seconds.

library(parallelsugar)
## 
## Attaching package: ‘parallelsugar’
## 
## The following object is masked from ‘package:parallel’:
## 
##     mclapply

system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.04    0.08   12.98 
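
To give a feel for what "implemented with socket clusters" means, here is a minimal sketch of an lapply-style helper built on the socket cluster functions in the parallel package. It is not the parallelsugar source; the name socket_lapply and its n_workers argument are hypothetical and exist only for this illustration.

```r
library(parallel)

## Hypothetical helper: start a socket cluster, apply FUN over X, clean up
socket_lapply <- function(X, FUN, ..., n_workers = 2L) {
  ## makeCluster() creates a socket (PSOCK) cluster by default
  cl <- makeCluster(n_workers)
  ## Make sure the worker processes are shut down, even on error
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN, ...)
}

res <- socket_lapply(1:4, function(xx) xx^2)
```

Unlike forked processes, the workers here start as fresh R sessions, so anything the function needs beyond its arguments has to be sent to them explicitly. Starting the workers also takes a moment, which is one plausible reason the timing above is somewhat more than 10 seconds.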

Use of global variables and packages

By design, parallelsugar approximates a fork-based cluster: every object that is in scope in the master R process is copied over to each of the worker processes. This implies that

  • you can quickly run out of memory, and
  • you can waste a lot of time copying over unnecessary objects hanging
    around in your R session.

Be warned!

## Load a package 
library(Matrix)

## Define a global variable
a.global.variable <- Matrix::Diagonal(3)

## Define a global function 
wait.then.square <- function(xx){
  ## Wait for 5 seconds
  Sys.sleep(5);
  ## Square the argument
  xx^2 
}

## Check that it works with plain lapply
serial.output <- lapply( 1:4, function(xx) {
      return( wait.then.square(xx) + a.global.variable )
    }) 

## Test with the modified mclapply  
par.output <- mclapply( 1:4, function(xx) {
      return( wait.then.square(xx) + a.global.variable )
    })

## Are they equal? 
all.equal( serial.output, par.output )
## [1] TRUE
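
If copying the whole session is too expensive, one alternative is to manage a socket cluster by hand and send over only the objects the workers need. The sketch below uses clusterExport() from the parallel package for that. The objects offset, big.unused.object and add.offset are invented for this example and are not part of parallelsugar.

```r
library(parallel)

## A small object the workers need
offset <- 10
## A large object the workers do not need; it is never exported
big.unused.object <- numeric(1e6)

## Looks up 'offset' in the global environment of whichever process runs it
add.offset <- function(xx) xx + offset

cl <- makeCluster(2L)
## Copy only 'offset' from the master's global environment to each worker
clusterExport(cl, "offset")
res <- parLapply(cl, 1:3, add.offset)
stopCluster(cl)
```

Packages can be loaded on the workers in a similar way with clusterEvalQ(), for example clusterEvalQ(cl, library(Matrix)). The price of this approach is more bookkeeping than a single mclapply() call.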

Request for feedback and help

I put this together because it helped to solve a specific problem that I was having. If it solves your problem, please let me know. If it needs to be modified to solve your problem, please either
