New versions of the following future backends are available on CRAN:
- future.callr – parallelization via callr, i.e. on the local machine
- future.batchtools – parallelization via batchtools, i.e. on a compute cluster with job schedulers (SLURM, SGE, Torque/PBS, etc.) but also on the local machine
- future.BatchJobs – (maintained for legacy reasons) parallelization via BatchJobs, which is the predecessor of batchtools
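A minimal sketch, assuming the future.apply and future.callr packages are installed. The callr backend is selected with plan() in the same way as any other backend. Only the plan() call changes, not the code that does the work:

```r
library(future.apply)  # also attaches 'future', which provides plan()

# Evaluate each future in a separate background R process via callr
# (requires the future.callr package to be installed)
plan(future.callr::callr)

y <- future_lapply(1:3, FUN = function(x) x * 10)
str(y)

# Switch back to sequential processing when done
plan(sequential)
```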
I also released a new version of:
- doFuture – use any future backend for foreach()
The new version of doFuture comes with a few improvements and bug fixes.
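A minimal sketch of doFuture in use, assuming the doFuture, foreach and future packages are installed. Calling registerDoFuture() makes %dopar% delegate to whichever backend plan() selects. The two-worker multisession backend here is only an illustrative choice:

```r
library(foreach)
library(future)
library(doFuture)

# Tell foreach to run %dopar% loops through the future framework
registerDoFuture()

# Any future backend can now be selected via plan()
plan(multisession, workers = 2)

y <- foreach(x = 1:4, .combine = c) %dopar% {
  x^2
}
print(y)

# Shut down the background workers
plan(sequential)
```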
The future is now.
The future is … what?
If you have never heard of the future framework before, here is a simple example. Assume that you want to run
y <- lapply(X, FUN = my_slow_function)
in parallel on your local computer. The most straightforward way to achieve this is to use:
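library(future.apply)
# Parallelize on the local machine ('multiprocess' was later deprecated in favor of 'multisession')
plan(multiprocess)
y <- future_lapply(X, FUN = my_slow_function)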
If you have SSH access to a few machines here and there with R installed, you can use:
library(future.apply)
plan(cluster, workers = c("localhost", "gandalf.remote.edu", "server.cloud.org"))
y <- future_lapply(X, FUN = my_slow_function)
Even better, if you have access to a compute cluster with an SGE job scheduler, you could use:
library(future.apply)
plan(future.batchtools::batchtools_sge)
y <- future_lapply(X, FUN = my_slow_function)
The future is … why?
The future package provides a simple, cross-platform, and lightweight API for parallel processing in R. At its core, there are three building blocks for doing parallel processing: future(), resolved() and value(). They are used for creating the asynchronous evaluation of an R expression, querying whether it’s done or not, and collecting the results, respectively. With these fundamental building blocks, a large variety of parallel tasks can be performed, either by using these functions directly or indirectly via more feature-rich, higher-level parallelization APIs such as future.apply, foreach, BiocParallel or plyr with doFuture, and furrr. In all cases, how and where future R expressions are evaluated, that is, how and where the parallelization is performed, depends solely on which future backend is currently used, which is controlled by the plan() function.
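A minimal sketch of using the three building blocks directly, assuming the future package is installed. The two-worker multisession backend and the Sys.sleep() stand-in for slow work are illustrative choices only:

```r
library(future)

# Select the backend: two background R sessions on this machine
plan(multisession, workers = 2)

# future(): start asynchronous evaluation of an R expression
f <- future({
  Sys.sleep(1)  # stand-in for slow work
  sum(1:100)
})

# resolved(): non-blocking check of whether the future is done
while (!resolved(f)) {
  cat("Still working ...\n")
  Sys.sleep(0.2)
}

# value(): collect the result (would block if not yet resolved)
v <- value(f)
print(v)

# Shut down the background workers
plan(sequential)
```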
One advantage of the Future API, whether it is used directly or via one of the higher-level APIs, is that it encapsulates the details of how and where the code is parallelized, allowing the developer to focus on what to parallelize instead. Another advantage is that the end user controls which future backend to use. For instance, one user may choose to run an analysis in parallel on their notebook or in the cloud, whereas another may want to run it via a job scheduler in a high-performance compute (HPC) environment.
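To illustrate this split of responsibilities, here is a minimal sketch in which a hypothetical package function analyze() only declares what to parallelize, while the caller picks the backend with plan(). The helper slow_square() is a made-up stand-in for real work:

```r
library(future.apply)  # also attaches 'future', which provides plan()

# Developer side (hypothetical): slow_square() stands in for real work
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# analyze() declares *what* to parallelize; it never calls plan()
analyze <- function(xs) {
  unlist(future_lapply(xs, FUN = slow_square))
}

# End-user side: choose *how and where* to run, then call the same code
plan(sequential)
y1 <- analyze(1:4)

plan(multisession, workers = 2)
y2 <- analyze(1:4)

plan(sequential)  # shut down the background workers
print(identical(y1, y2))
```

The developer code is identical in both runs. Only the end user's plan() call differs, and the results agree.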
I’ve spent a fair bit of time working on future.tests, a single framework for testing future backends. It will allow developers of future backends to validate that their backends fully conform to the Future API. This will lower the barrier for creating new backends (e.g. future.clustermq on top of clustermq, or one on top of Redis), and it will increase trust in existing ones so that end users can reliably switch between backends without having to worry about the results being different or even corrupted.
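The core idea behind such conformance testing can be sketched by hand. Evaluate the same future under a reference backend and under other backends, then check that the values agree. This is only an illustration of that idea. The helper eval_with() is hypothetical and is not the future.tests API:

```r
library(future)

# Hypothetical helper (NOT the future.tests API): evaluate one fixed
# future expression under the given backend and return its value
eval_with <- function(strategy) {
  plan(strategy)
  on.exit(plan(sequential), add = TRUE)  # restore sequential afterwards
  f <- future({
    x <- c(a = 1, b = 2)
    list(total = sum(x), names = names(x))
  })
  value(f)
}

# Reference result from sequential (in-process) evaluation
reference <- eval_with(sequential)

# Backends to check against the reference
candidates <- list(
  multisession = tweak(multisession, workers = 2),
  cluster      = tweak(cluster, workers = 2)
)

ok <- vapply(candidates, function(s) identical(eval_with(s), reference),
             logical(1))
print(ok)
```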
So, backed by future.tests, I feel more comfortable attacking some of the feature requests - and there are quite a few of them. Indeed, I’ve already implemented one of them. More news coming soon …
See also
- future 1.9.0 - Output from The Future, 2018-07-23
- future.apply - Parallelize Any Base R Apply Function, 2018-06-23
- Delayed Future (Slides from eRum 2018), 2018-06-19
- future 1.8.0: Preparing for a Shiny Future, 2018-04-12
- The Many-Faced Future, 2017-06-05
- future 1.3.0 Reproducible RNGs, future_lapply() and More, 2017-02-19
- High-Performance Compute in R Using Futures, 2016-10-22
- Remote Processing Using Futures, 2016-10-11
- A Future for R: Slides from useR 2016, 2016-07-02