Performance improvements coming to R 3.4.0

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R 3.3.3 (codename: "Another Canoe") is scheduled for release on March 6. This is the "wrap-up" release of the R 3.3 series, which means it will include minor bug fixes and improvements, but eschew major new features. Major changes are coming though, with the subsequent release of R 3.4.0. While the NEWS file announcing updates in 3.4.0 is still subject to change, it indicates several major changes aimed at improving the performance of R in various ways:

A "just-in-time" (JIT) compiler will be included. While the core R packages have been byte-compiled since 2011, and package authors can also opt to byte-compile the R code in their packages, it has been tricky for ordinary R users to gain the benefits of byte-compilation for their own code. In 3.4.0, loops in your R scripts and functions you write will be byte-compiled as you use them ("just-in-time"), so you can get improved performance for your R code without taking any additional actions.
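
As a rough illustration of what byte-compilation means in practice, the sketch below uses the compiler package that ships with R. The function name sum_squares is hypothetical and exists only for this example. cmpfun() byte-compiles one function explicitly, and enableJIT() sets the JIT level and returns the previous one. Any speedup depends on the code and the R version, so no timings are claimed here.

```r
library(compiler)

# Hypothetical example function: a plain R loop, the kind of code the JIT targets.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Explicit byte-compilation: the manual step that was available before a default JIT.
sum_squares_bc <- cmpfun(sum_squares)

# Both versions should compute the same value; only execution speed may differ.
stopifnot(isTRUE(all.equal(sum_squares(1000), sum_squares_bc(1000))))

# enableJIT() returns the previous JIT level, so the setting can be restored.
old_level <- enableJIT(3)   # level 3 asks for closures and loops to be compiled as used
enableJIT(old_level)        # put the earlier setting back
```

To compare speed on your own machine, wrapping each call in system.time() is one simple option.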

Linear algebra performance improvements. R uses a BLAS library for high-performance implementations of many linear algebra routines like matrix multiplication, and now R will use faster routines in some situations (e.g. for matrix-vector multiplications). It will also be slightly faster for each call, by reducing the time to check whether the data include missing values (which BLAS generally doesn't handle). This should improve the performance of all R distributions, including those like Microsoft R that are bundled with multi-threaded BLAS libraries.
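
A small sketch of the kind of operation involved: a matrix-vector product with %*%, checked against a hand-written calculation, followed by the same product with a missing value in the input. The variable names are illustrative only, and which BLAS routine R calls internally is not visible from this code.

```r
set.seed(1)
A <- matrix(rnorm(6), nrow = 2, ncol = 3)
x <- c(1, 2, 3)

# Matrix-vector product; returns a 2 x 1 matrix.
y <- A %*% x

# The same result computed row by row, as a sanity check.
y_manual <- c(sum(A[1, ] * x), sum(A[2, ] * x))
stopifnot(isTRUE(all.equal(as.vector(y), y_manual)))

# R keeps its own missing-value semantics: an NA in row 1 of the matrix
# gives NA in element 1 of the result, while element 2 is unaffected.
A_na <- A
A_na[1, 2] <- NA
y_na <- A_na %*% x
stopifnot(is.na(y_na[1]), !is.na(y_na[2]))
```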

Improvements for packages with compiled code. Many packages include code written in C or C++ (or even Fortran, still a powerful language for scientific computing) that is then called from R functions. R 3.4.0 will include a new system that allows package developers to choose to expose compiled functions to other packages or to keep them private. As a side benefit, this new "registration" system will speed up the process of calling compiled functions, particularly on Windows systems. The gain is on the order of microseconds per call, but when these functions are called thousands or millions of times the impact can be noticeable. The system also adds checks to make sure calls to compiled functions are structured correctly, a reliability check that has already detected potential bugs in dozens of packages on CRAN.
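
Registration itself is written in a package's compiled code, but its result can be inspected from R. The sketch below, offered only as an illustration, uses base R's getDLLRegisteredRoutines() to look at the routines registered by the stats package. The exact counts vary between R versions, so none are shown.

```r
# stats is normally attached already; loading it explicitly makes the example self-contained.
library(stats)

# Routines that the stats shared library has registered, grouped by calling interface.
routines <- getDLLRegisteredRoutines("stats")

# Names of the calling interfaces (for example .C and .Call).
print(names(routines))

# Number of registered routines under each interface.
counts <- vapply(routines, length, integer(1))
print(counts)
```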

Accumulating vectors in a loop is faster. It's still a bad idea to extend the length of a vector with each iteration of a loop (it's a better idea to pre-allocate a vector of the needed length first), but code that follows that practice should now run faster thanks to R occasionally grabbing a bit more memory than needed. 
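
The two patterns being contrasted look roughly like this. The function names grow and prealloc are made up for the example. Both return the same vector, and only the way memory is obtained differs.

```r
n <- 10000

# Discouraged pattern: the vector starts empty and is extended on every iteration.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out[i] <- sqrt(i)   # assigning past the end lengthens the vector
  }
  out
}

# Preferred pattern: allocate the full length once, then fill it in.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- sqrt(i)
  }
  out
}

# Same result either way.
stopifnot(identical(grow(n), prealloc(n)))
```

Wrapping each call in system.time() gives a quick way to see how large the gap is on a particular R version.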

Performance improvements to other functions. Sorting vectors of numbers is faster (thanks to the use of the radix-sort algorithm by default). Tables with missing values compute quicker. Long strings no longer cause slowness in the str function. The sapply function is faster when applied to arrays with dimension names.
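
For reference, the radix method can be requested explicitly through the method argument of sort(), and table() counts missing values when asked to via useNA. The snippet below is a minimal sketch with made-up data. It checks results only and makes no claim about timings.

```r
set.seed(42)
x <- runif(1000)

# Default sort versus an explicit request for the radix method.
sorted_default <- sort(x)
sorted_radix <- sort(x, method = "radix")

# Both approaches should yield the same ordering of the same values.
stopifnot(identical(sorted_default, sorted_radix))
stopifnot(!is.unsorted(sorted_radix))

# A table that includes missing values as their own category.
v <- c("a", "b", NA, "a")
counts <- table(v, useNA = "ifany")
print(counts)
```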

There are also several improvements not related to performance:

  • An updated version of the Tcl/Tk graphics system in R for Windows.
  • More consistent handling of missing values when constructing tables.
  • Accuracy improvements for extreme values in some statistical functions.
  • Better detection and warning of likely programmer errors, like comparing a vector with a zero-length array.

No release date for 3.4.0 has been provided by the R Core Group yet, but according to the R Developers page it's likely to be available in mid-April. (That's not a guarantee though: issues with the wrap-up release have delayed major updates in the past.) But whenever it's available, R 3.4.0 looks to be a significant improvement for R users, especially those who care about performance.

 
