Big changes behind the scenes in R 3.5.0

April 24, 2018
By

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

A major update to R is now available. The R Core group has announced the release of R 3.5.0, and binary versions for Windows and Linux are now available from the primary CRAN mirror. (The Mac release is forthcoming.)

Probably the biggest change in R 3.5.0 will be invisible to most users — except by the performance improvements it brings. The ALTREP project has now been rolled into R to use more efficient representations of many vectors, resulting in less memory usage and faster computations in many common situations. For example, the sequence vector 1:1000000 is now represented just by its start and end value, instead of allocating a vector of a million elements as earlier versions of R would do. So while R 3.4.3 takes about 1.5 seconds to run x <- 1:1e9 on my laptop, it's instantaneous in R 3.5.0.

There have been improvements in other areas too, thanks to ALTREP. The output of the sort function has a new representation: it includes a flag indicating that the vector is already sorted, so that sorting it again is instantaneous. As a result, running x <- sort(x) is now free the second and subsequent times you run it, unlike earlier versions of R. This may seem like a contrived example, but operations like this happen all the time in the internals of R code. Another good example is converting a numeric to a character vector: as.character(x) is now also instantaneous (the coercion to character is deferred until the character representation is actually needed). This has significant impact in R's statistical modelling functions, which carry around a long character vector that usually contains just numbers — the row names — with the design matrix. As a result, the calculation:

d <- data.frame(y = rnorm(1e7), x = 1:1e7)
lm(y ~ x, data=d)

runs about 4x faster on my system. (It also uses a lot less memory: running the equivalent command with 10x more rows failed for me in R 3.4.3 but succeeded in 3.5.0.)

The ALTREP system is designed to be extensible, but in R 3.5.0 the system is used exclusively for the internal operations of R. Nonetheless, if you'd like to get a sneak peek on how you might be able to use ALTREP yourself in future versions of R, you can take a look at this vignette (with the caveat that the interface may change when it's finally released).

There are many other improvements in R 3.5.0 beyond the ALTREP system, too. You can find the full details in the announcement, but here are a few highlights:

  • All packages are now byte-compiled on installation. R's base and recommended packages, and packages on CRAN, were already byte-compiled, so this will have the effect of improving the performance of packages installed from Github and from private sources.
  • R's performance is better when many packages are loaded, and more packages can be loaded at the same time on Windows (when packages use compiled code).
  • Improved support for long vectors, by functions including object.size, approx and spline
  • Reading in text data with readLines and scan should be faster, thanks to buffering on text connections.
  • R should handle some international data files better, with several bugs related to character encodings having been resolved.

Because R 3.5.0 is a major release, you will need to re-install any R packages you use. (The installr package can help with this.) On my reading of the release notes, there haven't been any major backwardly-incompatible changes, so your old scripts should continue to work. Nonetheless, given the significant changes behind the scenes, it might be best to wait for a maintenance release before using R 3.5.0 for production applications. But for developers and data science work, I recommend jumping over to R 3.5.0 right away, as the benefits are significant. 

You can find the details of what's new in R 3.5.0 at the link below. As always, many thanks go to the R Core team and the other volunteers who have contributed to the open source R project over the years.

R-announce mailing list: R 3.5.0 is released

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)