Speeding up your Continuous Integration Builds

[This article was first published on r – Jumping Rivers, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Continuous integration (CI) is an amazing tool when developing R packages. We push a change to the server, and a process is spawned that checks we haven't done something silly. It protects us from ourselves! However, this process can become slow, because the CI process typically starts with a blank virtual machine (VM).

If you are using R, then the most popular CI service is currently Travis CI, but there are also Jenkins, GitHub Actions, GitLab CI, CircleCI and a few others. They all follow the same idea: start a VM, install your R package, then run a bunch of checks. One obvious bottleneck is the "install your R package" step, as an R package may have a large number of dependencies.

In a recent post, we showed the different ways of speeding up package installation (worth checking this out if you find package installation/updating slow). In this post, we’ll discuss leveraging some of those techniques for our CI pipeline.

RStudio Package Manager (RSPM)

RStudio Package Manager (RSPM) is perhaps the easiest way of speeding up your CI process. RSPM provides precompiled binaries for CRAN packages, which should make installation faster. To test this, I made a simple package with no functions but a dependency on the tidyverse, i.e. Imports: tidyverse in the DESCRIPTION file. Then I started two Travis CI jobs. The first had the following .travis.yml file:

language: r
cache: packages

The total time for this Travis job was around twelve minutes.
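A test package like the one described above needs little more than a DESCRIPTION file and an empty NAMESPACE. The sketch below assumes a POSIX shell. make_test_pkg is a hypothetical helper, and the package name, title, author and licence are placeholders. They are not the ones used in the original experiment.

```shell
#!/bin/sh
# Hypothetical helper: create a minimal R package that has no functions and
# exists only to pull in the tidyverse as a dependency.
# Package name, title, author and licence are placeholders.
make_test_pkg() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/DESCRIPTION" <<'EOF'
Package: cispeedtest
Title: Empty Package Used to Time CI Builds
Version: 0.0.1
Authors@R: person("Jane", "Doe", email = "jane@example.com", role = c("aut", "cre"))
Description: Contains no functions; exists only to install its dependencies.
License: GPL-3
Imports: tidyverse
EOF
  # An empty NAMESPACE is enough for a package that exports nothing.
  : > "$dir/NAMESPACE"
}
```

For example, running make_test_pkg cispeedtest and adding the two-line .travis.yml above gives a repository that can be pushed to Travis CI.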

The second job had the same two lines, plus an additional before_install: section:

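before_install:
  # Use RSPM's precompiled Ubuntu 16.04 (xenial) binaries as the CRAN repository
  - echo "options(repos = c(CRAN = 'https://packagemanager.rstudio.com/all/__linux__/xenial/latest'))" >> ~/.Rprofile.site
  # Send an R user agent so that installs run via Rscript also get binaries
  - echo "options(HTTPUserAgent = paste0('R/', getRversion(), ' R (',
       paste(getRversion(), R.version['platform'], R.version['arch'], R.version['os']),
       ')'))" >> ~/.Rprofile.site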

Although this looks complicated, it is fairly simple. The first command sets the RSPM binary repository as the CRAN repository in ~/.Rprofile.site. The second sets the HTTPUserAgent option in the same file, so that packages installed via Rscript also get the binary versions; RSPM uses the user agent to tell which R version is making the request. These few lines cut the Travis build time from around 12 minutes to under 4 minutes.

This is an incredibly easy way to speed up your CI builds, and it works with other CI systems too. If you use GitHub Actions, this has already been implemented for you.
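The change is just two lines appended to an R profile file, so any CI system that can run a shell command can use the same step. Below is a minimal sketch, assuming a POSIX shell. write_rspm_profile is a hypothetical helper name. The target file is passed in rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: append the two RSPM options from the before_install
# step to the given R profile file (the .travis.yml above uses ~/.Rprofile.site).
write_rspm_profile() {
  profile="$1"
  # Use RSPM's precompiled Ubuntu 16.04 (xenial) binaries as the CRAN repository.
  echo "options(repos = c(CRAN = 'https://packagemanager.rstudio.com/all/__linux__/xenial/latest'))" >> "$profile"
  # Send a user agent containing the R version and platform, so that
  # installs run through Rscript also receive binary packages.
  echo "options(HTTPUserAgent = paste0('R/', getRversion(), ' R (', paste(getRversion(), R.version['platform'], R.version['arch'], R.version['os']), ')'))" >> "$profile"
}
```

For example, write_rspm_profile ~/.Rprofile.site reproduces the Travis step. On another CI system, check which profile file R actually reads at start-up. The user profile ~/.Rprofile, or the file named by the R_PROFILE_USER environment variable, is a common target.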

A couple of things to note

  • The above code is for Ubuntu 16.04 (Xenial). If you are using 18.04 (Bionic), replace xenial with bionic in the repository URL
  • There are a few different operating systems available for RSPM
  • If you are interested in using the RSPM in your own organisation, give us a shout – we’re RStudio Partners.
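The repository URL can be built from the Ubuntu codename instead of being edited by hand. This sketch makes two assumptions. First, the URL pattern above (https://packagemanager.rstudio.com/all/__linux__/<codename>/latest) holds for other Ubuntu releases. Second, /etc/os-release defines VERSION_CODENAME or UBUNTU_CODENAME. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Ubuntu codename (e.g. xenial, bionic) from an
# os-release file. Defaults to /etc/os-release; pass an absolute path to override.
detect_codename() {
  file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$file" && echo "${VERSION_CODENAME:-$UBUNTU_CODENAME}" )
}

# Hypothetical helper: build the RSPM binary repository URL for a codename,
# assuming the same pattern as the xenial URL used above.
rspm_repo_url() {
  echo "https://packagemanager.rstudio.com/all/__linux__/$1/latest"
}
```

On an Ubuntu 18.04 VM, rspm_repo_url "$(detect_codename)" would then print the bionic URL.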

Other methods

There are three other possibilities for reducing your CI time.

  1. The first is similar to RSPM in that it uses binary builds, but this time the Ubuntu versions provided by Michael Rutter. The general idea is to add a new Ubuntu package repository, then install packages via apt install r-cran-*. Details are available at CRAN. Also see Dirk Eddelbuettel's recent blog post and YouTube video for even more details.

  2. Alternatively, we could use the ccache trick, where compiled files are cached and reused in the next build. This requires a little more work, but it has already been done by Patrick Schratz.

  3. Parallel builds, using the Ncpus argument of install.packages(), typically don't work on most CI systems, as the (free) VM will only have a single core.
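The three alternatives can be sketched in shell as follows. These are illustrative assumptions, not the exact set-ups referenced above. The r-cran-* naming rule is the Debian convention of a lower-cased CRAN name. The ccache lines are a commonly used ~/.R/Makevars pattern. The helper names apt_names, write_ccache_makevars and ncpus_option are hypothetical. Nothing is installed; the functions only print or write text.

```shell
#!/bin/sh
# 1. Binary Ubuntu packages: map CRAN package names to their apt names,
#    assuming the pattern r-cran-<lower-cased name>.
apt_names() {
  for pkg in "$@"; do
    printf 'r-cran-%s\n' "$pkg" | tr '[:upper:]' '[:lower:]'
  done
}

# 2. ccache: make R prefix its compilers with ccache, so that unchanged
#    object files are reused. ccache's own cache directory must also be
#    preserved between CI builds for this to help.
#    Overwrites any existing Makevars at the given path.
write_ccache_makevars() {
  makevars="$1"   # e.g. ~/.R/Makevars
  mkdir -p "$(dirname "$makevars")"
  cat > "$makevars" <<'EOF'
CCACHE=ccache
CC=$(CCACHE) gcc
CXX=$(CCACHE) g++
CXX11=$(CCACHE) g++
CXX14=$(CCACHE) g++
FC=$(CCACHE) gfortran
F77=$(CCACHE) gfortran
EOF
}

# 3. Parallel installs: install.packages() defaults Ncpus to
#    getOption("Ncpus", 1L), so an options() line in the R profile is enough.
#    Falls back to 1 if nproc is unavailable.
ncpus_option() {
  n=$(nproc 2>/dev/null || echo 1)
  echo "options(Ncpus = $n)"
}
```

For example, after adding Michael Rutter's repository as described on CRAN, sudo apt install $(apt_names tidyverse) would request r-cran-tidyverse. Similarly, ncpus_option >> ~/.Rprofile.site only pays off on a VM with more than one core.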

Jumping Rivers are full-service and RStudio certified. Part of our role is to offer support for RStudio Pro products. If you use any RStudio Pro products, feel free to contact us (). We may be able to offer free support.

The post Speeding up your Continuous Integration Builds appeared first on Jumping Rivers.

To leave a comment for the author, please follow the link and comment on their blog: r – Jumping Rivers.

