
If you have ever developed or used an open-source R package, you’re likely familiar with continuous integration. By automatically testing each proposed change to the source code, you can reduce the risk of errors, avoid unnecessary overhead and increase the quality of the developed solution. For data scientists, Hadley has a good description of why it’s worth using in R.

The most popular CI solution in the R world is TravisCI. Overall it works great, has built-in community support for R and is free for any open source project. CircleCI offers a great alternative with a free plan that includes private repositories. This is a perfect solution if you’re building a package that cannot be released publicly and you don’t have a paid Travis account. This post will quickly take you through setting up continuous integration for your private R project with CircleCI.

Even if you are not developing an R package, but simply working on a data science project for a client, you can still use the same approach to run your unit tests and linter each time you push a commit to the repo.

At Appsilon Data Science we strive to have tests and continuous integration for every project to ensure we catch all errors as quickly as possible. The source code of most commercial projects has to be private (not to mention training data, which we don’t send to CI anyway). CircleCI was our first choice for continuous integration, and we were not disappointed. We even started using it for our open source packages. If you’re interested in these packages, we’ve written posts about shiny.semantic and shiny.router in the past.

# Continuous integration for all your private R projects

Our goal is to set up continuous integration for a private R project hosted on GitHub. (You can also use CircleCI with Bitbucket, but not yet with GitLab.) We’ll assume that the project is a package and that it already has some unit tests. We want CircleCI to install all the necessary dependencies and run all tests in an isolated environment every time we push a commit to the repository. We’ll use the free CircleCI plan for that; you can always move to a paid plan once your project needs it.

You can use virtually the same steps if the project is not an R package; you’ll just need to adjust how the project is tested.

# Configuring project for CircleCI

For CircleCI to know how to handle our project, we need to add a configuration file to the project repo. There are two versions of the CircleCI API you can use for that, and we’ll be using the newer one, version 2.

CircleCI supports running tests in Docker images, which is great for cleanly managing system-level dependencies. We’ll be using a Docker image that contains all system-level dependencies of our project plus everything that we need for building it. If you don’t need any additional system libraries, you can use the image built by us. Its name is appsilon/ci-base:1.0. You will need to modify the Docker image if your package or its dependencies require some libraries to be installed in the system. We will show you how to do that at the end of this article.

All R packages that your package needs will be installed automatically based on the package’s DESCRIPTION – no need to add them to the Docker image.

Let’s put the following contents in .circleci/config.yml:

version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: appsilon/ci-base:1.0
    steps:
      - checkout
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - run:
          command: |
            R -e 'devtools::check()'
      - store_artifacts:
          path: man/
          destination: man


We start by specifying the CircleCI API version and the working directory. Then we declare the containers to run tests in by providing a Docker image. You can add additional containers here, e.g. one with a database if you have tests that need to talk to external resources (of course these would not be unit tests, but higher levels of the test pyramid are also valuable).
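As a rough illustration of the extra-container idea, the sketch below adds a PostgreSQL service next to the build image. The `postgres:9.6` image and the credentials are assumptions made up for this example; they are not part of our setup. With the Docker executor, steps run in the first image listed, and services from the other images are reachable on `localhost`.

```yaml
# Sketch only: the postgres image and all credentials below are
# illustrative assumptions, not part of the original configuration.
version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      # Primary container: every step runs here.
      - image: appsilon/ci-base:1.0
      # Service container: reachable from the primary container on localhost:5432.
      - image: postgres:9.6
        environment:
          POSTGRES_USER: ci
          POSTGRES_PASSWORD: ci_password
          POSTGRES_DB: ci_test
    steps:
      - checkout
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - run:
          command: |
            R -e 'devtools::check()'
```

The database may still be starting when the first step runs, so tests that depend on it should be prepared to wait or retry the connection.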

Next, we define steps that should be run each time we’re running tests. We check out code from the repo, install package dependencies (as defined in DESCRIPTION) using devtools, and then perform a devtools::check of the package. This is all we need to do because check runs all of the tests. If your project is not a package, this will look similar, but you will need to change how dependencies are installed and how tests are run (perhaps this is a topic for another short article).
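For a project that is not a package, the steps might look roughly like the sketch below. It assumes a reasonably recent testthat and lintr, tests living in `tests/` and code in `R/`; those paths, the step names and the package list are hypothetical and should be adapted to your project.

```yaml
# Sketch for a project that is NOT an R package. The tests/ and R/ paths,
# step names and package list are illustrative assumptions.
version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: appsilon/ci-base:1.0
    steps:
      - checkout
      - run:
          name: Install test tooling
          # Add the packages your own code depends on to this list.
          command: |
            R -e 'install.packages(c("testthat", "lintr"), repos = "https://cloud.r-project.org")'
      - run:
          name: Run unit tests
          # stop_on_failure makes R raise an error, so the step fails on a failing test.
          command: |
            R -e 'testthat::test_dir("tests", stop_on_failure = TRUE)'
      - run:
          name: Run linter
          # lint_dir() only returns lints, so exit non-zero explicitly if any were found.
          command: |
            R -e 'lints <- lintr::lint_dir("R"); print(lints); quit(status = as.integer(length(lints) > 0))'
```

Each `run` step fails the build when its command exits with a non-zero status, which is why the linter step turns the number of lints into an exit code.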

Finally, we store generated files in man/ as artifacts to be able to download them later.

## Caching R packages library

The above setup is all that we need. If package check or any other step fails, CircleCI will automatically report this as an error. However, to save our time (and computing power), it’s worth doing one more thing: caching installed R packages. This way we won’t waste time installing all packages from scratch each time tests run.

To cache the R package library, let’s add restore_cache and save_cache steps around the dependency installation:

version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: appsilon/ci-base:1.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - deps1-{{ .Branch }}-{{ checksum "DESCRIPTION" }}
            - deps1-{{ .Branch }}
            - deps1-
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - save_cache:
          key: deps1-{{ .Branch }}-{{ checksum "DESCRIPTION" }}
          paths:
            - "/usr/local/lib/R/site-library"
      - run:
          command: |
            R -e 'devtools::check()'
      - store_artifacts:
          path: man/
          destination: man


You can read more about caching in the CircleCI docs. I recommend experimenting with cache keys to best fit your scenario.
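As one possible experiment, the fragment below drops the branch name from the key so that all branches share a single package cache. The `deps2` prefix is an arbitrary name chosen for this sketch. Keys under `restore_cache` are tried in order and matched by prefix, and an existing cache is never overwritten, so changing the prefix is a simple way to force a fresh cache.

```yaml
# Sketch of an alternative keying scheme; the "deps2" prefix is arbitrary.
# These steps replace the restore_cache/save_cache steps in the job above.
      - restore_cache:
          keys:
            # 1. Exact match: a cache built from an identical DESCRIPTION, on any branch.
            - deps2-{{ checksum "DESCRIPTION" }}
            # 2. Fallback: the most recent cache whose key starts with this prefix.
            - deps2-
      - run:
          command: |
            R -e 'devtools::install_deps(dependencies = TRUE)'
      - save_cache:
          # Saved only if no cache with this exact key exists yet.
          key: deps2-{{ checksum "DESCRIPTION" }}
          paths:
            - "/usr/local/lib/R/site-library"
```

Sharing the cache across branches usually means fewer cold installs, at the cost of a branch occasionally restoring packages it does not need.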

# Create project on CircleCI

Our project is ready! The final step is to go to circleci.com, log in with your GitHub account, go to Projects and choose “Setup project” for your project’s repo. This configures all the needed keys on GitHub and starts monitoring changes, so you need to be an admin of that repo. By default, CircleCI will build all new commits, whether they are in a PR or not. You can customize that if you need to. One setting that I particularly recommend is to cancel running builds of a branch when new commits are pushed to that branch. This can save quite some time if you’re iterating quickly on a branch.
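One way to customize which commits get built is a branch filter in a workflow, added at the top level of `.circleci/config.yml`. This is only a sketch: the workflow name `check_package` and the branch names are hypothetical.

```yaml
# Sketch only: the workflow name and branch patterns are illustrative assumptions.
# Place this at the top level of .circleci/config.yml, next to "jobs:".
workflows:
  version: 2
  check_package:
    jobs:
      - build:
          filters:
            branches:
              only:
                - master
                # Regular expressions are written between slashes.
                - /feature-.*/
```

With this in place, the `build` job runs only for commits on branches matching the listed names or patterns.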

# Using custom Docker image for running tests

Preparing a Docker image to run our tests is simple, as we can use the popular rocker images as a base. Here’s the Dockerfile we used for building appsilon/ci-base:1.0:

FROM r-base:3.4.1

RUN apt-get update \
    && apt-get install git libssl-dev ssh texlive-latex-base texlive-fonts-recommended libcurl4-openssl-dev libxml2-dev -y \
    && rm -rf /var/lib/apt/lists/*

RUN R -e "install.packages(c('devtools', 'roxygen2'), repos='http://cran.us.r-project.org')"


As you can see, on top of the r-base image we install several libraries that are required to build an R package. If you need additional libraries for your package, you can add them here as well. We also install two R packages, devtools and roxygen2, which are not dependencies of our package but are needed to build it.

To use your image on CircleCI, you’ll need to build it and push it to Docker Hub, which is where CircleCI looks for images. Just run these commands in the directory containing your Dockerfile, replacing appsilon with your Docker Hub account (you’ll need to run docker login first if you haven’t done so before):

docker build -t appsilon/ci-base:1.0 .
docker push appsilon/ci-base:1.0
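
If you keep your custom image in a private Docker Hub repository, the config also needs credentials to pull it. A minimal sketch, assuming a hypothetical account `your-account` and two environment variables, `DOCKERHUB_USERNAME` and `DOCKERHUB_PASSWORD`, that you define yourself in the CircleCI project settings:

```yaml
# Sketch only: the account name and the environment variable names are
# assumptions; define the variables in your CircleCI project settings.
version: 2
jobs:
  build:
    working_directory: ~/main
    docker:
      - image: your-account/ci-base:1.0
        auth:
          username: $DOCKERHUB_USERNAME
          password: $DOCKERHUB_PASSWORD
    steps:
      - checkout
      - run:
          command: |
            R -e 'devtools::check()'
```

For a public image such as appsilon/ci-base:1.0, the `auth` block is not needed.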


# Seeing the results

Each time you push a new commit, CircleCI will check the package and report the results back to GitHub. They will be shown on commits and related PRs. You can also add this single line to the project’s README.md to display a badge with the build status of the main branch:

[![CircleCI](https://circleci.com/gh/Appsilon/ci.example.svg?style=svg)](https://circleci.com/gh/Appsilon/ci.example)


I hope this is helpful! Also, you can find a sample project with all configuration (including code linter) on GitHub.

Read the original post at Appsilon Data Science Blog.