Developing R Packages with usethis and GitLab CI: Part II

[This article was first published on Rstats on pi: predict/infer, and kindly contributed to R-bloggers.]

This post, the second in a series on R package development, defines the concept of continuous integration (CI) and demonstrates the advantages of using CI within GitLab. GitLab, a platform for hosting version-controlled code repositories, offers many services to its users, including free CI for R programmers and software developers working in private repositories. GitLab’s built-in CI service is easy to use and can be set up for an R package relatively quickly. This post walks through the steps needed to integrate the chifishr R package created in part I with GitLab CI.

Why GitLab?

Many popular R packages hosted on GitHub use Travis CI, a third-party integration service. GitLab offers a built-in CI service, which is very similar to Travis but comes with its own features and benefits. Although GitHub and Travis are well documented and currently have a larger following in the R community, GitLab and its built-in CI service offer several advantages that make them a worthy option when developing your R package.

GitLab Community Edition is fully open source and allows development within private repositories for free, a feature that GitHub did not offer at the time of writing. GitLab’s CI services are available for both public and private repositories, so if you are planning to create private, internal R packages for your team or clients to use, this is a major edge for GitLab. This is a primary reason why we use GitLab to collaborate on packages at Methods.

Why CI?

Continuous integration helps ensure that an update to your code does not break the functionality of your package. CI services build your code on a per-commit basis, allowing for automated building and testing of software. Instead of manually re-running your tests on your local machine, a CI service automates that process, easing the burden on developers. This allows teams to work on a project or package together more efficiently and may help catch bugs in your code earlier. See the prior post on setting up tests for functions in R packages.

CI services provide job logs that all package contributors can view. If a CI job fails, the user who pushed the commit is notified, and the job logs provide a place where you can view where the error occurred. Job logs are especially useful during pull/merge requests, as they give your co-contributors confidence that the changes you made have not broken any existing functions.

The steps to set up CI for an R package are relatively quick and easy, so they are definitely worth your time (and everyone else’s!) in the long run. We set up the example chifishr R package in part I of this series, and will demonstrate sharing that package on GitLab in this post.

Adding a README

In order for other users to understand the general purpose of the chifishr package, we will want to create a README.Rmd file. This will demonstrate usage of the functions in the package and motivate when and why people should use it.

usethis::use_readme_rmd()
#> ✔ Writing 'README.Rmd'
#> ✔ Adding '^README\\.Rmd$' to '.Rbuildignore'
#> ● Modify 'README.Rmd'

usethis::use_readme_rmd() creates a draft README.Rmd that outlines what should be included. We will need to edit the draft file so that it contains installation instructions and examples of the functions specific to our package. To see what I have included in the README for this example, check here. We will also need to knit the README.Rmd file, which creates README.md, the file that GitLab renders and displays on the project page.
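Knitting can be done with the Knit button in RStudio or from the R console. The following is a minimal sketch of the console route, assuming the rmarkdown package is installed and the working directory is the package root. The helper name knit_readme() is hypothetical and used only for illustration.

```r
# Hypothetical helper: knit README.Rmd using the output format declared
# in its YAML header. Assumes the rmarkdown package is installed.
knit_readme <- function(path = "README.Rmd") {
  if (!file.exists(path)) {
    message("No file found at ", path)
    return(invisible(NULL))
  }
  # render() returns the path of the file it wrote
  rmarkdown::render(path, quiet = TRUE)
}

knit_readme()
```

The knitted file needs to be re-created and committed whenever README.Rmd changes, otherwise the page shown on GitLab will lag behind the source.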

Integrating with GitLab CI

To set up the CI, we will create a new configuration file, .gitlab-ci.yml, in the root directory of the package. This file will contain the directions GitLab will use to run the checks and tests against each push to the remote repository. In this case, the goal of the configuration file is to build and test the package in a cloud environment that contains only the software and dependencies needed to accomplish the specified tasks. GitLab CI will use a Docker container and run a few simple R commands to check that everything works as intended. More information on the usage of the .gitlab-ci.yml file can be found in GitLab’s documentation.

image: r-base

test:
  script:
    - R -e 'install.packages(c("dplyr", "purrr", "testthat"))'
    - R CMD build . --no-build-vignettes --no-manual
    - R CMD check *tar.gz --no-build-vignettes --no-manual

First, we specify the Docker image as r-base. Docker is an open source platform for running software in containers, which are portable, lightweight environments that work well for continuous integration. The r-base image is maintained by the rocker community of R + Docker users. It contains only the latest version of base R (no RStudio, tidyverse, etc.) and is the minimal container environment we need to run our jobs. We then run R commands to install the required dependencies, build our package, check that all of the proper documentation is in place, and verify that the function tests pass.
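The dependency list in the install step is written out by hand. As a rough sketch (not the configuration used in this post), the package names could instead be read from the DESCRIPTION file with base R. The helper names parse_deps() and description_deps() are hypothetical, introduced only for illustration, and the demo DESCRIPTION below is made up.

```r
# Hypothetical helper: turn one comma-separated DESCRIPTION field into
# bare package names, dropping version constraints and R itself.
parse_deps <- function(field) {
  if (is.na(field)) return(character(0))
  pkgs <- trimws(strsplit(field, ",")[[1]])
  pkgs <- sub("\\s*\\(.*\\)$", "", pkgs)  # "testthat (>= 2.1.0)" -> "testthat"
  setdiff(pkgs[nzchar(pkgs)], "R")
}

# Hypothetical helper: collect the packages named in Depends, Imports
# and Suggests of a DESCRIPTION file.
description_deps <- function(path = "DESCRIPTION") {
  desc <- read.dcf(path, fields = c("Depends", "Imports", "Suggests"))
  unique(unlist(lapply(desc[1, ], parse_deps), use.names = FALSE))
}

# Demo on a throwaway DESCRIPTION file (contents are illustrative only)
demo_file <- tempfile()
writeLines(c(
  "Package: demo",
  "Imports:",
  "    dplyr,",
  "    purrr",
  "Suggests:",
  "    testthat (>= 2.1.0)"
), demo_file)
print(description_deps(demo_file))
```

Under these assumptions, install.packages(description_deps()) would install whatever the package declares, so the CI script would not need editing when a dependency is added.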

If any of the R commands fail, the CI job and pipeline will fail and the user who pushed the commit that started the job will be notified. For example, if we tried to use devtools::check() here without installing devtools into the environment beforehand, we would get an error that the package is not installed, and the CI job would fail. We would be able to see the error in GitLab’s user interface for that specific job in the CI/CD tab.
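One way to make this kind of failure easier to read in the job log is to test for required packages before using them. The sketch below uses only base R; the helper name missing_packages() is hypothetical and not part of the setup described in this post.

```r
# Hypothetical helper: return the subset of `pkgs` whose namespaces
# cannot be loaded in the current R session.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# stats and utils ship with R, so nothing is reported as missing here
missing_packages(c("stats", "utils"))
```

In a CI script, passing a non-empty result to stop() would end the R process with an error, which in turn fails the job with a message that names the missing packages.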

Note that although this .gitlab-ci.yml file does everything we need to get the CI up and running, it could be improved and written more efficiently. Since devtools is not included in the r-base Docker image, we used the R CMD shell commands here. We also installed our dependencies into the environment before building and checking the package, which drastically increases the amount of time needed to run the CI job. Parts III and IV of this series will dive deeper into the customization of the .gitlab-ci.yml file to address these issues.

The .gitlab-ci.yml file is for package development purposes only and therefore does not need to be included in the built package or made available to the end user. The .Rbuildignore file works similarly to .gitignore, in that it specifies which files in the directory should be left out of the final package during the build process. We can add the .gitlab-ci.yml file to .Rbuildignore with usethis::use_build_ignore().

usethis::use_build_ignore(".gitlab-ci.yml")
#> ✔ Adding '^\\.gitlab-ci\\.yml$' to '.Rbuildignore'
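Unlike .gitignore, entries in .Rbuildignore are Perl-style regular expressions matched case-insensitively against file paths, which is why the file name is escaped and anchored in the output above. A small base R illustration of how that pattern behaves (the file names other than .gitlab-ci.yml are made up for the example):

```r
# The pattern written to .Rbuildignore, as an R string
pattern <- "^\\.gitlab-ci\\.yml$"

# Illustrative paths relative to the package root
paths <- c(".gitlab-ci.yml", "R/gitlab-ci.yml.R", "xgitlab-ci.yml")

# Only the exact file name matches: the dots are literal and the
# pattern is anchored at both ends
matches <- grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)
print(data.frame(path = paths, ignored = matches))
```

Without the escaping and anchors, a pattern such as .gitlab-ci.yml would also match the third path, because an unescaped dot matches any character.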

Sharing the package

Now we are ready to share the package, so we can use usethis::use_git() to make the package a Git repository and to add and commit the files.

usethis::use_git()
#> ✔ Initialising Git repo
#> ✔ Adding '.Rhistory', '.RData' to './.gitignore'
#> ✔ Adding files and committing

Then push to GitLab via the command line or RStudio’s Terminal pane. Note: If you do not yet have a GitLab account, you’ll need to create one prior to this step and replace the username with yours in the project path.

git push --set-upstream git@gitlab.com:<username>/chifishr.git master

Check the CI/CD tab on GitLab to view the output of the continuous integration orchestrated by .gitlab-ci.yml.

If you have followed the above steps, you should see a green check mark indicating that everything has passed. By default, the project visibility is private. If you would like to make your package accessible to the public, go to the project’s General Settings tab and change the visibility permissions to public.

Instructions on how to install the package as well as the entire example repository created in this post are available here.

Up Next

Part III will cover further best practices and tools for developing R packages with GitLab CI, including:

  • customization of .gitlab-ci.yml
  • lintr for code linting
  • styler for style checking
  • covr for code coverage
