Developing R Packages with usethis and GitLab CI: Part II
This post, the second part in a series on R package development, defines the concept of continuous integration (CI) and demonstrates the advantages of using CI within GitLab. GitLab, a version control code repository platform, offers many services to its users, including free CI for R programmers and software developers working in private repositories. GitLab’s built-in CI service is easy to use and can be set up with an R package relatively quickly. This post walks through the steps needed to integrate the chifishr R package created in part I with GitLab CI.
Why GitLab?
Many popular R packages hosted on GitHub use Travis CI, a third-party integration service. GitLab offers a built-in CI service, which is very similar to Travis but comes with its own features and benefits. Although GitHub and Travis are well documented and currently have a larger R community following, GitLab and its built-in CI services offer several advantages that make them a worthy option when developing your R package.
GitLab community edition is fully open source and allows development within private repositories for free, a feature which GitHub did not offer at the time of writing. GitLab’s CI services are available for both public and private repositories, so if you are planning to create private, internal R packages for your team and/or clients to use, this is a major edge for GitLab. This is a primary reason we use GitLab to collaborate on packages at Methods.
Why CI?
Continuous integration is essential to ensuring that any update you make to your code does not break the functionality of your package. CI services compile your code on a per commit basis, allowing for automated building and testing of software. Instead of manually checking and re-running your tests on your local machine, a CI service automates that process, easing the burden on developers. This allows teams to work on a project or package together more efficiently and may help catch bugs in your code earlier on. See the prior post on setting up tests for functions in R packages.
CI services provide job logs that all package contributors can view. If a CI job fails, the user who pushed the commit is notified, and the job logs provide a place where you can view where the error occurred. Job logs are especially useful during pull/merge requests, as they give your co-contributors confidence that the changes you made have not broken any existing functions.
The steps to set up CI for an R package are relatively quick and easy, so they are definitely worth your time (and everyone else’s!) in the long run. We set up the example chifishr R package in part I of this series, and will demonstrate sharing that package on GitLab in this post.
Adding a README
To help other users understand the general purpose of the chifishr package, we will create a README.Rmd file. It will demonstrate how to use the functions in the package and explain when and why people should use it.
usethis::use_readme_rmd()
#> ✔ Writing 'README.Rmd'
#> ✔ Adding '^README\\.Rmd$' to '.Rbuildignore'
#> ● Modify 'README.Rmd'
This function creates a draft README.Rmd that outlines what should be included. We will need to edit the draft file so that it contains installation instructions and examples of the functions specific to our package. To see what I have included in the README for this example, check here. We will also need to knit the README.Rmd file, creating an additional README.md file which will be rendered and displayed on GitLab.
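As a minimal sketch of the knitting step, assuming the rmarkdown package and pandoc are installed locally, the README can also be rendered from the R console instead of with the Knit button in RStudio. The helper name knit_readme is hypothetical and is not part of usethis.

```r
# Hypothetical helper: render README.Rmd to README.md.
# Assumes the rmarkdown package and pandoc are available.
knit_readme <- function(path = "README.Rmd") {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "'; run usethis::use_readme_rmd() first.")
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required to knit the README.")
  }
  # The draft created by usethis declares `output: github_document`
  # in its YAML header, so no output format is passed here.
  rmarkdown::render(path, quiet = TRUE)
}
```

Calling knit_readme() from the package root should write README.md next to README.Rmd. If the README examples call functions from the package, the package needs to be installed before knitting.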
Integrating with GitLab CI
To set up the CI, we will create a new configuration file, .gitlab-ci.yml, in the root directory of the package. This file contains the directions GitLab will use to run the checks and tests against each push to the remote repository. In this case, the goal of the configuration file is to build and test the package in a cloud environment that contains only the software and dependencies needed to accomplish the specified tasks. GitLab CI will use a Docker container and run a few simple R commands to check that everything works as intended. More information on the usage of the .gitlab-ci.yml file can be found in GitLab’s documentation.
image: r-base

test:
  script:
    - R -e 'install.packages(c("dplyr", "purrr", "testthat"))'
    - R CMD build . --no-build-vignettes --no-manual
    - R CMD check *tar.gz --no-build-vignettes --no-manual
First, we specify the Docker image as r-base. Docker containers are portable, lightweight environments for running software, which makes them well suited to continuous integration. The r-base image is maintained by the rocker community of R + Docker users. It contains only the latest version of base R (no RStudio, tidyverse, etc.) and is the minimal container environment we need to run our jobs. We then run R commands to install the required dependencies, build our package, check that all of the proper documentation is in place, and verify that the function tests pass.
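The install step in the configuration hard-codes dplyr, purrr and testthat. As a hedged sketch (this is not what the configuration in this post does), the same list could be read from the package's DESCRIPTION file with base R's read.dcf(). The helper name description_deps is hypothetical, and the sketch assumes dependencies are declared in the Imports and Suggests fields.

```r
# Hypothetical helper: list the packages named in DESCRIPTION fields.
description_deps <- function(path = "DESCRIPTION",
                             fields = c("Imports", "Suggests")) {
  desc <- read.dcf(path, fields = fields)
  # Fields that are absent from the file come back as NA; drop them
  raw <- desc[1, ][!is.na(desc[1, ])]
  if (length(raw) == 0) return(character(0))
  # Split the comma-separated entries
  pkgs <- trimws(unlist(strsplit(raw, ",")))
  # Strip version constraints such as "(>= 0.7.0)"
  pkgs <- trimws(sub("\\(.*\\)", "", pkgs))
  # Drop empty entries and the R version pseudo-dependency
  unique(pkgs[nzchar(pkgs) & pkgs != "R"])
}
```

In a CI script the result could be passed to install.packages(), for example install.packages(description_deps()), so the configuration does not need editing each time a dependency is added.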
If any of the R commands fail, the CI job and pipeline will fail and the user who pushed the commit starting that job will be notified. For example, if we tried to use devtools::check() here without installing devtools into the environment beforehand, we would get an error that the package is not installed, and the CI job would fail. We would be able to see the error in GitLab’s user interface for that specific job in the CI/CD tab.
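The failure is signalled through exit codes: when an R expression run non-interactively (for example with R -e) raises an error, R halts with a non-zero exit status, and GitLab CI treats a non-zero status from a script line as a failed job. As a minimal sketch of making such a failure easier to read in the job log, a script could check for a tool before calling it. The helper name require_tool is hypothetical.

```r
# Hypothetical helper: stop with a clear message if a package that the
# CI script relies on is not installed in the container.
require_tool <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed in this environment. ",
         "Install it in an earlier script step.", call. = FALSE)
  }
  invisible(TRUE)
}

# "stats" ships with base R, so this check passes in any R installation.
require_tool("stats")
```

Calling require_tool("devtools") in the r-base container without installing devtools first would raise the error and fail the job, with the message pointing at the missing install step.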
Note that although this .gitlab-ci.yml file does everything we need to get the CI up and running, it could be written more efficiently. Since devtools is not included in the r-base Docker image, we used the R CMD shell commands here. We also installed our dependencies into the environment before building and checking the package, which drastically increases the amount of time needed to run the CI job. Parts III and IV of this series will dive deeper into customizing the .gitlab-ci.yml file to address these issues.
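One small, partial mitigation for the install time can be sketched in base R: skip packages that are already present in the library. This is an illustration under stated assumptions rather than the approach taken later in the series, and the helper name missing_packages is hypothetical.

```r
# Hypothetical helper: return the packages from `pkgs` that are not
# yet installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(c("dplyr", "purrr", "testthat"))
# In a CI script, only the missing packages would then be installed:
# if (length(to_install) > 0) install.packages(to_install)
```

On a fresh r-base container this likely saves little, since the image ships only base R. It helps mainly when the image or a cached library already provides some of the packages.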
The .gitlab-ci.yml file is for package development purposes only, so it does not need to be included in the built package or made available to the end user. The .Rbuildignore file works similarly to .gitignore, in that it specifies which files in the directory should be left out of the final package during the build process. We can add the .gitlab-ci.yml file to .Rbuildignore with usethis::use_build_ignore().
usethis::use_build_ignore(".gitlab-ci.yml")
#> ✔ Adding '^\\.gitlab-ci\\.yml$' to '.Rbuildignore'
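Each line of .Rbuildignore is a Perl-style regular expression that R matches case-insensitively against file paths relative to the package root, which is why usethis escapes the dots and anchors the pattern. A minimal sketch of that matching with base R's grepl() follows; the file list is made up for illustration.

```r
# The pattern as an R string. The doubled backslashes are only R string
# escaping; the line in .Rbuildignore itself reads ^\.gitlab-ci\.yml$
pattern <- "^\\.gitlab-ci\\.yml$"

# Hypothetical file paths relative to the package root
files <- c(".gitlab-ci.yml", "DESCRIPTION", "R/fish.R", "xgitlab-ci-yml")

# Mimic the build-time matching: Perl-style regex, case-insensitive
ignored <- grepl(pattern, files, perl = TRUE, ignore.case = TRUE)
files[ignored]
```

Only .gitlab-ci.yml matches. Without the escaped dots and the ^ and $ anchors, a looser pattern could also exclude files whose names merely resemble the CI configuration.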
Up Next
Part III will expand on best practices and tools for developing R packages with GitLab CI.