Using Github Actions and drat to Deploy R Packages

[This article was first published on Chemometrics and Spectroscopy Using R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last summer, a GSOC project was approved for work on the hyperSpec package which had grown quite large and hard to maintain.1 The essence of the project was to break the original hyperSpec package into smaller packages.2 As part of that project, we needed to be able to:

  • Provide development versions of packages
  • Provide large data-only packages (potentially too large to be hosted on CRAN).

In this post I’ll describe how we used Dirk Eddelbuettel’s drat package and Github Actions to automate the deployment of packages between repositories.

What is drat?

drat is a package that simplifies the creation and modification of CRAN-like repositories. The structure of a CRAN-like repository is officially described briefly here.3 Basically, there is required set of subdirectories, required files containing package metadata, and source packages that are the result of the usual build and check process. One can also have platform-specific binary packages. drat will create the directories and metadata for you, and provides utilities that will move packages to the correct location and update the corresponding metadata.4 The link above provides access to all sorts of documentation. My advice is to not overthink the concept. A repository is simply a directory structure and a couple of required metadata files, which must be kept in sync with the packages present. drat does the heavy-lifting for you.

What are Github Actions?

Github Actions are basically a series of tasks that one can have Github run when there is an “event” on a repo, like a push or pull. Github Actions are used extensively for continuous integrations tasks, but they are not limited to such use. Github Actions are written in a simply yaml-like script that is rather easy to follow even if the details are not familiar. Github Actions uses shell commands, but much of the time the shell simply calls Rscript to run native R functions. One can run tasks on various hardware and OS versions.

The Package Repo

The deployed packages reside on the gh-pages branch of r-hyperspec/pkg-repo in the form of the usual .tar.gz source archives, ready for users to install. One of the important features of this repo is the table of hosted packages displayed in the README. The table portion of README.md file is generated automatically whenever someone, or something, pushes to this repo. I include the notion that something might push because as you will see next, the deploy process will automatically push archives to this repo from the repo where they are created. The details of how this README.md is generated are in drat--update-readme.yaml. If you take a look, you’ll see that we use some shell-scripting to find any .tar.gz archives and create a markdown-ready table structure, which Github then automatically displays (as it does with all README.md files at the top level of a repo). The yaml file also contains a little drat action that will refresh the repo in case that someone manually removes an archive file by git operations. Currently we do not host binary packages at this repo, but that is certainly possible by extension of the methods used for the source packages.

The Automatic Deploy Process

The automatic deploy process is used in several r-hyperSpec repos. I’ll use the chondro repo to illustrate the process. chondro is a simple package containing a > 2 Mb data set. If the package is updated, the package is built and checked and then deployed automatically to r-hyperSpec/pkg-repo (described above). The magic is in drat--insert-package.yaml. The first part of this file does the standard build and check process.5 The second part takes care of deploying to r-hyperspec/pkg-repo. The basic steps are given next (study the file for the details). It is essential to keep in mind that each task in Github Actions starts from the same top level directory.6 Tasks are set off by the syntax - name: task description.

  • Configure access to Github. Note that we employ a Github user name and e-mail that will uniquely identify the repo that is pushing to r-hyperSpec/pkg-repo. This is helpful for troubleshooting.
  • Clone r-hyperSpec/pkg-repo into a temporary directory and checkout the gh-pages branch.
  • Search for any .tar.gz files in the check folder, which is where we directed Github Actions to carry out the build and check process (the first half of this workflow).7 Note that the argument full.names = TRUE is essential to getting the correct path. Use drat to insert the .tar.gz files into the cloned r-hyperSpec/pkg-repo temporary directory.
  • Move to the temporary directory, then use git commands to send the updated r-hyperspec/pkg-repo branch back to its home, now with the new .tar.gz files included. Use a git commit message that will show where the new tar ball came from.

Thanks for reading. Let me know if you have any questions, via the comments, e-mail, etc.

Acknowledgements

This portion of the hyperSpec GSOC 2020 project was primarily the work of hyperSpec team members Erick Oduniyi, Bryan Hanson and Vilmantas Gegzna. Erick was supported by GSOC in summer 2020.

Footnotes

  1. The work continues this summer, hopefully again with the support of GSOC.↩︎

  2. Project background and results.↩︎

  3. A more loquacious description that may be slightly dated is here.↩︎

  4. drat is using existing R functions, mainly from the tools package. They are just organized and presented from the perspective of a user who wants to create a repo.↩︎

  5. Modified from the recipes here.↩︎

  6. The toughest part of writing this workflow was knowing where one was in the directory tree of the Github Actions workspace. We made liberal use of getwd(), list.files() and related functions during troubleshooting. All of these “helps” have been removed from the mature version of the workflow. As noted in the workflow, the top directory is /home/runner/work/${{ REPOSITORY_NAME }}/${{ REPOSITORY_NAME }}.↩︎

  7. It’s helpful to understand in a general way what happens during the build and check process (e.g. the directories and files created).↩︎

Reuse

Citation

BibTeX citation:
@online{hanson2021,
  author = {Bryan Hanson},
  title = {Using {Github} {Actions} and Drat to {Deploy} {R} {Packages}},
  date = {2021-04-11},
  url = {http://chemospec.org/posts/2021-04-11-GHA-drat/2021-04-11-GHA-drat.html},
  langid = {en}
}
For attribution, please cite this work as:
Bryan Hanson. 2021. “Using Github Actions and Drat to Deploy R Packages.” April 11, 2021. http://chemospec.org/posts/2021-04-11-GHA-drat/2021-04-11-GHA-drat.html.
To leave a comment for the author, please follow the link and comment on their blog: Chemometrics and Spectroscopy Using R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)