Using Github Actions & drat to Deploy R Packages
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last summer, a GSOC project was approved for work on the hyperSpec
package which had grown quite large and hard to maintain.1 The essence of the project was to break the original hyperSpec
package into smaller packages.2 As part of that project, we needed to be able to:
- Provide development versions of packages
- Provide large data-only packages (potentially too large to be hosted on CRAN).
In this post I’ll describe how we used Dirk Eddelbuettel’s drat
package and Github
Actions to automate the deployment of packages between repositories.
What is drat?
drat
is a package that simplifies the creation and modification of CRAN-like repositories. The structure of a CRAN-like repository is officially described briefly here.3 Basically, there is required set of subdirectories, required files containing package metadata, and source packages that are the result of the usual build and check process. One can also have platform-specific binary packages. drat
will create the directories and metadata for you, and provides utilities that will move packages to the correct location and update the corresponding metadata.4 The link above provides access to all sorts of documentation. My advice is to not overthink the concept. A repository is simply a directory structure and a couple of required metadata files, which must be kept in sync with the packages present. drat
does the heavy-lifting for you.
What are Github Actions?
Github Actions are basically a series of tasks that one can have Github run when there is an “event” on a repo, like a push or pull. Github Actions are used extensively for continuous integrations tasks, but they are not limited to such use. Github Actions are written in a simply yaml-like script that is rather easy to follow even if the details are not familiar. Github Actions uses shell commands, but much of the time the shell simply calls Rscript
to run native R
functions. One can run tasks on various hardware and OS versions.
The Package Repo
The deployed packages reside on the gh-pages
branch of r-hyperspec/pkg-repo
in the form of the usual .tar.gz
source archives, ready for users to install. One of the important features of this repo is the table of hosted packages displayed in the README
. The table portion of README.md
file is generated automatically whenever someone, or something, pushes to this repo. I include the notion that something might push because as you will see next, the deploy process will automatically push archives to this repo from the repo where they are created. The details of how this README.md
is generated are in drat--update-readme.yaml
. If you take a look, you’ll see that we use some shell-scripting to find any .tar.gz
archives and create a markdown-ready table structure, which Github then automatically displays (as it does with all README.md
files at the top level of a repo). The yaml
file also contains a little drat
action that will refresh the repo in case that someone manually removes an archive file by git operations. Currently we do not host binary packages at this repo, but that is certainly possible by extension of the methods used for the source packages.
The Automatic Deploy Process
The automatic deploy process is used in several r-hyperSpec
repos. I’ll use the chondro
repo to illustrate the process. chondro
is a simple package containing a > 2 Mb data set. If the package is updated, the package is built and checked and then deployed automatically to r-hyperSpec/pkg-repo
(described above). The magic is in drat--insert-package.yaml
. The first part of this file does the standard build and check process.5 The second part takes care of deploying to r-hyperspec/pkg-repo
. The basic steps are given next (study the file for the details). It is essential to keep in mind that each task in Github Actions starts from the same top level directory.6 Tasks are set off by the syntax - name: task description
.
- Configure access to Github. Note that we employ a Github user name and e-mail that will uniquely identify the repo that is pushing to
r-hyperSpec/pkg-repo
. This is helpful for troubleshooting. - Clone
r-hyperSpec/pkg-repo
into a temporary directory and checkout the gh-pages branch. - Search for any
.tar.gz
files in thecheck
folder, which is where we directed Github Actions to carry out the build and check process (the first half of this workflow).7 Note that the argumentfull.names = TRUE
is essential to getting the correct path. Usedrat
to insert the.tar.gz
files into the clonedr-hyperSpec/pkg-repo
temporary directory. - Move to the temporary directory, then use git commands to send the updated
r-hyperspec/pkg-repo
branch back to its home, now with the new.tar.gz
files included. Use a git commit message that will show where the new tar ball came from.
Thanks for reading. Let me know if you have any questions, via the comments, e-mail, etc.
Acknowledgements
This portion of the hyperSpec
GSOC 2020 project was primarily the work of hyperSpec
team members Erick Oduniyi, Bryan Hanson and Vilmantas Gegzna. Erick was supported by GSOC in summer 2020.
The work continues this summer, hopefully again with the support of GSOC.↩︎
Project background and results.↩︎
A more loquacious description that may be slightly dated is here.↩︎
drat
is using existingR
functions, mainly from thetools
package. They are just organized and presented from the perspective of a user who wants to create a repo.↩︎The toughest part of writing this workflow was knowing where one was in the directory tree of the Github Actions workspace. We made liberal use of
getwd()
,list.files()
and related functions during troubleshooting. All of these “helps” have been removed from the mature version of the workflow. As noted in the workflow, the top directory is/home/runner/work/${{ REPOSITORY_NAME }}/${{ REPOSITORY_NAME }}
.↩︎It’s helpful to understand in a general way what happens during the build and check process (e.g. the directories and files created).↩︎
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.