
Running an R Script on a Schedule: Gh-Actions

[This article was first published on Category R on Roel's R-tefacts, and kindly contributed to R-bloggers.]

In this tutorial I have an R script that runs every day on GitHub Actions. It creates a curve in ggplot2 and posts that picture to Twitter.

The use case is this: You have a script and it needs to run on a schedule (for instance every day).

Other ways to schedule a script

I will write separate posts about many of the other ways in which you can run an R script on a schedule. In this case I will run the script on GitHub Actions. GitHub Actions is completely scriptable and works quite well, although I found it quite difficult to make it work in one go. (The original post links to all posts about scheduling an R script.)

Github actions details

GitHub Actions is a new product from GitHub; you get 2000 minutes per account per month. You create a GitHub action by writing down the steps in a YAML file in a specific place in your repo: .github/workflows/. YAML files are sensitive to indentation, so run the file through a YAML validator before you push it to GitHub.
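To make the file layout concrete, here is a minimal sketch of a scheduled workflow. The file name, the workflow and job names, and the 06:00 time are my own assumptions for illustration; only the .github/workflows/ location and the idea of a daily run come from the text above.

```yaml
# Hypothetical file: .github/workflows/scheduled-script.yml
name: run-script-daily          # assumed name, pick your own

on:
  schedule:
    # Standard 5-field cron syntax, evaluated in UTC: every day at 06:00.
    - cron: "0 6 * * *"
  # Also allow a manual run from the Actions tab, handy while testing.
  workflow_dispatch:

jobs:
  run-job:                      # assumed job name
    runs-on: ubuntu-latest
    steps:
      # Get the repository contents onto the runner.
      - uses: actions/checkout@v4
      # Placeholder step; the real work is described further down.
      - run: echo "the R script would run here"
```

Note that every level is indented with spaces, never tabs, and that scheduled workflows only run from the repository's default branch.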

So what are we going to do?

(We want the code to run on a computer in the cloud.)
You save your script locally in a git repository.
You push everything to GitHub.

Installation: the GitHub action
- has R installed
- installs the system dependencies
- and installs the correct packages

Running something:
- GitHub Actions runs the script
- we can schedule this action

I first explain what you need, what my R script does, and how to deal with credentials. If you are not interested, skip straight to Steps.

What you need:

Example of a script

I have an R script that creates a curve in ggplot2 and posts the resulting picture to Twitter.

Of course you could create something that is actually useful, like downloading data, cleaning it and pushing it into a database. But this example is relatively small and you can actually see the results online.

Small diversion: credentials/secrets

For many applications you need credentials, and you don’t want to put the credentials in the script: if you share the script with someone, they also have the credentials. If you put it on GitHub, the world has your secrets (I just did this).

So how can you do it? R can read environment variables, and in GitHub you can enter environment variables that are passed to the runner when it runs (there are better, more professional tools for this, but this is good enough for me). So you create an environment variable called apikey with a value like aVerY5eCretKEy. In your script you call Sys.getenv("apikey"), which retrieves the value aVerY5eCretKEy for the script to use.

How do you add them to your local environment?

How do you add them to github?

In the workflow file they are retrieved from the secret store with something like ${{ secrets.APIKEY }}.
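As a rough sketch of how the secret reaches the script: the step below maps the repository secret onto the environment variable that the R code reads with Sys.getenv("apikey"). The step names are made up for illustration, and I am assuming the secret was stored under the name APIKEY, as in the expression above.

```yaml
steps:
  - name: Check that the key is available    # hypothetical sanity check
    shell: Rscript {0}
    env:
      # Left: the name R sees. Right: the secret stored in the repo settings.
      apikey: ${{ secrets.APIKEY }}
    run: |
      key <- Sys.getenv("apikey")
      # Sys.getenv() returns "" when the variable is missing, so fail early.
      if (!nzchar(key)) stop("apikey is not set")

  - name: Run the script with the key        # hypothetical step name
    env:
      apikey: ${{ secrets.APIKEY }}
    run: Rscript run_job.R
```

The check never prints the key itself, so the value does not end up in the job log.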

Steps

So what do you need to make this work?

Steps in order

Check if your script runs on your computer
Set up renv and snapshot
(optional) Cache your renv libraries for faster runs
Install the correct packages on the runner
Execute the script

Steps with explanation

env:
  RENV_PATHS_ROOT: ~/.local/share/renv
steps:
  - name: Cache packages
    uses: actions/cache@v1
    with:
      path: ${{ env.RENV_PATHS_ROOT }}
      key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
      restore-keys: |
        ${{ runner.os }}-renv-
  - name: Restore packages
    shell: Rscript {0}
    run: |
      if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
      renv::restore()

- run: R -e 'renv::restore()'

- run: Rscript run_job.R
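Put together, the steps in order could look something like the workflow below. This is a sketch under several assumptions of mine, not the exact file from the original post: the workflow and job names and the cron time are invented, r-lib/actions/setup-r is swapped in as one way to get R on the runner, the apt packages are only examples of system dependencies, and the cache action is a newer release than the v1 shown above.

```yaml
name: run-r-script              # assumed name

on:
  schedule:
    - cron: "0 6 * * *"         # assumed time: daily at 06:00 UTC

jobs:
  run-job:
    runs-on: ubuntu-latest
    env:
      RENV_PATHS_ROOT: ~/.local/share/renv
      apikey: ${{ secrets.APIKEY }}   # read in R with Sys.getenv("apikey")
    steps:
      - uses: actions/checkout@v4

      # Has R installed (assumption: the setup-r action).
      - uses: r-lib/actions/setup-r@v2

      # Installs the system dependencies (example packages only;
      # what you need depends on the R packages in your renv.lock).
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libcurl4-openssl-dev libssl-dev

      # Optional: reuse the renv library between runs.
      - name: Cache packages
        uses: actions/cache@v4
        with:
          path: ${{ env.RENV_PATHS_ROOT }}
          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
          restore-keys: |
            ${{ runner.os }}-renv-

      # Installs the correct packages, as recorded by renv::snapshot().
      - name: Restore packages
        shell: Rscript {0}
        run: |
          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
          renv::restore()

      # Executes the script.
      - name: Run the script
        run: Rscript run_job.R
```

The cache key includes a hash of renv.lock, so the cached library is rebuilt whenever the lockfile changes.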

Conclusion

So to run something on GitHub Actions we have to give instructions to the infrastructure: we tell it what Docker container to use, what things to install and what commands to run, until, finally, we can run our script. And now it runs every day.

Reproducibility

At the moment of creation (when I knitted this document) this was the state of my machine:

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.6
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2020-09-24

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
 cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
 digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
 glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.1)
 htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
 knitr         1.29    2020-06-23 [1] CRAN (R 4.0.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
 rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
 rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.1)
 stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
 withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.2)
 xfun          0.15    2020-06-21 [1] CRAN (R 4.0.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
