Running an R Script on a Schedule: Docker Containers on GitLab

[This article was first published on Category R on Roel's R-tefacts, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this tutorial I show you how to run a Docker container on a schedule on GitLab.

Docker containers are awesome because, once made, they run everywhere! It does not matter what type of computer you have (with one caveat: an image is built for a specific CPU architecture, so an image built for a regular x86-64 machine will not run natively on an ARM-based one). Once I build a container, you can run it on a Linux box, a Windows machine or a Mac. This is also why people love containers for production: you can pick up a container from development and hand it over to production unchanged.

Thanks to the massive work by the Rocker team, there are ready-made containers that 'just work' with R; there are even containers with RStudio installed!

My Dockerfile

I'm using the versioned image rocker/r-ver, pinned to 4.0.2, the latest R version at the time of writing. I could use the latest tag instead, but then the container could break whenever the Rocker images are refreshed. Pinning a fixed version saves us a lot of headaches later on.

When building packages, though, it is super useful to always test against the latest versions!

For how to set up a Dockerfile, I refer you to the excellent tutorial 'An introduction to Docker for R users' by Colin Fay, and the larger tutorial by rOpenSci.

My Dockerfile has 3 steps:

    1. take the base image and configure R to use all CPU cores
FROM rocker/r-ver:4.0.2
# step 1: base image
# - let install.packages() build packages in parallel on all cores
#   ($(nproc --all) is expanded by the shell once, at build time)
RUN echo "options(Ncpus = $(nproc --all))" >> /usr/local/lib/R/etc/Rprofile.site
    2. set up renv
# step 2: make a renv folder and install renv
RUN mkdir -p ~/.local/share/renv
RUN R -e 'install.packages("renv")'
    3. user settings: install the system libraries, copy files to the container and install all the necessary packages from the lockfile
# step 3: user settings
# - update and install the system libraries in ONE layer, so a cached
#   'apt-get update' layer can never go stale
# - copy script and lock file
RUN apt-get update && apt-get install -y --no-install-recommends \
      libcurl4-openssl-dev libssl-dev libxt6 \
    && rm -rf /var/lib/apt/lists/*
COPY run_job.R run_job.R
COPY renv.lock renv.lock
# I found that renv::restore() does not use the super fast
# RStudio Package Manager, so by pre-installing rtweet and ggplot2
# and all their dependencies we get a way faster build
RUN R -e 'install.packages(c("ggplot2", "rtweet"))'
RUN R -e 'renv::restore()'
# command that runs when the container starts (exec form)
CMD ["Rscript", "run_job.R"]
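One detail of the Ncpus line in step 1 is easy to miss: a RUN instruction in shell form is executed by /bin/sh during the build, so $(nproc --all) is replaced by the core count of the machine that builds the image, and that literal number is what ends up in Rprofile.site. That is fine here, because options(Ncpus) only matters for install.packages(), which runs in the same build. A minimal sketch that reproduces the line on any Linux machine with nproc, appending to a temporary file instead of the real Rprofile.site:

```shell
#!/bin/sh
# Sketch: reproduce the Ncpus line from step 1 of the Dockerfile,
# but append to a temporary file instead of R's Rprofile.site.
set -eu

profile="$(mktemp)"

# The shell expands $(nproc --all) right now, just as /bin/sh does
# during `docker build`; the file ends up holding a literal number.
echo "options(Ncpus = $(nproc --all))" >> "$profile"

# prints a line like: options(Ncpus = <number of cores>)
cat "$profile"
```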

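Build the image with docker build -t <name_for_container> . (the final . makes the current directory, with the Dockerfile, run_job.R and renv.lock, the build context).

Test it out using docker run --env-file .Renviron <name_for_container>. The --env-file flag hands the variables in your .Renviron (for instance API keys) to the container as environment variables, so secrets never have to be copied into the image.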
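The build-and-test loop above can be wrapped in a small helper, sketched below. The image name r-job is a hypothetical default, and DOCKER can be overridden (for example DOCKER=echo) to print the commands instead of running them. The helper also warns about quoted values: R strips the quotes in a .Renviron line like KEY="value", but as far as I know docker run --env-file passes the value through literally, quotes included.

```shell
#!/bin/sh
# Sketch: build the image and run it locally with the variables in .Renviron.
# IMAGE, DOCKER and ENV_FILE are hypothetical defaults; override them as needed.
set -eu

IMAGE="${IMAGE:-r-job}"            # hypothetical image name
DOCKER="${DOCKER:-docker}"         # set DOCKER=echo for a dry run
ENV_FILE="${ENV_FILE:-.Renviron}"

build_and_run() {
  if [ ! -f "$ENV_FILE" ]; then
    echo "error: no $ENV_FILE found" >&2
    return 1
  fi
  # R strips quotes from .Renviron values, docker --env-file (I believe) does not
  if grep -Eq "^[A-Za-z_][A-Za-z0-9_]*=[\"']" "$ENV_FILE"; then
    echo "warning: quoted values in $ENV_FILE reach the container with their quotes" >&2
  fi
  "$DOCKER" build -t "$IMAGE" .
  "$DOCKER" run --rm --env-file "$ENV_FILE" "$IMAGE"
}
```

Save it, source it from the directory with your Dockerfile (. ./build_and_run.sh) and call build_and_run; the --rm flag removes the stopped container afterwards so test runs don't pile up.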

Using GitLab

GitLab has its own Docker container registry that you can push images to; see the GitLab docs for more info.

You first have to authenticate Docker with the registry, but after that you can docker build an image and push it to your GitLab registry. This is super useful for many things: for example, we could build a container with everything we need already installed and use it to test our new code!
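A sketch of that manual push, assuming GitLab.com, whose registry lives at registry.gitlab.com and expects images named after the project path. The your-group/your-project path is a placeholder, and docker login asks for your GitLab username plus a personal access token with registry access as the password (check the GitLab docs for the exact token scopes). As before, DOCKER=echo turns it into a dry run.

```shell
#!/bin/sh
# Sketch: build an image locally and push it to a GitLab project's registry.
# REGISTRY_IMAGE is a placeholder; replace it with your own project path.
set -eu

REGISTRY="registry.gitlab.com"
REGISTRY_IMAGE="${REGISTRY_IMAGE:-$REGISTRY/your-group/your-project}"  # hypothetical
TAG="${TAG:-latest}"
DOCKER="${DOCKER:-docker}"         # set DOCKER=echo for a dry run

push_to_gitlab() {
  "$DOCKER" login "$REGISTRY"                      # prompts for username + token
  "$DOCKER" build -t "$REGISTRY_IMAGE:$TAG" .      # name must match the project path
  "$DOCKER" push "$REGISTRY_IMAGE:$TAG"
}
```

Once pushed, any machine that is logged in to the registry can fetch the same image with docker pull registry.gitlab.com/your-group/your-project:latest.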

But we can take it one step further: we can make GitLab build the container, store it in the registry and run it!

The .gitlab-ci.yml file is quite clean and was super easy to adapt from the examples given here. GitLab has amazing documentation!

The configuration has 2 stages: build and test.

  • Build makes the container and pushes it to the project's container registry
  • Test pulls the container from the registry and runs it

It is way faster than my previous approach. A nice side effect of having two stages is that if the test stage fails, you can rerun just that stage without rebuilding the container.
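A sketch of what such a two-stage .gitlab-ci.yml can look like, modelled on GitLab's docker-in-docker examples. It is written out through a shell heredoc so you can run it from the repository root (careful: it overwrites an existing .gitlab-ci.yml). CI_REGISTRY, CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, CI_REGISTRY_IMAGE and CI_COMMIT_REF_SLUG are predefined by GitLab; the docker:19.03.12 image tags are an assumption, and MY_API_KEY is a hypothetical CI/CD variable (set in the project settings) standing in for whatever secret run_job.R needs.

```shell
#!/bin/sh
# Sketch: write a two-stage .gitlab-ci.yml (build, then test) to the current
# directory. The quoted 'EOF' stops the shell from expanding the $VARIABLES;
# GitLab expands them when the pipeline runs.
set -eu

cat > .gitlab-ci.yml <<'EOF'
stages:
  - build
  - test

variables:
  # TLS setup for docker-in-docker, as in GitLab's docker examples
  DOCKER_TLS_CERTDIR: "/certs"
  # one image per branch in this project's container registry
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG

# hidden template job (leading dot): docker client + docker-in-docker service
.docker:
  image: docker:19.03.12
  services:
    - docker:19.03.12-dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"

build:
  extends: .docker
  stage: build
  script:
    - docker build -t "$IMAGE_TAG" .
    - docker push "$IMAGE_TAG"

test:
  extends: .docker
  stage: test
  script:
    - docker pull "$IMAGE_TAG"
    # MY_API_KEY is a hypothetical CI/CD variable; -e passes it into the container
    - docker run --rm -e MY_API_KEY "$IMAGE_TAG"
EOF
```

Because the test job only pulls the image pushed by the build job, it can be retried on its own from the pipeline view without rebuilding anything.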

[Figure: GitLab pipeline in which the build stage (step 1) succeeded but the test stage (step 2) failed]

You can schedule it the same way as in this GitLab post: go to CI/CD > Schedules and create a new schedule.

It is also possible to push Docker images to GitHub, GCP or other cloud providers, some of which I will explore in the future.
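The schedule can also be created from the command line with GitLab's pipeline schedules API (POST /projects/:id/pipeline_schedules) instead of the web UI. A sketch with curl: the project ID, the GITLAB_TOKEN variable (a personal access token with the api scope), the branch master and the cron expression are placeholders of my own choosing; "0 7 * * *" would mean every day at 07:00 UTC.

```shell
#!/bin/sh
# Sketch: create a pipeline schedule through the GitLab REST API (v4).
# The project ID, GITLAB_TOKEN and ref=master are hypothetical placeholders.
set -eu

GITLAB_URL="${GITLAB_URL:-https://gitlab.com}"
CURL="${CURL:-curl}"               # set CURL=echo for a dry run

create_schedule() {
  project_id="$1"
  cron="$2"                        # standard 5-field cron syntax
  "$CURL" --fail --request POST \
    --header "PRIVATE-TOKEN: ${GITLAB_TOKEN:?set GITLAB_TOKEN first}" \
    --form description="Run R job" \
    --form ref=master \
    --form cron="$cron" \
    --form cron_timezone=UTC \
    "$GITLAB_URL/api/v4/projects/$project_id/pipeline_schedules"
}
```

For example, create_schedule 12345 "0 7 * * *" (quote the cron expression so the shell does not expand the asterisks).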

That is all for today!


Reproducibility

At the moment of creation (when I knitted this document) this was the state of my machine:
sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.6
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2020-09-25

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
 cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
 digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
 glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.1)
 htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
 knitr         1.29    2020-06-23 [1] CRAN (R 4.0.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
 rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
 rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.1)
 stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
 withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.2)
 xfun          0.15    2020-06-21 [1] CRAN (R 4.0.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

To leave a comment for the author, please follow the link and comment on their blog: Category R on Roel's R-tefacts.
