R, Docker and Checkpoint: A Route to Reproducibility

[This article was first published on R on datawookie, and kindly contributed to R-bloggers.]

I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?

Only one reasonable way to find out: give it a try.

Below is the simple Dockerfile I used. Here are the fundamental components of what it does:

  • Derive from the R 3.6.1 image from rocker (rocker/r-ver:3.6.1).
  • Create an environment variable, CHECKPOINT_DATE, holding the snapshot date for {checkpoint}.
  • Install the {checkpoint} package.
  • Make a snapshot folder for {checkpoint}.
  • Add commands to .Rprofile which load {checkpoint} and select the required snapshot.
  • Install a sample package under {checkpoint}.

FROM rocker/r-ver:3.6.1

# Snapshot date used by {checkpoint}.
ENV CHECKPOINT_DATE=2018-12-01

# Install {checkpoint}, create the snapshot folder, activate the snapshot
# from .Rprofile, then install a sample package into that snapshot.
RUN R -e "install.packages('checkpoint')" && \
    mkdir -p /root/.checkpoint/${CHECKPOINT_DATE} && \
    echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" >~/.Rprofile && \
    R -e "install.packages('colorspace')"
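
As a rough sketch, the image could be built along the following lines. The tag r-checkpoint and the helper name build_image are illustrative choices rather than anything from the original setup, and the function assumes the Dockerfile sits in the current directory unless another path is given.

```shell
# Hypothetical image tag (an assumption, not from the original post).
IMAGE_TAG="r-checkpoint"

# Build the image from the Dockerfile in the given directory
# (defaults to the current directory).
build_image() {
  docker build -t "$IMAGE_TAG" "${1:-.}"
}
```

Calling build_image from the folder that holds the Dockerfile runs docker build -t r-checkpoint . and leaves a tagged image ready to launch.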

After building the Docker image you’re ready to give this a whirl. There are (at least) two ways that you could use this:

  • install packages onto the image (which bloats the image and requires a rebuild for every new package) or
  • share a {checkpoint} snapshot folder from the host with the Docker container as a volume.

Packages on the Image

Let’s look at the option of installing packages on the image first. The Dockerfile above already installs the {colorspace} package. Let’s test that out.

In the screenshot below the top panel shows the launch of the image and successful loading of the {colorspace} package. The lower panel connects a shell to the running container and lists the contents of the snapshot folder to confirm that the {colorspace} package is there.
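
The two steps described here might look roughly like this on the command line. The image tag, the container name and the helper function names are all assumptions made for illustration; the snapshot path follows the Dockerfile above. Rather than guessing the exact library layout inside the snapshot folder, the second helper simply searches it for a colorspace directory.

```shell
# Hypothetical names (assumptions, not from the original post).
IMAGE_TAG="r-checkpoint"
CONTAINER_NAME="checkpoint-demo"
# Snapshot folder created by the Dockerfile.
SNAPSHOT_DIR="/root/.checkpoint/2018-12-01"

# Terminal 1: start an interactive R session in a named container.
# At the R prompt, library(colorspace) should then load the package.
start_r() {
  docker run --rm -it --name "$CONTAINER_NAME" "$IMAGE_TAG"
}

# Terminal 2: search the snapshot folder of the running container
# for the installed colorspace package.
find_colorspace() {
  docker exec "$CONTAINER_NAME" find "$SNAPSHOT_DIR" -type d -name colorspace
}
```

Run start_r in one terminal and, while that session is still open, find_colorspace in another.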

Packages on the Host

If you share a snapshot folder from the host with the container then you get a lot more flexibility.

In the screenshot below the top panel shows the launch of the image, with the ~/.checkpoint folder on the host shared with the container. Now it’s possible to select any of the snapshots present on the host. For example, rather than choosing the 2018-12-01 snapshot installed on the image, we can now select the 2019-06-01 snapshot from the host.
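
A minimal sketch of the shared-folder variant follows. It assumes the host keeps its snapshots in $HOME/.checkpoint, that the image is tagged r-checkpoint (a hypothetical name), and that the host snapshot was created with a matching R version and platform so the installed packages are usable inside the container. Mounting the host folder over /root/.checkpoint hides the snapshot folder that was baked into the image.

```shell
# Hypothetical image tag (an assumption, not from the original post).
IMAGE_TAG="r-checkpoint"
# Host folder holding {checkpoint} snapshots (assumed location).
HOST_SNAPSHOTS="$HOME/.checkpoint"
# Snapshot to select from the host.
SNAPSHOT_DATE="2019-06-01"

# Mount the host snapshot folder into the container, switch to the
# chosen snapshot and print the resulting library paths.
# {checkpoint} is already attached by the .Rprofile in the image.
run_with_host_snapshots() {
  docker run --rm -v "$HOST_SNAPSHOTS:/root/.checkpoint" "$IMAGE_TAG" \
    R -e "checkpoint('$SNAPSHOT_DATE', scanForPackages = FALSE); .libPaths()"
}
```

Changing SNAPSHOT_DATE to any other snapshot present on the host selects that one instead, without rebuilding the image.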

Of these two options the latter seems like a more flexible solution. If, however, your aim is to provide a Docker image with a complete (and reproducible) computational environment, then the former is definitely the way to go: it’s less flexible but the package versions are all locked down on the image.
