R, Docker and Checkpoint: A Route to Reproducibility

[This article was first published on R on datawookie, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?

Only one reasonable way to find out: give it a try.

Below is the simple Dockerfile I used. Here are the fundamental components of what it does:

  • Derived from the R-3.6.1 image from rocker.
  • Create an environment variable CHECKPOINT_DATE with the snapshot date for {checkpoint}.
  • Install the {checkpoint} package.
  • Make a snapshot folder for {checkpoint}.
  • Add commands to .Rprofile which will load {checkpoint} and select the required snapshot.
  • Install a sample package under {checkpoint}.
FROM rocker/r-ver:3.6.1

ENV CHECKPOINT_DATE 2018-12-01

RUN R -e "install.packages('checkpoint')" && \
    mkdir -p /root/.checkpoint/${CHECKPOINT_DATE} && \
    echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" >~/.Rprofile && \
    R -e "install.packages('colorspace')"

After building the Docker image you’re ready to give this a whirl. There are (at least) two ways that you could use this:

  • install packages onto the image (will result in image bloat and requirement to rebuild for any new packages) or
  • share a volume from the host which contains a {checkpoint} snapshot folder with the Docker container.

Packages on the Image

Let’s look at the option of installing packages on the image first. The Dockerfile above already installs the {colorspace} package. Let’s test that out.

In the screen shot below the top panel shows the launch of the image and successful loading of the {colorspace} package. The lower panel connects a shell to the running container and lists the contents of the snapshot folder to confirm that the {colorspace} package is there.

Packages on the Host

If you share a snapshot folder from the host with the container then you get a lot more flexibility.

In the screen shot below the top panel shows the launch of the image, where the ~/.checkpoint folder on the host is shared with the container. Now it’s possible to select any of the snapshots present on the host. For example, rather than choosing the 2018-12-01 snapshot installed on the image, we can now select the 2019-06-01 snapshot from the host. It’s important to note that installing packages on the container will now result in them being stored in the snapshot folder on the host.

Of these two options the latter seems like a more flexible solution. If, however, your aim is to provide a Docker image with a complete (and reproducible) computational environment, then the former is definitely the way to go: it’s less flexible but the package versions are all locked down on the image.

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)