R, Docker and Checkpoint: A Route to Reproducibility
I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?
Only one reasonable way to find out: give it a try.
Below is the simple Dockerfile I used. Here are the fundamental components of what it does:
- Derive from the rocker/r-ver image for R 3.6.1.
- Create an environment variable CHECKPOINT_DATE with the snapshot date for {checkpoint}.
- Install the {checkpoint} package.
- Make a snapshot folder for {checkpoint}.
- Add commands to .Rprofile which will load {checkpoint} and select the required snapshot.
- Install a sample package under {checkpoint}.
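# Base image: R 3.6.1 from the rocker project.
FROM rocker/r-ver:3.6.1
# Snapshot date that {checkpoint} will use.
ENV CHECKPOINT_DATE=2018-12-01
# Install {checkpoint}, create its snapshot folder, make every R session
# select the snapshot via .Rprofile, then install a sample package into it.
RUN R -e "install.packages('checkpoint')" && \
    mkdir -p /root/.checkpoint/${CHECKPOINT_DATE} && \
    echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" > ~/.Rprofile && \
    R -e "install.packages('colorspace')"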
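To see what the mkdir and echo steps leave behind, the same shell commands can be run outside Docker against a scratch directory. This is only a sketch: SCRATCH_HOME is a hypothetical stand-in for /root on the image, and nothing here needs R or Docker to be installed.

```shell
# Hypothetical scratch directory standing in for /root on the image.
SCRATCH_HOME="$(mktemp -d)"
CHECKPOINT_DATE="2018-12-01"

# Same two steps as in the RUN instruction: create the snapshot folder,
# then write the .Rprofile. Inside double quotes the shell expands
# ${CHECKPOINT_DATE} but passes the single quotes through for R.
mkdir -p "${SCRATCH_HOME}/.checkpoint/${CHECKPOINT_DATE}"
echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" > "${SCRATCH_HOME}/.Rprofile"

# Show the line that every R session on the image would run at startup.
cat "${SCRATCH_HOME}/.Rprofile"
```

The resulting .Rprofile holds the literal date rather than the variable name, so the snapshot is fixed at build time. Changing CHECKPOINT_DATE means rebuilding the image.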
After building the Docker image you’re ready to give this a whirl. There are (at least) two ways that you could use this:
- install packages onto the image (which bloats the image and requires a rebuild for every new package) or
- share a volume from the host, containing a {checkpoint} snapshot folder, with the Docker container.
Packages on the Image
Let’s look at the option of installing packages on the image first. The Dockerfile above already installs the {colorspace} package. Let’s test that out.
In the screenshot below the top panel shows the launch of the image and the successful loading of the {colorspace} package. The lower panel connects a shell to the running container and lists the contents of the snapshot folder to confirm that the {colorspace} package is there.
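The steps in the two panels could be reproduced with commands along these lines. This is a rough sketch rather than the exact commands used: the image tag and container name are made up for illustration, the commands are wrapped in small functions so they can be pasted into a shell and called one at a time, and the search depth passed to find is an assumption about how deep {checkpoint} nests its library folders.

```shell
# Assumed names: the image tag and container name are illustrative only.
IMAGE="r-checkpoint"
CONTAINER="r-checkpoint-test"

# Build the image from the Dockerfile in the current directory.
build_image() {
  docker build -t "${IMAGE}" .
}

# Top panel: start an interactive R session in a named container.
# At the R prompt, library(colorspace) should then load without error.
start_r() {
  docker run --rm -it --name "${CONTAINER}" "${IMAGE}" R
}

# Lower panel: from a second terminal, list the directories under the
# snapshot root of the running container and look for colorspace.
list_snapshot() {
  docker exec "${CONTAINER}" find /root/.checkpoint -maxdepth 5 -type d
}
```

Call build_image once, then start_r in one terminal and list_snapshot in another while the R session is still open.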

Packages on the Host
If you share a snapshot folder from the host with the container then you get a lot more flexibility.
In the screenshot below the top panel shows the launch of the image, where the ~/.checkpoint folder on the host is shared with the container. Now it’s possible to select any of the snapshots present on the host. For example, rather than choosing the 2018-12-01 snapshot installed on the image, we can now select the 2019-06-01 snapshot from the host.
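Sharing the folder comes down to a bind mount on docker run. The sketch below assumes the image was tagged r-checkpoint (a made-up name) and that the host already has a populated ~/.checkpoint folder. The mount target /root/.checkpoint matches the snapshot location used in the Dockerfile.

```shell
# Assumed image tag, for illustration only.
IMAGE="r-checkpoint"

# Mount the host's ~/.checkpoint over the container's snapshot root, so
# the container sees the host's snapshots instead of the ones on the image.
start_r_with_host_snapshots() {
  docker run --rm -it \
    -v "${HOME}/.checkpoint:/root/.checkpoint" \
    "${IMAGE}" R
}
```

Once R is running, a call such as checkpoint('2019-06-01', scanForPackages = FALSE) selects the host's snapshot. Note that the mount hides whatever the image itself stored under /root/.checkpoint for as long as the container runs.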

Of these two options, the latter is the more flexible. If, however, your aim is to provide a Docker image with a complete (and reproducible) computational environment, then the former is definitely the way to go: the package versions are all locked down on the image.
