Project package libraries and reproducibility

August 12, 2016

(This article was first published on Mango Solutions » R Blog, and kindly contributed to R-bloggers)

Gábor Csárdi, Consultant, Mango Solutions

Introduction

If you are an R user, you have probably upgraded an R package in your R installation and then found that your R script or application suddenly stopped working.

R packages change over time, and new versions are often not backward compatible. This is normal, and it is needed for innovation, but it is also annoying for end users. What can R users do to solve this problem?

Project package libraries

One strategy is to create a new package library for each new project. A package library is just a directory that holds installed R packages, in addition to the ones that are installed with R itself. There are various ways to configure R to use a separate package library, e.g. by setting the R_LIBS environment variable or by calling the .libPaths() function. (Read more about this on the .libPaths() manual page.)
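As a minimal sketch of the .libPaths() route, the following creates a project library and puts it at the front of the library search path for the current session. The directory name `project-library` under `tempdir()` is only a placeholder for illustration; in a real project it would live inside the project directory.

```r
# Placeholder location for a project library (hypothetical name)
proj_lib <- file.path(tempdir(), "project-library")
dir.create(proj_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend the project library, keeping the existing libraries after it.
# .libPaths() drops directories that do not exist and normalizes the paths.
.libPaths(c(proj_lib, .libPaths()))

# install.packages() installs into .libPaths()[1] unless 'lib' is given,
# so newly installed packages now land in the project library.
print(.libPaths())
```

Prepending rather than replacing keeps the other libraries visible; passing only `proj_lib` to `.libPaths()` would instead leave just the project library plus the site and base libraries.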

A separate package library for each project ensures that updating packages for one project does not affect the well-being of another project, so this is a good first step. It also lets you use different versions of the same R package in different projects.

Once the project is done and works well in this specific environment (or the deadline has passed), make sure that you do not update the R packages in the project library any more, so that your application has no chance to break.

The pkgsnap package

It often happens that you need to transfer your application to another computer, e.g. from the development environment to production, or you just want to share it with another developer. This means creating a project package library in the new environment, and then installing the same versions of the same packages from CRAN.

This is surprisingly challenging with the tools in base R. R's package management tools are very robust, and they reliably install the latest versions of packages. But installing specific versions of packages, together with specific versions of their dependencies, is not something they can do, at least not directly.
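For a single package, base R can get there with some manual work. The sketch below assumes that the requested version is no longer the current CRAN release, so its source tarball sits under `src/contrib/Archive/<package>/` on CRAN; the helper names `cran_archive_url()` and `install_archived_version()` are hypothetical. Installing from source also needs build tools for packages with compiled code, and the devtools package offers an `install_version()` function built around the same idea.

```r
# Build the CRAN archive URL of an older source package (hypothetical helper).
# Assumption: the version is not the current release, so it is archived.
cran_archive_url <- function(package, version,
                             repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz",
          repo, package, package, version)
}

# Download the tarball and install it (hypothetical helper).
install_archived_version <- function(package, version, lib = .libPaths()[1]) {
  tarball <- file.path(tempdir(), sprintf("%s_%s.tar.gz", package, version))
  download.file(cran_archive_url(package, version), tarball, mode = "wb")
  # With repos = NULL, install.packages() installs the local file and does
  # not resolve dependencies: they must already be present in 'lib'.
  install.packages(tarball, repos = NULL, type = "source", lib = lib)
}

cran_archive_url("digest", "0.6.8")
# returns "https://cran.r-project.org/src/contrib/Archive/digest/digest_0.6.8.tar.gz"
```

Because dependencies are not resolved with `repos = NULL`, every dependency has to be installed first, in its own recorded version, which is exactly the bookkeeping that becomes tedious by hand.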

This is why we created pkgsnap. It is a very simple package with two exported functions:

  • snap takes a snapshot of your project library. It writes out the names and versions of the currently installed packages into a text file. You can put this text file into the version control repository of the project, to make sure it is not lost.
  • restore uses the snapshot file to recreate the project package library from scratch. It installs the recorded versions of the recorded packages, in the right order.
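"In the right order" means that every package is installed after the packages it depends on. One common way to compute such an order is a topological sort; the sketch below is our illustration of that idea, not pkgsnap's actual code. The function name `install_order()` and the dependency map are hypothetical: the map is loosely modelled on the example below and is not read from the packages' DESCRIPTION files.

```r
# Sketch: topological sort of packages by dependency (hypothetical helper).
# 'deps' maps each package to the names of its dependencies; dependencies
# that are not keys of 'deps' (e.g. base packages) are assumed available.
install_order <- function(deps) {
  remaining <- deps
  result <- character()
  while (length(remaining) > 0) {
    # A package is ready once all of its in-snapshot dependencies are placed
    is_ready <- vapply(remaining, function(d) {
      all(intersect(d, names(deps)) %in% result)
    }, logical(1))
    if (!any(is_ready)) {
      stop("circular dependency among: ",
           paste(names(remaining), collapse = ", "))
    }
    result <- c(result, sort(names(remaining)[is_ready]))
    remaining <- remaining[!is_ready]
  }
  result
}

# Illustrative dependency map (not actual CRAN metadata)
deps <- list(
  testthat  = c("crayon", "digest", "memoise", "praise"),
  memoise   = "digest",
  crayon    = character(),
  digest    = character(),
  praise    = character(),
  pkgconfig = character()
)
install_order(deps)
# returns c("crayon", "digest", "pkgconfig", "praise", "memoise", "testthat")
```

Packages that become ready in the same round are sorted alphabetically only to make the result deterministic; any order within a round would work.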

Example

Here is a little demonstration of how pkgsnap works. First we create a project package library, and install pkgsnap there.

lib_dir <- tempfile()
dir.create(lib_dir)
devtools::install_github("mangothecat/pkgsnap", lib = lib_dir)

We instruct R to use this library directory. This is ideally done from a local .Rprofile file in your project directory. We also load pkgsnap.


.libPaths(lib_dir)
library(pkgsnap)
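A project-level .Rprofile could look roughly like the sketch below. R runs a .Rprofile found in the current working directory at startup (unless R_PROFILE_USER points elsewhere), so starting R in the project directory picks it up. The directory name `r-lib` is a hypothetical choice for this example.

```r
# Sketch of a project-level .Rprofile ("r-lib" is a hypothetical name)
local({
  proj_lib <- file.path(getwd(), "r-lib")
  if (!dir.exists(proj_lib)) dir.create(proj_lib, recursive = TRUE)
  # Use the project library instead of the personal one; .libPaths()
  # still appends the site and base libraries that come with R.
  .libPaths(proj_lib)
})
```

Wrapping the code in `local()` keeps the helper variable out of the global workspace of every session started in the project.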

Apart from pkgsnap, the new package library is empty, so we install some R packages from CRAN into it. The packages they depend on are installed as well.


install.packages(c("testthat", "pkgconfig"))
installed.packages(lib_dir)[, c("Package", "Version")]
#>           Package     Version
#> crayon    "crayon"    "1.3.1"
#> digest    "digest"    "0.6.8"
#> memoise   "memoise"   "0.2.1"
#> pkgconfig "pkgconfig" "2.0.0"
#> pkgsnap   "pkgsnap"   "1.0.0"
#> praise    "praise"    "1.0.0"
#> testthat  "testthat"  "0.11.0"

To create a snapshot (to a temporary file this time), you can just call snap with the name of the snapshot file.


snapshot <- tempfile()
snap(to = snapshot)
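pkgsnap's own file format is not described here, but the idea of a snapshot, a list of package names and versions, can be approximated in a few lines of base R. The sketch below is our illustration, not pkgsnap's implementation or file format; `snap_sketch()` and `read_snap_sketch()` are hypothetical names.

```r
# Sketch: write the name and version of every package in a library to a
# CSV file (hypothetical helper; NOT pkgsnap's file format).
snap_sketch <- function(lib, to) {
  pkgs <- installed.packages(lib.loc = lib)[, c("Package", "Version"),
                                            drop = FALSE]
  write.csv(as.data.frame(pkgs, stringsAsFactors = FALSE), to,
            row.names = FALSE)
  invisible(to)
}

# Read it back; keep versions as character so "1.10" is not read as 1.1
read_snap_sketch <- function(from) {
  read.csv(from, colClasses = "character")
}

# Example: snapshot the base R library, which always exists
snapshot_csv <- tempfile(fileext = ".csv")
snap_sketch(.Library, snapshot_csv)
head(read_snap_sketch(snapshot_csv))
```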

The packages and their versions are now recorded in the snapshot file. To demonstrate restoration, we delete our package library and create a new one:


unlink(lib_dir, recursive = TRUE)
new_lib_dir <- tempfile()
dir.create(new_lib_dir)
.libPaths(new_lib_dir)

We are ready to run restore:


restore(snapshot)
#> Downloading
#>   crayon_1.3.1.tgz...  done.
#>   digest_0.6.8.tgz...  done.
#>   memoise_0.2.1.tgz...  done.
#>   pkgconfig_2.0.0.tgz...  done.
#>   pkgsnap_1.0.0.tgz...   pkgsnap_1.0.0.tar.gz...   pkgsnap_1.0.0.tar.gz... ERROR.
#>   praise_1.0.0.tgz...  done.
#>   testthat_0.11.0.tgz...  done.
#> Installing
#>   digest_0.6.8.tgz ... done.
#>   memoise_0.2.1.tgz ... done.
#>   crayon_1.3.1.tgz ... done.
#>   pkgconfig_2.0.0.tgz ... done.
#>   praise_1.0.0.tgz ... done.
#>   testthat_0.11.0.tgz ... done.

Note that we cannot restore the pkgsnap package, because it is not on CRAN (yet). The rest of the packages were downloaded and installed:


installed.packages(new_lib_dir)[, c("Package", "Version")]
#>           Package     Version
#> crayon    "crayon"    "1.3.1"
#> digest    "digest"    "0.6.8"
#> memoise   "memoise"   "0.2.1"
#> pkgconfig "pkgconfig" "2.0.0"
#> praise    "praise"    "1.0.0"
#> testthat  "testthat"  "0.11.0"
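To check a restored library against a snapshot, a small comparison of the two Package/Version tables is enough. The sketch below uses a hypothetical helper, `check_against_snapshot()`, and hand-copied tables from the example output above; with a real library one would build the installed table with `as.data.frame(installed.packages(new_lib_dir)[, c("Package", "Version")], stringsAsFactors = FALSE)`.

```r
# Sketch: report packages that are missing from a library or installed in a
# different version than recorded (hypothetical helper).
check_against_snapshot <- function(snapshot, installed) {
  merged <- merge(snapshot, installed, by = "Package", all.x = TRUE,
                  suffixes = c(".snapshot", ".installed"))
  merged$Status <- ifelse(
    is.na(merged$Version.installed), "missing",
    ifelse(merged$Version.installed == merged$Version.snapshot,
           "ok", "version mismatch"))
  merged
}

# Package/Version tables copied from the example output above
snapshot_df <- data.frame(
  Package = c("crayon", "digest", "memoise", "pkgconfig", "pkgsnap",
              "praise", "testthat"),
  Version = c("1.3.1", "0.6.8", "0.2.1", "2.0.0", "1.0.0", "1.0.0", "0.11.0"),
  stringsAsFactors = FALSE)
installed_df <- snapshot_df[snapshot_df$Package != "pkgsnap", ]

result <- check_against_snapshot(snapshot_df, installed_df)
result[result$Status != "ok", ]
```

In this example, only pkgsnap is flagged, as "missing", which matches the download error in the restore output.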

Other approaches

packrat is a package dependency management system by RStudio. It is (a lot) more sophisticated and has more features than the simple approach shown here. See more at https://rstudio.github.io/packrat

MRAN is a project by Revolution Analytics, now part of Microsoft. It allows you to go back in time and install CRAN packages as they were on a given date. See more at https://mran.revolutionanalytics.com/

Summary

pkgsnap is an R package that helps you recreate package libraries. It has no compiled code and no dependencies, so it is easy to install. It is currently available from GitHub: https://github.com/mangothecat/pkgsnap. You can install it via the devtools package:

devtools::install_github("mangothecat/pkgsnap")

Give it a try if you like the idea and tell us what you think.
