Best practices for handling packages in R projects

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Andrie de Vries

For much of my data science work, I want to have the very latest packages from CRAN or GitHub. However, once any work finds its way onto a production server (where it runs on a regular schedule), I want my environment to be stable. Most importantly, for these projects I want to ensure I have reproducible results. In these cases I want to isolate the packages I use, and ensure I don't “pollute” my library with the most recent package versions. In this post I give some tips for keeping my libraries clean.

By default, R uses a single package library for each installed version of R on your machine. Fortunately, it is easy to modify the path where R installs your packages.

Changing your library location

To change the library location, use the function .libPaths().

In R, a library is the location on disk where you install your packages. R creates a different library for each minor version (x.y) of R itself. For example, R-3.0.x and R-3.1.x have different library locations. However, R-3.2.0 and R-3.2.1 share the same location.

For example, to use ~/R/win-library/3.1-mran-2015-06-20 as your library location, try:

> .libPaths("~/R/win-library/3.1-mran-2015-06-20")
> .libPaths()
[1] "C:/Users/adevries/Documents/R/win-library/3.1-mran-2015-06-20"
[2] "C:/R/R-3.1.3/library" 
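One detail worth knowing: .libPaths() only keeps directories that already exist, so a path that has not been created yet is dropped without an error. Below is a minimal sketch of a helper that creates the folder first. The name use_library() and the temporary path are made up for this illustration; substitute your own location.

```r
# Hypothetical helper: make sure the folder exists, then put it first
# in the library search path.
use_library <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  # .libPaths() appends the site and system libraries by itself
  .libPaths(path)
  invisible(.libPaths())
}

# Example with a throwaway location (an assumption; use your own path)
demo_lib <- file.path(tempdir(), "demo-library")
use_library(demo_lib)
.libPaths()[1]
```

Calling .libPaths() with no arguments afterwards should show the new folder in first position, followed by the library that ships with R.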

The initialization sequence of R

When R starts, it performs a series of steps to initialize the session. You can modify the startup sequence by changing the contents in a number of locations.

The following sequence is somewhat simplified:

  • First, R reads the file Rprofile.site in the R_HOME/etc folder, where R_HOME is the location where you installed R.
    • For example, this file could live at C:\R\R-3.2.2\etc\Rprofile.site
    • Making changes to this file affects all R sessions that use this version of R.
    • This might be a good place to define your preferred CRAN mirror, for example.
  • Next, R looks for a user profile. If there is a file .Rprofile in the current working directory (typically the project folder), R reads that one.
    • This might be a good place to define some project-specific settings.
  • Only if the project folder has no .Rprofile does R fall back to the file ~/.Rprofile in the user's home folder. The two files are not combined: R reads one or the other.

Thus, to define a custom library path for a specific project, you can create a .Rprofile file in the root of your project, and then make the changes there.
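To see which of these startup files exist on a given machine, something like the following can help. It is only a sketch: it checks the default locations and does not account for the R_PROFILE and R_PROFILE_USER environment variables, which can point R elsewhere. The variable name startup_files is just for illustration.

```r
# Default locations of the startup files discussed above
startup_files <- c(
  site    = file.path(R.home("etc"), "Rprofile.site"),
  user    = path.expand("~/.Rprofile"),
  project = file.path(getwd(), ".Rprofile")
)

# One row per file, with a flag showing whether it is present
data.frame(
  file   = unname(startup_files),
  exists = unname(file.exists(startup_files)),
  row.names = names(startup_files)
)
```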

Contents of the .Rprofile in the project root

One of the projects I am working on is to extract data from AzureML Studio and manipulate this data in R. Right now, AzureML still uses R-3.1.0 – this means I have to test locally using R-3.1.x.

However, I also want to use a set of packages that is reasonably close to the actual set of R packages installed on AzureML. To do this, I am making use of the CRAN time machine that we built at MRAN. Specifically, I want to use a snapshot of 2015-06-20.

So the contents of my project-specific .Rprofile are:

options(repos = c(CRAN = "http://mran.revolutionanalytics.com/snapshot/2015-06-20"))
.libPaths("~/R/win-library/3.1-mran-2015-06-20")
message("Using library: ", .libPaths()[1])

This code has the effect of changing my CRAN mirror and changing the library location to ~/R/win-library/3.1-mran-2015-06-20. Finally it prints a useful message to the console on startup:
 
Restarting R session…

Revolution R Enterprise version 7.4 (64-bit): an enhanced distribution of R
Revolution Analytics packages Copyright (C) 2015 Revolution Analytics, Inc.

Type 'revo()' to visit www.revolutionanalytics.com for the latest
Revolution R news, 'forum()' for the community forum, or 'readme()'
for release notes.

Using library: C:/Users/adevries/Documents/R/win-library/3.1-mran-2015-06-20
>
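Once the session is running with the project library, it can be useful to record exactly which package versions are installed there. Here is a small sketch using installed.packages(). It assumes the project library is the first entry of .libPaths(), as set up in the .Rprofile above; the names project_lib and pkg_versions are illustrative.

```r
# Assumption: the project library is first in the search path
project_lib <- .libPaths()[1]
pkgs <- installed.packages(lib.loc = project_lib)

# Keep just the package name and version columns
pkg_versions <- data.frame(
  Package = pkgs[, "Package"],
  Version = pkgs[, "Version"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
head(pkg_versions)
```

Saving this table alongside the project (for example with write.csv()) gives a simple record to compare against the production environment.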
 

What else is in my project folder?

The project folder I'm working with is a fairly standard package folder under version control. The package I'm working on is called “azureml”. So in the screenshot below you see my RStudio project file, the git artifacts as well as the tarball for the built package (i.e. the result of R CMD build). You may notice that I keep the package itself in a subfolder, also called azureml. This allows me to keep the R package folder itself clean, whilst storing any project-related files in the root folder.
 
[Screenshot: contents of the project folder]
 

A warning on code reproducibility and sharing

Be careful when using the global .Rprofile mechanisms (~/.Rprofile and Rprofile.site) to change settings that modify how R code runs.
 
I think it is perfectly acceptable to configure infrastructure settings in .Rprofile, for example setting your CRAN mirror.
 
However, if you define settings in your .Rprofile that change how R behaves, then it gets harder to share your code and still expect the code to be reproducible. For example, one common view is that the stringsAsFactors = TRUE setting is the wrong choice. So, although it is possible to change this by using options(stringsAsFactors = FALSE), it is probably wise to do this at the top of your script, rather than in the .Rprofile.
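Where a function offers an argument for the setting, passing it explicitly in the call avoids relying on any session-wide option at all, so the script behaves the same on someone else's machine. A short illustration with data.frame(); the data is made up.

```r
df <- data.frame(
  name  = c("a", "b", "c"),
  value = 1:3,
  stringsAsFactors = FALSE  # stated in the call, independent of options()
)

# The text column stays a character vector rather than becoming a factor
class(df$name)
```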
 

How to ignore the .Rprofile settings

Sometimes you may wish to run an R session that ignores some or all of the .Rprofile instructions. To do this, you can specify a number of command line arguments when starting R.
 
For example:
  • R --no-site-file ignores the Rprofile.site
  • R --no-init-file ignores the user profile (.Rprofile), whether it is in the project folder or in the user's home folder
  • R --vanilla ignores all of these (and more in addition)
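One way to see what the profile files change is to compare the library paths of a fresh session started with and without them. The sketch below launches Rscript from the running session using system2(). It assumes the Rscript executable sits in R.home("bin"), which is the usual layout; the variable names are illustrative.

```r
# Path to the Rscript executable that belongs to the running R
rscript <- file.path(R.home("bin"), "Rscript")

# Expression evaluated in each child session
expr <- "writeLines(.libPaths())"

# Library paths with all profile files ignored
vanilla_paths <- system2(rscript, c("--vanilla", "-e", shQuote(expr)),
                         stdout = TRUE)

# Library paths with the normal startup sequence
default_paths <- system2(rscript, c("-e", shQuote(expr)),
                         stdout = TRUE)

# Entries that only appear because of profile or environment settings
setdiff(default_paths, vanilla_paths)
```

If the project .Rprofile sets a custom library, that path should show up in the difference when the code is run from the project folder.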

To find out more, study the detailed help available in ?"Startup".

Following these simple practices will help you keep things straight while you are coding and ease the transition of putting your code into production.
