by Andrie de Vries
For much of my data science work, I want to have the very latest package from CRAN or GitHub. However, once any work finds its way onto a production server (where it runs on a regular schedule), I want my environment to be stable. Most importantly, for these projects I want to ensure reproducible results. In these cases I want to isolate the packages I use and make sure I don't "pollute" my library with the most recent package versions. In this post I give some tips for keeping my libraries clean.
R uses a single package library for each installed version of R on your machine. Fortunately, it is easy to change the path where R installs your packages: simply call the function .libPaths() with the new library location.
Changing your library location
To change the library location, you use the function .libPaths().
In R, a library is the location on disk where you install your packages. By default, R uses a different library for each minor ("dot") version of R itself. For example, R-3.0.x and R-3.1.x have different library locations, whereas R-3.2.0 and R-3.2.1 share the same location.
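To see which minor version your session belongs to, and therefore which default library it shares with other patch releases, you can derive the x.y string from the running version of R. This is a small sketch; the variable names are illustrative, and the claim that the default user library path contains this string holds only when R_LIBS_USER has not been customised.

```r
# Full version of the running R, e.g. "3.2.1"
full_version <- as.character(getRversion())

# Keep only major.minor: "3.2.1" -> "3.2". Patch releases share this library.
minor_version <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", full_version)
minor_version

# The default per-user library path normally contains the major.minor string
# (unless R_LIBS_USER has been set explicitly)
Sys.getenv("R_LIBS_USER")
```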
For example, to use ~/R/win-library/3.1-mran-2015-06-20 as your library location, try:
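A minimal sketch, using the path from the text. One detail worth knowing: .libPaths() silently drops directories that do not exist, so the folder has to be created before you switch to it.

```r
# Snapshot-specific library location from the example above
lib_path <- "~/R/win-library/3.1-mran-2015-06-20"

# .libPaths() keeps only existing directories, so create the folder first
dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)

# Put the new library first; the site and system libraries stay after it
.libPaths(lib_path)

# install.packages() now installs here, and library() searches here first
.libPaths()
```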
The initialization sequence of R
When R starts, it performs a series of steps to initialize the session. You can customize this startup sequence by editing files in several locations.
The following sequence is somewhat simplified:
- First, R reads the file Rprofile.site in the R_HOME/etc folder, where R_HOME is the location where you installed R.
- For example, this file could live at C:\R\R-3.2.2\etc\Rprofile.site.
- Making changes to this file affects all R sessions that use this version of R.
- This might be a good place to define your preferred CRAN mirror, for example.
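- Next, R reads a user profile called .Rprofile. It looks first in the current working directory (typically your project folder) and, only if no such file exists there, falls back to ~/.Rprofile in the user's home folder.
- A project-level .Rprofile is a good place to define project-specific settings.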
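Because R sources only one user profile (the project one wins over the one in your home folder), it can be handy to check which files a session started from a given folder would actually read. The helper below is hypothetical, not part of R, and is a simplified sketch: it ignores the R_PROFILE and R_PROFILE_USER environment variables, which override these default locations.

```r
# Hypothetical helper: which profile files would an R session started in
# `project_dir` source? Simplified: ignores R_PROFILE and R_PROFILE_USER.
startup_files <- function(project_dir = getwd()) {
  # Site-wide profile, read first by every session of this R installation
  site <- file.path(R.home("etc"), "Rprofile.site")

  # User profile: the project's .Rprofile wins; ~/.Rprofile is only a fallback
  project <- file.path(project_dir, ".Rprofile")
  home <- path.expand("~/.Rprofile")
  user <- if (file.exists(project)) project else home

  candidates <- c(site = site, user = user)
  # Keep only the files that actually exist, in the order R reads them
  candidates[file.exists(candidates)]
}

startup_files()
```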
Contents of the .Rprofile in the project root
One of the projects I am working on is to extract data from AzureML Studio and manipulate this data in R. Right now, AzureML still uses R-3.1.0, so I have to test locally using R-3.1.x.
I also want to use a set of packages that is reasonably close to the actual set of R packages installed on AzureML. To do this, I use the CRAN time machine that we built at MRAN, specifically the snapshot of 2015-06-20.
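So my project-specific .Rprofile needs to do two things: point the CRAN repository at the 2015-06-20 snapshot, and switch to a separate library for packages installed from that snapshot.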
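A minimal sketch of such a file. It assumes the MRAN snapshot URL pattern https://mran.revolutionanalytics.com/snapshot/YYYY-MM-DD (MRAN itself has since been retired, so on a current setup you would substitute whichever dated snapshot repository you use), and it reuses the library path from earlier.

```r
# Project-specific .Rprofile, sourced when R starts in this folder.
# Assumption: the snapshot URL follows the MRAN pattern .../snapshot/YYYY-MM-DD.
local({
  snapshot <- "2015-06-20"

  # Point the CRAN entry of the repos option at the dated snapshot
  repos <- getOption("repos")
  repos["CRAN"] <- paste0("https://mran.revolutionanalytics.com/snapshot/", snapshot)
  options(repos = repos)

  # Isolated library for packages installed from this snapshot
  lib_path <- file.path("~/R/win-library", paste0("3.1-mran-", snapshot))
  dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)
  .libPaths(lib_path)
})
```

Wrapping everything in local() keeps helper variables such as snapshot and lib_path out of the global workspace of every session started in the project.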
Starting R in the project folder with this setup opens the usual Revolution R Enterprise session:
Revolution R Enterprise version 7.4 (64-bit): an enhanced distribution of R
Revolution Analytics packages Copyright (C) 2015 Revolution Analytics, Inc.
Type 'revo()' to visit www.revolutionanalytics.com for the latest
Revolution R news, 'forum()' for the community forum, or 'readme()'
for release notes.
What else is in my project folder?
A warning on code reproducibility and sharing
How to ignore the .Rprofile settings
- R --no-site-file ignores Rprofile.site
- R --no-init-file ignores the user profile (the .Rprofile in the project or home folder)
- R --vanilla ignores both of these, and also implies --no-save, --no-restore and --no-environ
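To check whether a project .Rprofile is actually taking effect, one option is to compare the library paths seen by a fresh Rscript process with and without --vanilla. The helper name is hypothetical; it relies only on system2(), R.home() and the documented --vanilla startup flag, which Rscript passes through to R.

```r
# Hypothetical helper: start a fresh Rscript process with the given startup
# flags and return the library paths that process ends up with.
lib_paths_with <- function(flags = character()) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c(flags, "-e", shQuote("writeLines(.libPaths())")),
          stdout = TRUE)
}

# Run from the project folder: the first call honours .Rprofile,
# the second skips every profile file.
lib_paths_with()
lib_paths_with("--vanilla")
```

If the snapshot library shows up only in the first result, the project .Rprofile is doing its job.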
To find out more, study the detailed help available in ?"Startup".
Following these simple practices will help you keep things straight while you are coding and ease the transition of putting your code into production.