Ever wondered where packages in general and their code in particular go when you run something like
install.packages()? This post is for you!
Where do installed packages live?
Packages are installed in the library given by the lib argument of install.packages() or, most often since you won’t give any, at the first path returned by .libPaths() that exists and for which the user has the right permissions. There are several ways to change the paths returned, should you want to do so.1
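To see which library that would be on your machine, here is a minimal sketch that follows the description above. It only approximates what install.packages() does internally, and the names usable and default_lib are made up for illustration.

```r
# Library trees known to this R session, in search order.
lib_paths <- .libPaths()

# Keep the paths that exist and that the current user can write to.
# file.access() returns 0 on success; mode = 2 tests write permission.
usable <- lib_paths[dir.exists(lib_paths) & file.access(lib_paths, mode = 2) == 0]

# First usable path, or NA if none qualifies (illustrative name).
default_lib <- if (length(usable) > 0) usable[[1]] else NA_character_
default_lib
```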
Now when loading a package with library(), the important argument is called lib.loc.
Random #RStats “This drives me mad”:
install.packages( lib = )
library( lib.loc = )
— Colin Fay 🤘 (@_ColinFay) June 23, 2020
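The two argument names can be put side by side in a short sketch. It uses the tools package because it ships with R in the base library (.Library), so nothing needs to be installed; the install.packages() call is left commented out on purpose.

```r
# .Library is the library that ships with R and holds the base packages.
base_lib <- .Library

# Loading: the library to search is passed as lib.loc.
library("tools", lib.loc = base_lib)

# Installing: the target library is passed as lib (not run here).
# install.packages("rhub", lib = .libPaths()[1])

# find.package() also takes lib.loc to restrict where it looks.
tools_path <- find.package("tools", lib.loc = base_lib)
tools_path
```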
To check whether a package is installed, it is better to use find.package() or system.file() than installed.packages() because the latter, as its docs state, can be slow on some systems. In both cases, it does not mean the package is usable; for that you’d need to use requireNamespace().
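A minimal sketch of both checks, using only base R. The helper names is_installed() and is_usable() are made up for this post and are not functions of base R.

```r
# TRUE if a directory for the package is found in the library paths.
# With quiet = TRUE, find.package() returns character(0) instead of erroring.
is_installed <- function(pkg) length(find.package(pkg, quiet = TRUE)) > 0

# TRUE only if the package's namespace can actually be loaded.
is_usable <- function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")
is_installed("notARealPackage123")
is_usable("stats")
```

The first helper only looks at the file system, while the second one loads the namespace, so it is slower but tells you more.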
What files are stored locally?
The R packages book by Hadley Wickham and Jenny Bryan has a very neat chapter called “Package structure and state”, including an explanation of the binary state. It says “There are no .R files in the R/ directory – instead there are three files that store the parsed functions in an efficient file format. This is basically the result of loading all the R code and then saving the functions with save(). (In the process, this adds a little extra metadata to make things as fast as possible).”
The installed packages in the library do not contain the original R files; compare the ggplot2 source code with ggplot2 on my disk:
fs::dir_tree(file.path(find.package("ggplot2"), "R"))
├── ggplot2
├── ggplot2.rdb
└── ggplot2.rdx
Under the R folder, there are three files that don’t even have the dot R extension!
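If you don't have fs or ggplot2 at hand, the same look can be had with base R only. This sketch inspects the stats package, assuming (as is the case for a standard R installation) that it is laid out like any other installed package.

```r
# R/ directory of an installed package that ships with R.
r_dir <- file.path(find.package("stats"), "R")
files <- list.files(r_dir)
files

# Is there any file ending in .R or .r?
any(grepl("\\.[Rr]$", files))
```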
How is code stored?
Now, regarding the code, let’s mention two important things happening to it.
Since R 3.5, the code is byte-compiled by default which means it is also stored in a format easier for a machine to deal with. You can learn more about byte compilation in the Efficient R Programming book by Colin Gillespie and Robin Lovelace, and in a talk by R Core Member Tomas Kalibera.
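One rough way to see byte compilation at work is to look for the bytecode marker R prints below a compiled function. The helper below is a heuristic written for this post (it relies on the printed output, not on an official API); compiler::cmpfun() is the function of the compiler package that byte-compiles a single function.

```r
# Heuristic: a byte-compiled closure prints a "<bytecode: ...>" line.
is_byte_compiled <- function(f) {
  any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

plain <- function(x) x + 1
compiled <- compiler::cmpfun(plain)

is_byte_compiled(plain)      # freshly defined, never called
is_byte_compiled(compiled)   # explicitly compiled copy
is_byte_compiled(stats::sd)  # function from an installed package
```

Note that R's just-in-time compiler may also compile a plain function after it has been called a few times, which is why plain is not called here.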
Original formatting and comments?
Also note that, by default, the source code is stripped of all empty lines and comments because they are useless for code execution and take up space.2
It is similar to CSS, JS, HTML being minified in web development to make websites load faster. Now sometimes you might want to keep code with its comments: as a user to be able to read it locally with all its comments, as a developer for debugging or profiling (to have line numbers in parsed code refer to actual line numbers you can look up in your scripts).
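The difference can be reproduced in miniature without installing anything, by parsing the same code with and without source references. This is only an illustration of the mechanism using parse(); the toy function is made up.

```r
code <- "function(x) {\n  # add one to the input\n\n  x + 1\n}"

# Same code, parsed with and without source references.
with_source <- eval(parse(text = code, keep.source = TRUE))
without_source <- eval(parse(text = code, keep.source = FALSE))

# Original text, comment and empty line included.
as.character(attr(with_source, "srcref"))

# No source reference: R can only show a deparsed version of the function.
is.null(attr(without_source, "srcref"))
deparse(without_source)
```

Both functions compute the same thing; only what you can read back differs.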
As a user installing packages, you need to look into the keep.source.pkgs option in options() that influences the behavior of package installation, or for a specific package you’d write install.packages("rhub", INSTALL_opts = "--with-keep.source", type = "source").3 If you use Windows or Mac and don’t write type = "source", binaries might be used, in which case the keep.source.pkgs option is ignored.
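As a sketch, the per-package variant could be wrapped in a small helper. install_keeping_source() is a made-up name, and the call itself is commented out since it needs a CRAN mirror and, for packages with compiled code, build tools.

```r
# Current value of the option; FALSE by default.
keep_pkgs <- getOption("keep.source.pkgs")
keep_pkgs

# Hypothetical helper: install from source and keep the source references.
install_keeping_source <- function(pkg, ...) {
  utils::install.packages(
    pkg,
    INSTALL_opts = "--with-keep.source",
    type = "source",
    ...
  )
}

# Not run:
# install_keeping_source("rhub")
```

For the option-based variant, a line such as options(keep.source.pkgs = TRUE) would typically go in your .Rprofile, assuming that file is read by the R process doing the installation.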
As a developer working interactively on a package (with e.g. devtools::load_all()), you need to make sure the source is kept as is when loading the package (lucky you, the relevant keep.source option is TRUE by default in interactive sessions 🎉).
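Outside of a package, the effect of keeping the source at loading time can be sketched with source(), which has a keep.source argument defaulting to that same option. The temporary script and the function g() are made up for the example.

```r
# Write a tiny script to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "g <- function(x) {",
  "  # double the input",
  "  x * 2",
  "}"
), script)

# Load it into its own environment, keeping source references.
env <- new.env()
source(script, local = env, keep.source = TRUE)

# The function now knows which file and line it comes from.
utils::getSrcFilename(env$g)
utils::getSrcLocation(env$g, "line")
```

This file and line information is what debugging and profiling tools rely on to point at your actual scripts.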
As a developer you might also see R CMD check tell you about another switch, this time an environment variable. It is a switch related to package installation, since R CMD check installs your package in order to check it. See the lines below from the R source mirror:
wrapLog("Information on the location(s)",
        "of code generating the",
        paste0(sQuote("Note"), "s"),
        "can be obtained by re-running with",
        "environment variable R_KEEP_PKG_SOURCE",
        "set to 'yes'.\n")
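Re-running the check with that variable set could look like the sketch below. check_keeping_source() is a hypothetical helper written for this post, and the tarball name in the commented call is made up; the run argument only exists so the command runner can be swapped out.

```r
# Hypothetical helper: run R CMD check with R_KEEP_PKG_SOURCE set to "yes",
# then restore the previous state of the environment variable.
check_keeping_source <- function(tarball, run = system2) {
  old <- Sys.getenv("R_KEEP_PKG_SOURCE", unset = NA)
  on.exit(
    if (is.na(old)) {
      Sys.unsetenv("R_KEEP_PKG_SOURCE")
    } else {
      Sys.setenv(R_KEEP_PKG_SOURCE = old)
    },
    add = TRUE
  )
  Sys.setenv(R_KEEP_PKG_SOURCE = "yes")
  # Child processes inherit the environment of the current session.
  run(file.path(R.home("bin"), "R"), c("CMD", "check", shQuote(tarball)))
}

# Not run:
# check_keeping_source("mypkg_0.1.0.tar.gz")
```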
Note that there is also a way for package maintainers to force the installation of their package to keep the source, with the KeepSource field in DESCRIPTION. Here are packages that do that. A potential use case might be to try and hire people, as the web development team at The Guardian seems to do in the source of its website.
As a summary: for keeping the source when loading code, in particular for a package with devtools::load_all(), there is the keep.source option. For keeping the source of a package at installation you need to use the keep.source.pkgs option, the --with-keep.source flag of R CMD INSTALL, or the R_KEEP_PKG_SOURCE environment variable, or to install a package that forces keeping the source.
In this post we summarized where packages live once installed, in what format, and how their code is processed at installation. An important aspect was the original code formatting and commenting being removed by default, unless one changes some options for installing packages. Do you use any of the options related to keeping source in your R usage and development? How do you read source code?
When viewing source code you might get a better default experience by loading lookup in your .Rprofile, provided you don’t use Windows: on Windows, doing so would prevent you from updating the loaded packages that have compiled code.