
State of R packages in your library

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers.]

Ever wondered where packages in general and their code in particular go when you run something like install.packages()? This post is for you!

Where do installed packages live?

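Packages are installed into a library, i.e. a directory on disk: the one passed as the lib argument of install.packages(), which defaults to the first element of .libPaths().1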

Now, when loading a package with library(), the corresponding argument is called lib.loc, not lib.

Random #RStats "This drives me mad":

install.packages( lib = )

library( lib.loc = )

— Colin Fay (@_ColinFay) June 23, 2020

Furthermore, tweaking this argument is best avoided: e.g. if you only use a package via ::, but do not import it via the namespace, then when :: runs R will not search in your custom library (say, a directory called mylib) unless that directory is also listed in .libPaths().

Now how do you know where any of your installed packages was installed? You can use find.package() for any installed package, and path.package() for packages that are already loaded!

To check whether a package is installed, it is better to use find.package() than installed.packages() because the latter, as its docs state, can be slow on some systems. In both cases, a positive answer does not mean the package is usable; to check that, you’d need to use library() or require().
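As a minimal sketch of the check described above, using base R only: the helper name is_installed() and the made-up package name are assumptions for illustration, not something base R provides.

```r
# Hypothetical helper (not part of base R): TRUE if a package directory
# is found in the current library paths, without loading the package.
is_installed <- function(pkg) {
  length(find.package(pkg, quiet = TRUE)) > 0
}

is_installed("stats")                    # a package shipped with R
is_installed("surelyNotInstalled12345")  # a made-up package name

# find.package() reports where a package lives without loading it;
# path.package() only knows about packages that are already loaded.
find.package("tools")
path.package("base")

# Being installed is not the same as being usable: requireNamespace()
# actually tries to load the package and returns TRUE or FALSE.
requireNamespace("tools", quietly = TRUE)
```

With quiet = TRUE, find.package() returns an empty character vector instead of throwing an error when nothing is found, which is what makes the length check work.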

What files are stored locally?

The R Packages book by Hadley Wickham and Jenny Bryan has a very neat chapter called “Package structure and state”, including an explanation of the binary state. It says “There are no .R files in the R/ directory – instead there are three files that store the parsed functions in an efficient file format. This is basically the result of loading all the R code and then saving the functions with save(). (In the process, this adds a little extra metadata to make things as fast as possible).”

The installed packages in the library do not contain the original R files: compare the ggplot2 source code with ggplot2 on my disk.

fs::dir_tree(
  file.path(
    find.package("ggplot2"),
    "R"
    )
  )

├── ggplot2
├── ggplot2.rdb
└── ggplot2.rdx

Under the R folder, there are three files, none of which even has the .R extension!
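The same observation can be reproduced without ggplot2 or fs. The sketch below assumes a standard R installation and looks at the stats package, which ships with R; the variable names are illustrative.

```r
# Files in the R/ directory of an installed package ("stats" ships with R)
r_dir <- file.path(find.package("stats"), "R")
installed_files <- list.files(r_dir)
installed_files

# No plain .R scripts are expected in an installed package
any(grepl("\\.R$", installed_files))

# The code is still reachable: the lazy-load database (.rdb/.rdx) is read
# on demand when a function is accessed
body(stats::sd)
```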

How is code stored?

Now, regarding the code, let’s mention two important things happening to it.

Byte compilation

Since R 3.5, the code is byte-compiled by default which means it is also stored in a format easier for a machine to deal with. You can learn more about byte compilation in the Efficient R Programming book by Colin Gillespie and Robin Lovelace, and in a talk by R Core Member Tomas Kalibera.
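One informal way to see byte compilation at work is to look for the bytecode line that R prints for compiled functions. The helper has_bytecode() below is a hypothetical convenience written for this sketch, and it relies on printed output rather than on an official API.

```r
# Hypothetical helper: byte-compiled closures print a "<bytecode: ...>" line.
has_bytecode <- function(fun) {
  any(grepl("^<bytecode", capture.output(print(fun))))
}

# A function compiled by hand with the compiler package (shipped with R)
add_one <- function(x) x + 1
add_one_compiled <- compiler::cmpfun(add_one)
has_bytecode(add_one_compiled)

# A function from an installed package
has_bytecode(stats::sd)
```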

Original formatting and comments?

Also note that, by default, the source code is stripped of all empty lines and comments because they are useless for code execution and take up space.2

It is similar to CSS, JS and HTML being minified in web development to make websites load faster. Now sometimes you might want to keep code with its comments: as a user, to be able to read it locally with all its comments; as a developer, for debugging or profiling (to have line numbers in parsed code refer to actual line numbers you can look up in your scripts).

As a user installing packages, you need to look into the keep.source.pkgs option in options(), which influences the behavior of package installation, or for a specific package you’d write install.packages("rhub", INSTALL_opts = "--with-keep.source", type = "source").3 If you use Windows or macOS and don’t write type = "source", binaries might be used, in which case the keep.source.pkgs option is ignored.

As a developer working interactively on a package (with e.g. devtools::load_all()), you need to make sure the source is kept as is when loading the package (lucky you, the relevant keep.source option is TRUE by default in interactive sessions).
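To make "keeping the source" concrete, here is a small sketch that parses the same code twice, with and without source references. It uses the keep.source argument of parse() rather than the global option, and the toy function f is an assumption for illustration.

```r
code <- "f <- function(x) {\n  # add one\n  x + 1\n}"

# Parse and evaluate with source references kept
env_keep <- new.env()
eval(parse(text = code, keep.source = TRUE), envir = env_keep)

# Parse and evaluate without source references
env_drop <- new.env()
eval(parse(text = code, keep.source = FALSE), envir = env_drop)

# Printing uses the original text when a source reference is attached,
# and falls back to deparsed code (no comments) otherwise
kept <- any(grepl("# add one", capture.output(print(env_keep$f))))
dropped <- any(grepl("# add one", capture.output(print(env_drop$f))))
kept
dropped
```

The same distinction applies to installed packages: without source references, what you see when printing a function is code regenerated from the parsed version, not the text the author wrote.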

As a developer you might also encounter the case where R CMD check tells you about another switch, an environment variable. It is a switch related to package installation, since R CMD check installs your package in order to check it. See the lines below from the R source mirror:

                        wrapLog("Information on the location(s)",
                                "of code generating the",
                                paste0(sQuote("Note"), "s"),
                                "can be obtained by re-running with",
                                "environment variable R_KEEP_PKG_SOURCE",
                                "set to 'yes'.\n")

Note that there is also a way for package maintainers to force the installation of their package to keep the source (the KeepSource field in DESCRIPTION). Here are packages that do that. A potential use case might be to try to hire people, as the web development team at The Guardian seems to do for anyone who views the source of its website.

As a summary: for keeping the source when loading code, in particular for a package with devtools::load_all(), there is the keep.source option. For keeping the source of a package at installation, you need to use the keep.source.pkgs option, R CMD INSTALL --with-keep.source, or the R_KEEP_PKG_SOURCE environment variable, or to be installing a package that forces keeping the source.
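A rough sketch of the installation side, under the assumption that you install from source in the same session: the helper has_source() is hypothetical, and the install call is the one quoted earlier in this post, left commented out because it needs network access.

```r
# Hypothetical helper: TRUE if a function still carries its source reference
has_source <- function(fun) !is.null(attr(fun, "srcref"))

# Typically FALSE for packages installed with default settings
has_source(stats::sd)

# Child processes such as R CMD INSTALL inherit environment variables,
# so set this before installing from source to ask for the source to be kept
Sys.setenv(R_KEEP_PKG_SOURCE = "yes")
Sys.getenv("R_KEEP_PKG_SOURCE")

# Not run here (needs network access):
# install.packages("rhub", INSTALL_opts = "--with-keep.source", type = "source")
```

After such an installation, has_source() applied to one of the package's functions is a quick way to check whether the setting took effect.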

Conclusion

In this post we summarized where packages live once installed, in what format, and how their code is processed at installation. An important aspect was the original code formatting and comments being removed by default, unless one changes some options for installing packages. Do you use any of the options related to keeping source in your R usage and development? How do you read source code?

  1. If your wish is to isolate packages you are installing for a given project, you might find a better workflow by using Docker or the renv package.

  2. This came to my attention thanks to a question by Ofek Shilon on RStudio community.

  3. When viewing source code you might get a better default experience by loading lookup in your .Rprofile if you don’t use Windows, because if you use Windows and do this you won’t be able to update the loaded packages with compiled code.
