{depcheck}: R Package Dependency Checker

[This article was first published on Ashley's Blog, and kindly contributed to R-bloggers].

A month or so ago, over 700 R package maintainers were e-mailed about the potential removal of the {lubridate} package from CRAN due to a test failing on macOS systems.

This is a chunk of the 1000+ lines email that about 900+ #rstats package maintainers received due to the impending archival of lubridate. #lubridatecalypse pic.twitter.com/Ccr4eabaLr

Elio Campitelli (@d_olivaw), October 5, 2021

The error has since been fixed, and there has been no ‘lubrigate’ or ‘lubridatecalypse’. However, it does raise the question: do all 900+ packages that import {lubridate} actually use it enough to justify the dependency? The short answer is no.

I was one of the maintainers included in the recipient list, as my package {appler} used {lubridate} to manipulate some timestamps from the Apple App Store API. Around the same time, my package also failed a test, so whilst updating the tests, I decided to check where {lubridate} was being used. It turns out it was used just once, to convert the numeric timestamp of a review to POSIXct using lubridate::as_datetime. Looking at the source code behind as_datetime, in this case it is a wrapper for the base function as.POSIXct, and nothing more. One commit and release later, {lubridate} is no longer a dependency of my package.
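As a minimal sketch of that kind of swap, the base R call below converts a numeric timestamp without {lubridate}. The timestamp value is made up for illustration, and the 1970-01-01 origin and UTC time zone are assumptions chosen to mirror the defaults of as_datetime; this is not the exact code from {appler}.

```r
# Hypothetical review timestamp: seconds since 1970-01-01 00:00:00 UTC
review_timestamp <- 1633392000

# Base R equivalent of lubridate::as_datetime(review_timestamp),
# assuming the default origin and UTC time zone
review_time <- as.POSIXct(review_timestamp, origin = "1970-01-01", tz = "UTC")

format(review_time, "%Y-%m-%d %H:%M:%S")
# "2021-10-05 00:00:00"
```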

Although this is just one package and one dependency, it is certainly not the only such case. So the next question is: how can we make it easier to find dependencies that can be removed? Introducing {depcheck}, a package that checks the dependencies of a package and flags any that could be looked into, either to copy over the functions that are used, or to remove the dependency from the package entirely. Dependency reduction has several advantages.

Using {depcheck}

Currently there are 3 ways to check package dependency usage:

  1. checkPackageDependencyUse() will read the package DESCRIPTION file, extract the packages from the Depends and Imports fields, and search for their use in the R directory.
  2. checkShinyDependencyUse() will search for any library, require or :: call within the core shiny R scripts (and any specified directories), and search for their use in the same files.
  3. checkProjectDependencyUse() is a generic version of checkShinyDependencyUse(), which can be applied to any project.
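To illustrate the first step of the package check, here is a rough base R sketch of pulling the declared packages out of a DESCRIPTION file. The helper name read_declared_dependencies() and the example DESCRIPTION contents are hypothetical, and this is not necessarily how {depcheck} does it internally.

```r
# Write a small, made-up DESCRIPTION file to a temporary location
desc_path <- tempfile()
writeLines(c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.5.0)",
  "Imports:",
  "    httr,",
  "    jsonlite (>= 1.7.0),",
  "    lubridate"
), desc_path)

# Hypothetical helper: list packages named in the Depends and Imports fields
read_declared_dependencies <- function(path) {
  desc <- read.dcf(path, fields = c("Depends", "Imports"))
  # Drop fields that are absent, then split the comma-separated entries
  entries <- unlist(strsplit(desc[!is.na(desc)], ","))
  # Strip version constraints such as "(>= 1.7.0)" and surrounding whitespace
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  # R itself is a version requirement, not a package dependency
  setdiff(pkgs[nzchar(pkgs)], "R")
}

read_declared_dependencies(desc_path)
```

Each returned name could then be searched for in the scripts under the R directory.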

The result of all of these is a list, where the names are the dependent packages. Each item in the list contains a data.frame of all the exported functions in the package and their frequency of use. When printed, it will display the number of dependencies in the project, as well as the number of sub-dependencies, and whether any of them should be looked into for potential removal. This should also make it a lot easier to track down dependencies when you aren’t entirely sure where and how they are used.
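The idea of a per-package usage table can be sketched in base R as below. This is only an illustration under stated assumptions: count_function_use() is a hypothetical helper, the code lines and function names are made up, and the regular expression is a crude stand-in for whatever parsing {depcheck} actually performs. For an installed package, the function names could be taken from getNamespaceExports() instead of being hard-coded.

```r
# Hypothetical helper: count how often each function name is called in `code`
count_function_use <- function(code, functions) {
  # Extract every identifier that is immediately followed by "("
  call_pattern <- "[A-Za-z.][A-Za-z0-9._]*(?=\\s*\\()"
  called <- unlist(regmatches(code, gregexpr(call_pattern, code, perl = TRUE)))
  # Tabulate against the full set of functions so unused ones show up as 0
  counts <- table(factor(called, levels = functions))
  data.frame(fn = functions, n = as.integer(counts), stringsAsFactors = FALSE)
}

# Made-up lines of application code
code <- c(
  "parts <- stringi::stri_split_regex(text, ' ')",
  "callback <- htmlwidgets::JS('function() {}')",
  "other <- my_JS('not the same function')"
)

usage <- count_function_use(code, c("stri_split_regex", "stri_detect_regex", "stri_trim"))
usage

# Number of listed functions used at least once
sum(usage$n > 0)
```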

NB: At the time of writing, {depcheck} is in an experimental phase; function names and/or arguments may change from those stated above.

Example

I have run checkShinyDependencyUse() on one of my own Shiny applications, the Reddit Profile Analyzer (https://ashbaldry.shinyapps.io/reddit_analysis/), to see how well I am utilising the packages I have used.

project_dependencies <- checkShinyDependencyUse("../reddit-analysis-app") # ashbaldry/reddit-analysis-app
summary(project_dependencies)
# Number of Declared Packages: 14
# Total Number of Dependencies: 85
# Declared Packages: utils, glue, httr, highcharter, scales, shiny.semantic, htmlwidgets, stringi, 
# quanteda, R6, data.table, shiny, promises, magrittr
# Function usage for 'glue', 'htmlwidgets', 'stringi', 'magrittr' are below the specified thresholds. 
# Print individual package summaries to check if packages can be removed

Clearly, there are potential improvements that can be made: 4 packages have been flagged for low use. Looking further into a couple of these packages:

project_dependencies$stringi
# Package: 'stringi'
# Package Dependencies: 0
# Package Usage: 1 / 256 (0%)
# Functions Used: stri_split_regex
# Function usage for 'stringi' is below the specified thresholds. Consider copying used function to reduce dependencies

{stringi} has 0 dependencies that aren’t base R packages, and the function stri_split_regex uses C++ code, so it doesn’t seem like a natural candidate to copy over to the application. Whilst strsplit could normally be a potential alternative, the list it returns for this particular regular expression doesn’t match, so it is not a viable option. When running the checks in the future, we can include summary(project_dependencies, ignore_low_usage_packages = "stringi") to avoid seeing the warning message for {stringi}.
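As an illustration of how the two can differ, one documented quirk of strsplit is that it drops the empty string after a match at the end of the input, whereas stri_split_regex keeps it by default. The input below is made up, and this may or may not be the mismatch that occurred in the application.

```r
# Made-up input with a trailing delimiter
x <- "a,b,"

# Base R drops the empty string after the final match
strsplit(x, ",")[[1]]
# "a" "b"

# A leading delimiter, by contrast, does produce an empty first element
strsplit(",a,b", ",")[[1]]
# "" "a" "b"
```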

project_dependencies$htmlwidgets
# Package: 'htmlwidgets'
# Package Dependencies: 7
# Package Usage: 1 / 14 (7%)
# Functions Used: JS
# Function usage for 'htmlwidgets' is below the specified thresholds. Consider copying used function to reduce dependencies

{htmlwidgets} would be a great candidate for removal. It has a fair number of dependencies of its own, and all the JS function does is collapse a character vector and assign an extra class. This can easily be added to a utility function script, leaving the project with one fewer dependency.
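A utility function along those lines might look like the sketch below. The name js_literal() is hypothetical, and the class name "JS_EVAL" is an assumption about what {htmlwidgets} looks for when serialising widget options, so it is worth checking against the package source before relying on it.

```r
# Hypothetical stand-in for htmlwidgets::JS(); "JS_EVAL" is assumed to be the
# marker class that htmlwidgets checks for
js_literal <- function(...) {
  code <- c(...)
  if (is.null(code)) {
    return(NULL)
  }
  if (!is.character(code)) {
    stop("js_literal() expects character input")
  }
  # Collapse multiple lines of JavaScript into a single string
  code <- paste(code, collapse = "\n")
  # Add the marker class while keeping any classes already present
  structure(code, class = unique(c("JS_EVAL", oldClass(code))))
}

callback <- js_literal("function(x) {", "  return x + 1;", "}")
class(callback)
# "JS_EVAL"
```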

The other two packages, {glue} and {magrittr}, are both lightweight packages and are used throughout the codebase. There are arguments for using paste instead and avoiding piping. However, they are well-maintained packages, and they help make the code more readable without adding too many extra dependencies to the project.
