The dependencies conundrum

June 5, 2018
By Colin Fay

(This article was first published on Colin Fay, and kindly contributed to R-bloggers)

A little dive into CRAN package dependencies.

What is a dependency?

Roughly, a dependency is a package needed by another package to work:
in other words, if someone has already coded a function that you need,
you can use it by listing their package as a dependency. But that also
implies that whenever someone installs your package, the dependencies
are installed too.
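In practice, a dependency is declared in the DESCRIPTION file of your package. A minimal sketch (the package names here are just examples):

```
Package: mypackage
Imports:
    rlang,
    glue
Suggests:
    testthat
```

Packages listed under Imports are installed along with yours; packages under Suggests are optional and are not required at install time.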

Where dependencies can be problematic

Heavy dependencies

Some R packages are heavy, and can be complex to install on specific
machines. For example, {readr} does not install easily on a small-RAM
Ubuntu machine: see this issue for more information. That prevents you
from installing not only this package, but also any other package
depending on {readr}: a behavior I noticed lately with the {rtweet}
package when I tried to install it on a small Digital Ocean Ubuntu
droplet: see this issue.

The clipr effect

Whenever one of your dependencies is removed from CRAN, your package
will be removed too. That happened to a series of packages recently,
notably the {reprex} πŸ“¦, but that’s food for another blogpost: see this
issue if you want to read more in the meantime.

In a few words: if A depends on B, B on C, and C on D, and the D
package is ever removed from CRAN, you’ll need to find a solution to
keep your package up to date.
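To see why, here is a minimal base-R sketch (a toy dependency graph, not real CRAN data; `deps` and `dependents` are hypothetical names) that finds every package transitively depending on a given one, i.e. everything affected if that package leaves CRAN:

```r
# Toy dependency graph: each package maps to the packages it depends on.
deps <- list(A = "B", B = "C", C = "D", D = character(0))

# All packages that directly or transitively depend on `pkg`.
dependents <- function(pkg, graph) {
  direct <- names(Filter(function(d) pkg %in% d, graph))
  unique(c(direct, unlist(lapply(direct, dependents, graph = graph))))
}

dependents("D", deps)
# If D leaves CRAN, C, B, and A are all affected.
```

The same walk, run against the real CRAN dependency graph, is what makes a single removal ripple through many packages.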

Dependencies when developing a package

How many dependencies?

It’s a tough question, and everyone has their own take on it. Mine is:
try to avoid a dependency if:

  • The dependency has a lot of other dependencies / is heavy to install
  • You just need one easy-to-rewrite function from that package

To answer the first question, you can use {miniCRAN}:

library(miniCRAN)
pkgDep("attempt", suggests = FALSE)
## [1] "attempt" "rlang"
pkgDep("proustr", suggests = FALSE)
##  [1] "proustr"    "dplyr"      "magrittr"   "stringr"    "rlang"     
##  [6] "purrr"      "tidyr"      "tokenizers" "SnowballC"  "assertthat"
## [11] "bindrcpp"   "glue"       "pkgconfig"  "R6"         "Rcpp"      
## [16] "tibble"     "tidyselect" "BH"         "plogr"      "stringi"   
## [21] "bindr"      "cli"        "crayon"     "pillar"     "utf8"

Which means that at least all these packages will be installed whenever
I want to install one of these.

Diving into the CRAN

Let’s answer the second question by having a look at how many packages
from CRAN have just one function imported from another package.

There would be several ways to do that, but I chose to parse the
NAMESPACE files, as this is where Depends and Imports are listed with
the same tag. Maybe not the easiest solution to deploy, but I’ll be
happy to hear about other solutions!

library(rvest)
library(purrr)
library(glue)

# Get the NAMESPACE file

extract_namespace <- function(package, exdir){
  cat(crayon::green(glue("Extracting {package} NAMESPACE to {exdir} at {Sys.time()}.")), sep = "\n")
  a <- read_html(glue("https://cran.r-project.org/web/packages/{package}/index.html")) %>%
    html_nodes("a") %>%
    html_attr("href") %>%
    keep(~ grepl("tar.gz", .x)) %>%
    gsub("\\.\\./\\.\\./\\.\\.", "https://cran.r-project.org", .)
  
  tmp <- tempfile(fileext = ".tar.gz")
  download.file(a, tmp)
  nspc <- grep("NAMESPACE", untar(tmp, list = TRUE), value = TRUE)
  untar(tmp, files = nspc, exdir = exdir)
  unlink(tmp)
}

# Parsing of the import fields
parse_import <- function(pkg, vec){
  cleaned <- gsub('\\(|,|\\)|"', " ", vec) 
  splitted_bis <- strsplit(cleaned, " ") %>% as_vector() %>% discard(~ nchar(.x) == 0)
  type <- splitted_bis[1]
  dep <- splitted_bis[2]
  fun <- splitted_bis[3:length(splitted_bis)]
  if ( grepl("From", cleaned) ) {
    data.frame(origin = pkg, 
               type = type, 
               dep = dep, 
               fun = fun, 
               stringsAsFactors = FALSE)
  } else {
    data.frame(origin = pkg,
               type = type, 
               dep = dep, 
               fun = NA, 
               stringsAsFactors = FALSE)
  }
}
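As a quick sanity check, here is a self-contained, base-R-only version of the same parsing step (`parse_import_line` is a hypothetical helper, not a function from the post), applied to a typical NAMESPACE directive:

```r
# Split an importFrom() directive into directive / package / function names.
parse_import_line <- function(line) {
  cleaned <- gsub('\\(|,|\\)|"', " ", line)
  parts <- strsplit(cleaned, " ")[[1]]
  parts <- parts[nzchar(parts)]
  list(type = parts[1], dep = parts[2], fun = parts[-(1:2)])
}

parse_import_line('importFrom(magrittr,"%>%")')
# type: "importFrom", dep: "magrittr", fun: "%>%"
```

The full `parse_import()` above does the same splitting but returns a data frame row per imported function, which is what makes the later counting possible.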

# Parsing the full file
parse_namespace <- function(pkg, file){
  readLines(normalizePath(file)) %>%
    keep(~ grepl("import", .x)) %>%
    map2_df(pkg, ~ parse_import(.y, .x))
}

# Wrap all this into one
extract_pkg_and_df_namespace <- function(list, exdir = "nsp"){
  walk(list, extract_namespace, exdir = exdir)
  df <- data.frame(pkg = list.files(exdir), 
                   loc = list.files(exdir, recursive = TRUE, full.names = TRUE), 
                   stringsAsFactors = FALSE)
  map2_df(df$pkg, df$loc, parse_namespace)
}

# Get the package db
pkg_db <- tools::CRAN_package_db()

# FIRE πŸ›« 
res <- extract_pkg_and_df_namespace(pkg_db$Package)

So, don’t try this at home; just know that it took around 4 hours for
me to retrieve all this data.

Let’s explore

Here, we load the {tidyverse} and explore the results:
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## βœ” ggplot2 2.2.1.9000     βœ” purrr   0.2.4     
## βœ” tibble  1.4.2          βœ” dplyr   0.7.5     
## βœ” tidyr   0.8.1          βœ” stringr 1.3.1     
## βœ” readr   1.1.1          βœ” forcats 0.3.0

## ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## βœ– dplyr::filter() masks stats::filter()
## βœ– dplyr::lag()    masks stats::lag()
fun_topo <- res %>%
  filter(type == "importFrom") %>%
  group_by(origin, dep) %>%
  summarise(exported_fun = n()) %>%
  ungroup()

fun_topo %>%
  arrange(desc(exported_fun))
## # A tibble: 6,759 x 3
##    origin           dep        exported_fun
##    <chr>            <chr>             <int>
##  1 aroma.affymetrix aroma.core          135
##  2 aroma.cn         aroma.core          116
##  3 aroma.core       R.filesets          106
##  4 bibliometrix     Matrix              105
##  5 DescTools        stats                94
##  6 aroma.affymetrix R.utils              90
##  7 aroma.core       R.utils              77
##  8 DiagrammeR       igraph               64
##  9 caret            stats                51
## 10 aroma.affymetrix R.filesets           50
## # ... with 6,749 more rows
fun_topo %>%
  filter(exported_fun >= 50) %>%
  arrange(desc(exported_fun)) 
## # A tibble: 11 x 3
##    origin           dep        exported_fun
##    <chr>            <chr>             <int>
##  1 aroma.affymetrix aroma.core          135
##  2 aroma.cn         aroma.core          116
##  3 aroma.core       R.filesets          106
##  4 bibliometrix     Matrix              105
##  5 DescTools        stats                94
##  6 aroma.affymetrix R.utils              90
##  7 aroma.core       R.utils              77
##  8 DiagrammeR       igraph               64
##  9 caret            stats                51
## 10 aroma.affymetrix R.filesets           50
## 11 circular         stats                50
fun_topo %>%
  filter(exported_fun < 50) %>%
  ungroup() %>%
  count(exported_fun) %>%
  ggplot() + 
  aes(exported_fun, n) + 
  geom_col()

just_one <- fun_topo %>%
  filter(exported_fun == 1) %>%
  left_join(res)
## Joining, by = c("origin", "dep")
just_one %>% 
  count(dep, fun, sort = TRUE)
## # A tibble: 1,206 x 3
##    dep       fun           n
##    <chr>     <chr>     <int>
##  1 Rcpp      evalCpp     129
##  2 magrittr  %>%          75
##  3 methods   is           66
##  4 Rcpp      sourceCpp    54
##  5 graphics  plot         50
##  6 utils     combn        38
##  7 methods   new          34
##  8 utils     tail         30
##  9 MASS      mvrnorm      29
## 10 grDevices rgb          28
## # ... with 1,196 more rows

