Uncovering the relationships among functions in a package

[This article was first published on R on carl b frederick, and kindly contributed to R-bloggers.]

I maintain a few internal R packages at work. Recently, and perhaps foolishly, I decided to completely refactor one of them to add some more bells and whistles. A substantial part of this task involved simplifying the package's internal workflow. I needed to be sure not to omit any of the previous functionality, but a simple list of all the exported and unexported functions was not sufficient.[1]

So, I decided to map the dependencies among all of the internal functions (i.e., which functions call which). I found it useful, so I thought I’d share the code I wrote to do just that: visualize the relationships among functions in a package. The functions are available at this gist.

For the sake of this post, I will use the merTools package that I have had the pleasure of working on with a great co-author. The end results are:

library(dplyr)       # provides %>% and as_data_frame()
library(DiagrammeR)  # node selection and rendering helpers
suppressMessages(
  devtools::source_gist("b30d861ea80a27fad4e44623c41e0170", filename = "packageFunctionMap.R")
)
fake <- plotFcnDependencies("merTools")
fake %>%
  select_nodes_by_degree(expressions = "deg > 0") %>%
  transform_to_subgraph_ws() %>%
  render_graph()

This diagram shows which of the merTools functions call other merTools functions. More precisely, it shows every function that explicitly calls, or is explicitly called by, another merTools function. It does not capture dependencies that arise only through the output of other functions, and it omits functions that have no such links at all. The latter have degree = 0 in social network terms.
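To make the degree filter concrete, here is a minimal base-R sketch. It uses only the three calls mentioned in this post plus one made-up isolated function (loneFcn is hypothetical, not part of merTools), and it counts how many edges touch each function.

```r
# Edges taken from the examples in this post: caller -> callee
deps <- data.frame(
  Function            = c("REimpact", "REimpact", "fastdisp"),
  Dependency_Function = c("expectedRank", "predictInterval", "easyVarCorr"),
  stringsAsFactors = FALSE
)

# "loneFcn" is a hypothetical function with no links, added for illustration
fcns <- c("REimpact", "expectedRank", "predictInterval",
          "fastdisp", "easyVarCorr", "loneFcn")

# Degree = number of edge endpoints that land on each function
endpoints <- c(deps$Function, deps$Dependency_Function)
deg <- vapply(fcns, function(f) sum(endpoints == f), integer(1))

# Functions that would survive a "deg > 0" filter
connected <- names(deg)[deg > 0]
connected
```

Only loneFcn is dropped here, because it neither calls nor is called by anything else in the toy edge list.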

Columns to the left are upstream and columns to the right are downstream. For example, REimpact() calls expectedRank() and predictInterval(). predictInterval() then goes on to call three more functions, and so on.

The practical value of this exercise comes from the following sorts of insights:

  • stripAttributes() is called by both averageObs() and randomObs(), so don’t forget to include that functionality in whatever function(s) replace those two; and
  • fastdisp() is the only function that calls easyVarCorr(), so easyVarCorr() may be a good candidate to be rolled into fastdisp() if there are no other use cases for it.

Stepping through the functions

The first function, ls_fcns(), retrieves the names of the functions in a namespace (using lsf.str) and returns them as a character vector.

ls_fcns <- function(pkg) {
  fcns <- unclass(lsf.str(envir = asNamespace(pkg), all = TRUE))
  return(as.character(fcns))
}

ls_fcns("merTools")
##  [1] "averageObs"         "bglmerModList"      "blmerModList"      
##  [4] "buildModelMatrix"   "collapseFrame"      "draw"              
##  [7] "draw.merMod"        "easyVarCorr"        "expectedRank"      
## [10] "fastdisp"           "FEsim"              "findFormFuns"      
## [13] "formulaBuild"       "glmerModList"       "ICC"               
## [16] "levelfun"           "lmerModList"        "mkNewReTrms"       
## [19] "modelFixedEff"      "modelInfo"          "modelRandEffStats" 
## [22] "plotFEsim"          "plotREsim"          "predictInterval"   
## [25] "print.merModList"   "randomObs"          "REcorrExtract"     
## [28] "REextract"          "REimpact"           "reOnly"            
## [31] "REquantile"         "REsdExtract"        "residDF.merMod"    
## [34] "REsim"              "reTermCount"        "reTermNames"       
## [37] "RHSForm"            "RMSE.merMod"        "safeDeparse"       
## [40] "sanitizeNames"      "setup_parallel"     "shinyMer"          
## [43] "shuffle"            "stripAttributes"    "subBoot"           
## [46] "subsetList"         "superFactor"        "thetaExtract"      
## [49] "trimModelFrame"     "VarCorr.merModList" "wiggle"            
## [52] "zzz"

The second function, fcn_deps(), cycles through each function and does a regular expression search of its source to see which, if any, of the other functions from that namespace are called. This part could be written more efficiently, I’m sure, but I was starting to feel bad about procrastinating too much. Feel free to suggest improvements in the comments or via Twitter (@carlbfrederick).

fcn_deps <- function(pkg) {
  fcns <- ls_fcns(pkg)
  out <- data.frame(Function = as.character(),
                    Dependency_Function = as.character(),
                    Number_Calls = as.integer())
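  for (i in fcns) {
    # Printed source of function i, one element per line
    this_fcn <- capture.output(getAnywhere(i))
    # Note: grep(i, fcns) also drops any function whose name contains i
    for (j in fcns[-grep(i, fcns)]) {
      # Lines of i's source that contain "j("
      dep_fcns <- grep(paste(j, "\\(", sep=""), this_fcn)
      if (length(dep_fcns) > 0) {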
        out <- rbind(out,
                     data.frame(Function = i,
                                Dependency_Function = j,
                                Number_Calls = length(dep_fcns)))
      }
    }
  }
  return(out)
}

fcn_deps("merTools") %>% as_data_frame
## # A tibble: 22 x 3
##    Function         Dependency_Function Number_Calls
##  * <fct>            <fct>                      <int>
##  1 averageObs       findFormFuns                   1
##  2 averageObs       REquantile                     1
##  3 averageObs       stripAttributes                1
##  4 averageObs       subsetList                     2
##  5 averageObs       superFactor                    1
##  6 buildModelMatrix mkNewReTrms                    1
##  7 buildModelMatrix reOnly                         1
##  8 buildModelMatrix RHSForm                        1
##  9 draw.merMod      averageObs                     1
## 10 draw.merMod      randomObs                      1
## # ... with 12 more rows

As you can see, fcn_deps() returns a data.frame with three columns: Function, Dependency_Function and Number_Calls. The first two are used to create the diagram. I don’t recall what I originally intended to do with Number_Calls, but it is still there.
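One caveat with the regular expression search is that it can over-match: the pattern for REsim( also matches inside plotREsim(, and a dot in a name such as draw.merMod matches any character. As one possible alternative (not what the gist does), the sketch below inspects the parsed body of each function with base R's all.names(). The environment and the function names in it are made up for illustration.

```r
# A toy environment standing in for a package namespace (all names are made up)
ns <- new.env()
ns$helper        <- function(x) x + 1
ns$plot_it       <- function(x) helper(x) * 2
ns$plot_it_twice <- function(x) { plot_it(x); plot_it(x) }

fcns <- ls(ns)

# For one function, list the other functions in `env` whose names appear in
# its parsed body. Comments and string constants are ignored, and "plot_it"
# is not confused with "plot_it_twice" because whole symbols are compared.
called_by <- function(name, env, candidates) {
  used <- all.names(body(get(name, envir = env)))
  intersect(setdiff(candidates, name), used)
}

deps <- lapply(setNames(fcns, fcns), called_by, env = ns, candidates = fcns)
deps
```

Note that all.names() returns every symbol in the body, not only the ones in call position, so a local variable that shares a name with a package function would still be reported.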

The third function, plotFcnDependencies(), produces the DiagrammeR (Iannone) object that can be plotted. It is definitely “hacky” and will throw some errors that I haven’t quite ironed out yet, hence the try() in there.

The for loop places the functions horizontally in the diagram above by calculating, for each function, the longest of the shortest paths from it to all the others. For example, the longest of the shortest paths from draw.merMod() to the rest of the functions is 4. I call this quantity depth in the function.
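For readers who want to see the depth idea without igraph, here is a small base-R sketch on a made-up edge list. It uses a breadth-first search and, like the length of an igraph vertex path, counts the nodes on the path, so a function that calls nothing gets a depth of 1. The names a, b, c, d and the helper depth_from() are hypothetical.

```r
# Toy edge list: caller -> callee (hypothetical function names)
edges <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

# Breadth-first search from one start node. Returns the number of nodes on
# the longest of the shortest paths to any node reachable from `start`.
depth_from <- function(start, edges) {
  seen <- 1L                 # the start node alone is a path of one node
  names(seen) <- start
  frontier <- start
  while (length(frontier) > 0) {
    nxt <- unique(edges$to[edges$from %in% frontier])
    nxt <- setdiff(nxt, names(seen))   # keep only nodes not visited yet
    if (length(nxt) == 0) break
    layer <- rep(max(seen) + 1L, length(nxt))
    names(layer) <- nxt
    seen <- c(seen, layer)
    frontier <- nxt
  }
  max(seen)
}

all_nodes <- unique(c(edges$from, edges$to))
depths <- sapply(all_nodes, depth_from, edges = edges)
depths
```

In this toy graph, a sits furthest upstream (depth 3), b and c are in the middle (depth 2), and d is a leaf (depth 1).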

Using this depth quantity, the rest of the function just creates the node and edge data.frames and then creates the diagram using the DiagrammeR package … and voila!

plotFcnDependencies <- function(pkg) {
  fcns <- ls_fcns(pkg)
  depFcn <- fcn_deps(pkg)

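  # Build the igraph object once rather than on every iteration
  g <- igraph::graph_from_data_frame(depFcn[, 1:2])
  depth <- NULL

  for (i in fcns) {
    # Default of 0 covers functions that are not in the graph, where try() catches the error
    dist <- 0
    try(suppressWarnings(
      dist <- max(sapply(igraph::shortest_paths(g, from = i)$vpath, length))
    ))
    depth <- c(depth, dist)
  }

  nodes <- data.frame(nodes = fcns,
                      type = "",
                      label = htmltools::htmlEscape(fcns),
                      depth = depth)

  # Namespace desc() so the function works without dplyr attached
  nodes <- dplyr::arrange(nodes, dplyr::desc(depth))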


  out <- DiagrammeR::create_graph(
    nodes_df = nodes,
    edges_df = DiagrammeR::create_edges(
      from = depFcn$Function,
      to = depFcn$Dependency_Function,
      rel = "leading_to"
    ),
    graph_name = paste(pkg, " (version ", packageVersion(pkg), ") Function Map", sep=""),
    graph_attrs = c("layout = dot", "rankdir = LR"),
    node_attrs = "fontsize = 20"
  )

  return(out)
}

Advice

Many packages have a lot of functions. Mapping an entire package often results in an unreadable network diagram. Go ahead and make a diagram of dplyr once and you will see what I mean.
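One way to keep a large diagram readable is to trim the edge list before plotting. The sketch below is not part of the gist: focus_deps() is a hypothetical helper, and the data frame is a made-up stand-in with the same column names that fcn_deps() returns. It keeps only the calls that sit downstream of one function of interest.

```r
# Toy stand-in for the output of fcn_deps(); the rows are made up
deps <- data.frame(
  Function            = c("f", "f", "g", "h", "x"),
  Dependency_Function = c("g", "h", "k", "k", "y"),
  stringsAsFactors = FALSE
)

# Keep only the edges reachable downstream of `focus` (hypothetical helper)
focus_deps <- function(deps, focus) {
  reached <- focus
  repeat {
    callees <- deps$Dependency_Function[deps$Function %in% reached]
    new <- setdiff(callees, reached)
    if (length(new) == 0) break      # nothing new to follow
    reached <- c(reached, new)
  }
  deps[deps$Function %in% reached, , drop = FALSE]
}

sub <- focus_deps(deps, "f")
sub
```

Here the unrelated call from x to y is dropped, while everything that f reaches directly or indirectly is kept.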

Ideas for further enhancements

  1. Differentiate exported functions from unexported functions.
  2. Add an argument that restricts the diagram to certain function(s) and the area around them.
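As a starting point for the first idea, the sketch below splits a namespace's functions into exported and internal ones using base R's getNamespaceExports(). The helper name split_fcns() is hypothetical, and the stats package is used only because it ships with every R installation.

```r
# Split a package's functions into exported and internal ones
# (split_fcns is a hypothetical helper, not part of the gist)
split_fcns <- function(pkg) {
  ns <- asNamespace(pkg)
  all_fcns <- as.character(unclass(utils::lsf.str(envir = ns, all.names = TRUE)))
  exported <- getNamespaceExports(pkg)
  list(
    exported = intersect(all_fcns, exported),
    internal = setdiff(all_fcns, exported)
  )
}

res <- split_fcns("stats")
head(res$exported)
head(res$internal)
```

The two groups could then be given different node colors or shapes in the diagram. S3 methods that are registered but not exported will show up in the internal group.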

Thanks for reading! Feel free to join the discussion below.

devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.2 (2016-10-31)
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       <NA>                        
##  date     2018-03-03
## Packages ------------------------------------------------------------------
##  package      * version  date      
##  abind          1.4-5    2016-07-21
##  arm            1.9-3    2016-11-27
##  assertthat     0.2.0    2017-04-11
##  backports      1.1.2    2017-12-13
##  bindr          0.1      2016-11-13
##  bindrcpp     * 0.2      2017-06-17
##  blme           1.0-4    2015-06-14
##  blogdown       0.5      2018-01-28
##  bookdown       0.5      2017-08-20
##  brew           1.0-6    2011-04-13
##  broom          0.4.2    2017-02-13
##  cellranger     1.1.0    2016-07-27
##  cli            1.0.0    2017-11-05
##  coda           0.19-1   2016-12-08
##  colorspace     1.3-2    2016-12-14
##  crayon         1.3.4    2017-09-16
##  curl           3.1      2018-01-30
##  devtools       1.12.0   2016-06-24
##  DiagrammeR   * 0.9.3    2018-01-30
##  digest         0.6.12   2017-01-27
##  downloader     0.4      2015-07-09
##  dplyr        * 0.7.4    2017-09-28
##  DT             0.2      2016-08-09
##  evaluate       0.10.1   2017-06-24
##  forcats      * 0.2.0    2017-01-23
##  foreign        0.8-67   2016-09-13
##  ggplot2      * 2.2.1    2016-12-30
##  glue           1.2.0    2017-10-29
##  gridExtra      2.3      2017-09-09
##  gtable         0.2.0    2016-02-26
##  haven          1.1.0    2017-07-09
##  hms            0.3      2016-11-22
##  htmltools      0.3.6    2017-04-28
##  htmlwidgets    1.0      2018-01-20
##  httpuv         1.3.5    2017-07-04
##  httr           1.3.1    2017-08-20
##  igraph         1.1.2    2017-07-21
##  influenceR     0.1.0    2015-09-03
##  jsonlite       1.5      2017-06-01
##  knitr          1.18.10  2018-01-28
##  lattice        0.20-34  2016-09-06
##  lazyeval       0.2.0    2016-06-12
##  lme4           1.1-12   2016-04-16
##  lubridate      1.7.1    2017-11-03
##  magrittr       1.5      2014-11-22
##  MASS           7.3-45   2016-04-21
##  Matrix         1.2-7.1  2016-09-01
##  memoise        1.0.0    2016-01-29
##  merTools       0.3.0    2016-12-12
##  mime           0.5      2016-07-07
##  minqa          1.2.4    2014-10-09
##  mnormt         1.5-5    2016-10-15
##  modelr         0.1.1    2017-07-24
##  munsell        0.4.3    2016-02-13
##  mvtnorm        1.0-5    2016-02-02
##  nlme           3.1-128  2016-05-10
##  nloptr         1.0.4    2014-08-04
##  pillar         1.1.0    2018-01-14
##  pkgconfig      2.0.1    2017-03-21
##  plyr           1.8.4    2016-06-08
##  psych          1.7.3.21 2017-03-22
##  purrr        * 0.2.4    2017-10-18
##  R6             2.2.2    2017-06-17
##  RColorBrewer   1.1-2    2014-12-07
##  Rcpp           0.12.13  2017-09-28
##  readr        * 1.1.1    2017-05-16
##  readxl         1.0.0    2017-04-18
##  reshape2       1.4.2    2016-10-22
##  rgexf          0.15.3   2015-03-24
##  rlang          0.1.6    2017-12-21
##  rmarkdown      1.8      2017-11-17
##  Rook           1.1-1    2014-10-20
##  rprojroot      1.3-2    2018-01-03
##  rstudioapi     0.7      2017-09-07
##  rvest          0.3.2    2016-06-17
##  scales         0.5.0    2017-08-24
##  shiny          1.0.5    2017-08-23
##  stringi        1.1.5    2017-04-07
##  stringr      * 1.2.0    2017-02-18
##  tibble       * 1.4.1    2017-12-25
##  tidyr        * 0.7.2    2017-10-16
##  tidyverse    * 1.2.1    2017-11-14
##  utf8           1.1.3    2018-01-03
##  viridis        0.4.1    2018-01-08
##  viridisLite    0.2.0    2017-03-24
##  visNetwork     2.0.3    2018-01-09
##  withr          1.0.2    2016-06-20
##  xfun           0.1      2018-01-22
##  XML            3.98-1.5 2016-11-10
##  xml2           1.1.1    2017-01-24
##  xtable         1.8-2    2016-02-05
##  yaml           2.1.16   2017-12-12
##  source                                  
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  cran (@0.2.0)                           
##  cran (@1.1.2)                           
##  cran (@0.1)                             
##  cran (@0.2)                             
##  CRAN (R 3.3.0)                          
##  Github (rstudio/blogdown@aa98b32)       
##  cran (@0.5)                             
##  CRAN (R 3.3.0)                          
##  cran (@0.4.2)                           
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  local                                   
##  CRAN (R 3.3.0)                          
##  Github (rich-iannone/DiagrammeR@9d6a8e2)
##  cran (@0.6.12)                          
##  CRAN (R 3.3.0)                          
##  cran (@0.7.4)                           
##  CRAN (R 3.3.0)                          
##  cran (@0.10.1)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  cran (@2.2.1)                           
##  CRAN (R 3.3.2)                          
##  cran (@2.3)                             
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  cran (@0.3.6)                           
##  CRAN (R 3.3.2)                          
##  cran (@1.3.5)                           
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  cran (@1.5)                             
##  Github (yihui/knitr@1bdaf39)            
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  cran (@2.0.1)                           
##  CRAN (R 3.3.0)                          
##  cran (@1.7.3.2)                         
##  CRAN (R 3.3.2)                          
##  cran (@2.2.2)                           
##  CRAN (R 3.3.0)                          
##  cran (@0.12.13)                         
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  cran (@1.8)                             
##  CRAN (R 3.3.0)                          
##  cran (@1.3-2)                           
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  cran (@0.5.0)                           
##  cran (@1.0.5)                           
##  cran (@1.1.5)                           
##  cran (@1.2.0)                           
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  cran (@0.2.0)                           
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.2)                          
##  CRAN (R 3.3.0)                          
##  cran (@2.1.16)

  [1] Also, developing this function was a fun way to put off doing the hard work.
