Uncovering the relationships among functions in a package
I maintain a few internal R packages that we have at work. Recently, and perhaps foolishly, I decided that I needed to totally refactor one of them to add some more bells and whistles. A substantial part of this task involved simplifying the internal workflow of the package. I needed to be sure not to omit any of the previous functionality, but a simple list of all the exported and unexported functions was not sufficient. [1]
So, I decided to map the dependencies between all of the internal functions (i.e., which functions call which). I found it useful, so I thought I’d share the functions I wrote to do just that: visualize the relationships among functions in a package. The functions are available at this gist.
For the sake of this post, I will use the merTools package that I have had the pleasure of working on with a great co-author. The end result is:
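```r
library(DiagrammeR)
library(magrittr)  # for %>%

# Source the functions from the gist
suppressMessages(
  devtools::source_gist("b30d861ea80a27fad4e44623c41e0170",
                        filename = "packageFunctionMap.R")
)

fake <- plotFcnDependencies("merTools")

# Keep only the functions with at least one edge, then draw the graph
fake %>%
  select_nodes_by_degree(expressions = "deg > 0") %>%
  transform_to_subgraph_ws %>%
  render_graph
```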
This diagram shows which of the merTools functions call other merTools functions. Well, technically it shows all of the functions that explicitly call, or are called by, other internal functions. It does not include dependencies that are based on the output of other functions, nor functions that have no explicit call relationship with any other internal function. The latter would have degree = 0 in social network terms.
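To make the degree idea concrete, here is a minimal base R sketch. The two edges come from the example in this post (REimpact() calling expectedRank() and predictInterval()); `isolatedFcn` is a made-up name standing in for any function with no explicit calls in or out, and counting degree with `table()` is my own illustration, not what DiagrammeR does internally.

```r
# Toy edge list: one row per "Function calls Dependency_Function" relationship
fcns <- c("REimpact", "expectedRank", "predictInterval", "isolatedFcn")
deps <- data.frame(
  Function            = c("REimpact", "REimpact"),
  Dependency_Function = c("expectedRank", "predictInterval"),
  stringsAsFactors    = FALSE
)

# Degree = number of edges touching a function, whether as caller or callee.
# factor(..., levels = fcns) keeps functions that never appear, with count 0.
deg <- table(factor(c(deps$Function, deps$Dependency_Function), levels = fcns))

# Functions that would survive a "deg > 0" filter
connected <- names(deg)[as.vector(deg > 0)]
connected
```

Here `isolatedFcn` (hypothetical) has degree 0 and drops out, which is the same kind of filtering that `deg > 0` applies to the diagram.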
Columns to the left are upstream and columns to the right are downstream. For example, REimpact() calls expectedRank() and predictInterval(). predictInterval() then goes on to call three more functions, etc.
The practical value of this exercise comes from the following sorts of insights:
- stripAttributes() is called by both averageObs() and randomObs(), so don’t forget to include that functionality in whatever function(s) replace those two; and
- fastdisp() is the only function that calls easyVarCorr(), so perhaps easyVarCorr() is a good candidate to be rolled into fastdisp() if there are no other use cases for this function.
Stepping through the functions
The first function, ls_fcns(), retrieves the names of the functions from a namespace (using lsf.str) and returns them as a character vector.
```r
ls_fcns <- function(pkg) {
  # lsf.str() lists every function in the namespace, exported or not
  fcns <- unclass(lsf.str(envir = asNamespace(pkg), all.names = TRUE))
  return(as.character(fcns))
}

ls_fcns("merTools")
##  [1] "averageObs"         "bglmerModList"      "blmerModList"
##  [4] "buildModelMatrix"   "collapseFrame"      "draw"
##  [7] "draw.merMod"        "easyVarCorr"        "expectedRank"
## [10] "fastdisp"           "FEsim"              "findFormFuns"
## [13] "formulaBuild"       "glmerModList"       "ICC"
## [16] "levelfun"           "lmerModList"        "mkNewReTrms"
## [19] "modelFixedEff"      "modelInfo"          "modelRandEffStats"
## [22] "plotFEsim"          "plotREsim"          "predictInterval"
## [25] "print.merModList"   "randomObs"          "REcorrExtract"
## [28] "REextract"          "REimpact"           "reOnly"
## [31] "REquantile"         "REsdExtract"        "residDF.merMod"
## [34] "REsim"              "reTermCount"        "reTermNames"
## [37] "RHSForm"            "RMSE.merMod"        "safeDeparse"
## [40] "sanitizeNames"      "setup_parallel"     "shinyMer"
## [43] "shuffle"            "stripAttributes"    "subBoot"
## [46] "subsetList"         "superFactor"        "thetaExtract"
## [49] "trimModelFrame"     "VarCorr.merModList" "wiggle"
## [52] "zzz"
```
The second function, fcn_deps(), cycles through each function and does a regular expression search to see which, if any, of the other functions from that namespace are called. This part could be written more efficiently, I’m sure, but I was starting to feel bad about procrastinating too much. Feel free to suggest improvements in the comments or via Twitter (@carlbfrederick).
```r
fcn_deps <- function(pkg) {
  fcns <- ls_fcns(pkg)
  out <- data.frame(Function = character(),
                    Dependency_Function = character(),
                    Number_Calls = integer())
  for (i in fcns) {
    # Printed source of function i, one element per line
    this_fcn <- capture.output(getAnywhere(i))
    # Compare against every other function in the namespace
    for (j in setdiff(fcns, i)) {
      # Lines that contain "j(" (this counts lines, not individual calls)
      dep_fcns <- grep(paste(j, "\\(", sep = ""), this_fcn)
      if (length(dep_fcns) > 0) {
        out <- rbind(out, data.frame(Function = i,
                                     Dependency_Function = j,
                                     Number_Calls = length(dep_fcns)))
      }
    }
  }
  return(out)
}

fcn_deps("merTools") %>% as_data_frame
## # A tibble: 22 x 3
##    Function         Dependency_Function Number_Calls
##  * <fct>            <fct>                      <int>
##  1 averageObs       findFormFuns                   1
##  2 averageObs       REquantile                     1
##  3 averageObs       stripAttributes                1
##  4 averageObs       subsetList                     2
##  5 averageObs       superFactor                    1
##  6 buildModelMatrix mkNewReTrms                    1
##  7 buildModelMatrix reOnly                         1
##  8 buildModelMatrix RHSForm                        1
##  9 draw.merMod      averageObs                     1
## 10 draw.merMod      randomObs                      1
## # ... with 12 more rows
```
As you can see, fcn_deps() returns a data.frame with three columns: Function, Dependency_Function and Number_Calls. The first two are used to create the diagram. I don’t recall what I originally intended to do with Number_Calls, but it is still there.
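One possible alternative to the regular expression search, offered only as a sketch: inspect the parsed body of each function with base R's `all.names()` instead of grepping the printed source. The helper name `fcn_deps_ast()` and the toy environment are hypothetical, introduced here for illustration. Note that `all.names()` returns every symbol in the body, so a function passed as an argument (or a local variable sharing a function's name) is counted too. That makes this a different approximation rather than an exact call counter.

```r
# Hypothetical helper: same output shape as fcn_deps(), but it takes an
# environment (e.g. asNamespace("merTools")) and reads parsed function bodies.
fcn_deps_ast <- function(env) {
  fcns <- as.character(unclass(lsf.str(envir = env, all.names = TRUE)))
  rows <- lapply(fcns, function(f) {
    # Every symbol that appears in the body of f, including function names
    used <- all.names(body(get(f, envir = env)))
    # Tabulate the symbols that are names of *other* functions in env
    hits <- table(used[used %in% setdiff(fcns, f)])
    if (length(hits) == 0) return(NULL)
    data.frame(Function = f,
               Dependency_Function = names(hits),
               Number_Calls = as.integer(hits),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy environment standing in for a package namespace
toy <- new.env()
toy$helper <- function(x) x + 1
toy$top    <- function(x) helper(helper(x))
toy$other  <- function(x) x * 2

res <- fcn_deps_ast(toy)
res
```

In the toy example only `top` has a dependency, and `helper` is counted twice because it appears twice in the body. Unlike the line-based search, a function name that only appears inside a comment or a string is not picked up.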
The third function, plotFcnDependencies(), produces the DiagrammeR (Iannone) object that can be plotted. It is definitely “hacky” and will throw some errors that I haven’t quite ironed out yet, hence the try() in there.
The for loop is used to place the functions horizontally in the diagram above by calculating the longest of the shortest paths from each function to all the others. For example, the longest shortest path between draw.merMod() and the rest of the functions is 4. I call this quantity depth in the function.
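A small worked example of the depth calculation may help. This is a sketch on a made-up edge list (the names `a` through `d` and `lonely` are hypothetical), using the same `igraph::shortest_paths()` call as the function in this post. It assumes the igraph package is installed. Depth here counts the vertices on the path, so a function that calls nothing has depth 1.

```r
# Toy dependency table shaped like the first two columns of fcn_deps() output
deps <- data.frame(
  Function            = c("a", "a", "b"),
  Dependency_Function = c("b", "c", "d"),
  stringsAsFactors    = FALSE
)
fcns <- c("a", "b", "c", "d", "lonely")  # "lonely" has no edges at all

g <- igraph::graph_from_data_frame(deps)  # directed graph: caller -> callee

depth <- vapply(fcns, function(f) {
  tryCatch(
    # Unreachable targets produce empty paths plus a warning, hence
    # suppressWarnings(); length() counts the vertices on each path.
    suppressWarnings(
      as.numeric(max(sapply(igraph::shortest_paths(g, from = f)$vpath, length)))
    ),
    # A function that is not a vertex of the graph raises an error: depth 0
    error = function(e) 0
  )
}, numeric(1))

depth
```

The error branch illustrates one reason a `try()` is needed in this kind of loop: functions with no dependencies in either direction never enter the graph, so asking for paths from them fails.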
Using this depth quantity, the rest of the function just creates the node and edge data.frames and then creates the diagram using the DiagrammeR package … and voilà!
```r
plotFcnDependencies <- function(pkg) {
  fcns <- ls_fcns(pkg)
  depFcn <- fcn_deps(pkg)

  # depth = longest of the shortest paths starting at each function
  depth <- NULL
  for (i in fcns) {
    dist <- 0  # fallback if try() fails, e.g. when i is not a vertex in the graph
    try(suppressWarnings(
      dist <- max(sapply(
        igraph::shortest_paths(igraph::graph_from_data_frame(depFcn[, 1:2]),
                               from = i)$vpath,
        length))
    ))
    depth <- c(depth, dist)
  }

  # Node table, ordered from most upstream to most downstream
  nodes <- data.frame(nodes = fcns, type = "",
                      label = htmltools::htmlEscape(fcns), depth = depth)
  nodes <- dplyr::arrange(nodes, dplyr::desc(depth))

  out <- DiagrammeR::create_graph(
    nodes_df = nodes,
    edges_df = DiagrammeR::create_edges(
      from = depFcn$Function,
      to = depFcn$Dependency_Function,
      rel = "leading_to"
    ),
    graph_name = paste(pkg, " (version ", packageVersion(pkg),
                       ") Function Map", sep = ""),
    graph_attrs = c("layout = dot", "rankdir = LR"),
    node_attrs = "fontsize = 20"
  )
  return(out)
}
```
Advice
Many packages have a lot of functions. Mapping an entire package often results in an unreadable network diagram. Go ahead and make a diagram of dplyr once and you will see what I mean.
Ideas for further enhancements
- Differentiate exported functions from unexported functions.
- Add an argument that restricts the diagram to certain function(s), so it shows only the area of interest.
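Neither enhancement is implemented in the gist, as far as this post says, but both could be prototyped in base R. The sketch below is only a starting point. `fcn_exported()` and `focus_deps()` are hypothetical names; the first uses `getNamespaceExports()` to flag exported functions, and the second keeps only the rows of a fcn_deps()-style data.frame that are downstream of the function(s) of interest.

```r
# Hypothetical helper: flag which functions in a package namespace are exported
fcn_exported <- function(pkg) {
  ns <- asNamespace(pkg)
  fcns <- as.character(unclass(lsf.str(envir = ns, all.names = TRUE)))
  data.frame(Function = fcns,
             Exported = fcns %in% getNamespaceExports(ns),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: keep only the dependencies reachable from `focus`.
# `deps` needs Function and Dependency_Function columns, as in fcn_deps().
focus_deps <- function(deps, focus) {
  keep <- character()
  frontier <- focus
  while (length(frontier) > 0) {
    keep <- union(keep, frontier)
    # Functions called by anything in the current frontier
    called <- as.character(deps$Dependency_Function[deps$Function %in% frontier])
    frontier <- setdiff(called, keep)  # visit each function only once
  }
  deps[deps$Function %in% keep, , drop = FALSE]
}

# Usage on a package that ships with R, and on a made-up edge list
ex <- fcn_exported("stats")
toy_deps <- data.frame(
  Function            = c("a", "a", "b", "e"),
  Dependency_Function = c("b", "c", "d", "a"),
  stringsAsFactors    = FALSE
)
sub_b <- focus_deps(toy_deps, "b")
sub_a <- focus_deps(toy_deps, "a")
```

The `Exported` column could then be used to color or shape the nodes, and the filtered data.frame could be used in place of the full dependency table when building the graph.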
Thanks for reading! Feel free to join the discussion below.
```r
devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value
##  version  R version 3.3.2 (2016-10-31)
##  system   x86_64, darwin13.4.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  tz       <NA>
##  date     2018-03-03
## Packages ------------------------------------------------------------------
##  package      * version  date       source
##  abind          1.4-5    2016-07-21 CRAN (R 3.3.0)
##  arm            1.9-3    2016-11-27 CRAN (R 3.3.2)
##  assertthat     0.2.0    2017-04-11 cran (@0.2.0)
##  backports      1.1.2    2017-12-13 cran (@1.1.2)
##  bindr          0.1      2016-11-13 cran (@0.1)
##  bindrcpp     * 0.2      2017-06-17 cran (@0.2)
##  blme           1.0-4    2015-06-14 CRAN (R 3.3.0)
##  blogdown       0.5      2018-01-28 Github (rstudio/blogdown@aa98b32)
##  bookdown       0.5      2017-08-20 cran (@0.5)
##  brew           1.0-6    2011-04-13 CRAN (R 3.3.0)
##  broom          0.4.2    2017-02-13 cran (@0.4.2)
##  cellranger     1.1.0    2016-07-27 CRAN (R 3.3.0)
##  cli            1.0.0    2017-11-05 CRAN (R 3.3.2)
##  coda           0.19-1   2016-12-08 CRAN (R 3.3.2)
##  colorspace     1.3-2    2016-12-14 CRAN (R 3.3.2)
##  crayon         1.3.4    2017-09-16 CRAN (R 3.3.2)
##  curl           3.1      2018-01-30 local
##  devtools       1.12.0   2016-06-24 CRAN (R 3.3.0)
##  DiagrammeR   * 0.9.3    2018-01-30 Github (rich-iannone/DiagrammeR@9d6a8e2)
##  digest         0.6.12   2017-01-27 cran (@0.6.12)
##  downloader     0.4      2015-07-09 CRAN (R 3.3.0)
##  dplyr        * 0.7.4    2017-09-28 cran (@0.7.4)
##  DT             0.2      2016-08-09 CRAN (R 3.3.0)
##  evaluate       0.10.1   2017-06-24 cran (@0.10.1)
##  forcats      * 0.2.0    2017-01-23 CRAN (R 3.3.2)
##  foreign        0.8-67   2016-09-13 CRAN (R 3.3.2)
##  ggplot2      * 2.2.1    2016-12-30 cran (@2.2.1)
##  glue           1.2.0    2017-10-29 CRAN (R 3.3.2)
##  gridExtra      2.3      2017-09-09 cran (@2.3)
##  gtable         0.2.0    2016-02-26 CRAN (R 3.3.0)
##  haven          1.1.0    2017-07-09 CRAN (R 3.3.2)
##  hms            0.3      2016-11-22 CRAN (R 3.3.2)
##  htmltools      0.3.6    2017-04-28 cran (@0.3.6)
##  htmlwidgets    1.0      2018-01-20 CRAN (R 3.3.2)
##  httpuv         1.3.5    2017-07-04 cran (@1.3.5)
##  httr           1.3.1    2017-08-20 CRAN (R 3.3.2)
##  igraph         1.1.2    2017-07-21 CRAN (R 3.3.2)
##  influenceR     0.1.0    2015-09-03 CRAN (R 3.3.0)
##  jsonlite       1.5      2017-06-01 cran (@1.5)
##  knitr          1.18.10  2018-01-28 Github (yihui/knitr@1bdaf39)
##  lattice        0.20-34  2016-09-06 CRAN (R 3.3.2)
##  lazyeval       0.2.0    2016-06-12 CRAN (R 3.3.0)
##  lme4           1.1-12   2016-04-16 CRAN (R 3.3.0)
##  lubridate      1.7.1    2017-11-03 CRAN (R 3.3.2)
##  magrittr       1.5      2014-11-22 CRAN (R 3.3.0)
##  MASS           7.3-45   2016-04-21 CRAN (R 3.3.2)
##  Matrix         1.2-7.1  2016-09-01 CRAN (R 3.3.2)
##  memoise        1.0.0    2016-01-29 CRAN (R 3.3.0)
##  merTools       0.3.0    2016-12-12 CRAN (R 3.3.2)
##  mime           0.5      2016-07-07 CRAN (R 3.3.0)
##  minqa          1.2.4    2014-10-09 CRAN (R 3.3.0)
##  mnormt         1.5-5    2016-10-15 CRAN (R 3.3.0)
##  modelr         0.1.1    2017-07-24 CRAN (R 3.3.2)
##  munsell        0.4.3    2016-02-13 CRAN (R 3.3.0)
##  mvtnorm        1.0-5    2016-02-02 CRAN (R 3.3.0)
##  nlme           3.1-128  2016-05-10 CRAN (R 3.3.2)
##  nloptr         1.0.4    2014-08-04 CRAN (R 3.3.0)
##  pillar         1.1.0    2018-01-14 CRAN (R 3.3.2)
##  pkgconfig      2.0.1    2017-03-21 cran (@2.0.1)
##  plyr           1.8.4    2016-06-08 CRAN (R 3.3.0)
##  psych          1.7.3.21 2017-03-22 cran (@1.7.3.2)
##  purrr        * 0.2.4    2017-10-18 CRAN (R 3.3.2)
##  R6             2.2.2    2017-06-17 cran (@2.2.2)
##  RColorBrewer   1.1-2    2014-12-07 CRAN (R 3.3.0)
##  Rcpp           0.12.13  2017-09-28 cran (@0.12.13)
##  readr        * 1.1.1    2017-05-16 CRAN (R 3.3.2)
##  readxl         1.0.0    2017-04-18 CRAN (R 3.3.2)
##  reshape2       1.4.2    2016-10-22 CRAN (R 3.3.0)
##  rgexf          0.15.3   2015-03-24 CRAN (R 3.3.0)
##  rlang          0.1.6    2017-12-21 CRAN (R 3.3.2)
##  rmarkdown      1.8      2017-11-17 cran (@1.8)
##  Rook           1.1-1    2014-10-20 CRAN (R 3.3.0)
##  rprojroot      1.3-2    2018-01-03 cran (@1.3-2)
##  rstudioapi     0.7      2017-09-07 CRAN (R 3.3.2)
##  rvest          0.3.2    2016-06-17 CRAN (R 3.3.0)
##  scales         0.5.0    2017-08-24 cran (@0.5.0)
##  shiny          1.0.5    2017-08-23 cran (@1.0.5)
##  stringi        1.1.5    2017-04-07 cran (@1.1.5)
##  stringr      * 1.2.0    2017-02-18 cran (@1.2.0)
##  tibble       * 1.4.1    2017-12-25 CRAN (R 3.3.2)
##  tidyr        * 0.7.2    2017-10-16 CRAN (R 3.3.2)
##  tidyverse    * 1.2.1    2017-11-14 CRAN (R 3.3.2)
##  utf8           1.1.3    2018-01-03 CRAN (R 3.3.2)
##  viridis        0.4.1    2018-01-08 CRAN (R 3.3.2)
##  viridisLite    0.2.0    2017-03-24 cran (@0.2.0)
##  visNetwork     2.0.3    2018-01-09 CRAN (R 3.3.2)
##  withr          1.0.2    2016-06-20 CRAN (R 3.3.0)
##  xfun           0.1      2018-01-22 CRAN (R 3.3.2)
##  XML            3.98-1.5 2016-11-10 CRAN (R 3.3.2)
##  xml2           1.1.1    2017-01-24 CRAN (R 3.3.2)
##  xtable         1.8-2    2016-02-05 CRAN (R 3.3.0)
##  yaml           2.1.16   2017-12-12 cran (@2.1.16)
```
[1] Also, developing this function was a fun way to put off doing the hard work.