(This article was first published on **Thinking inside the box**, and kindly contributed to R-bloggers)

A few days ago, I blogged about visualizing CRAN dependency ranks, which turned out to be a somewhat popular post. David Smith followed up at the REvo blog, suggesting that packages already shipping with R (as indicated by their ‘Recommended’ priority) be excluded. Good idea!

So here is an updated version, where we limit the display to the top twenty packages counted by reverse ‘Depends:’ and exclude those already shipping with R, such as MASS, lattice, survival, Matrix, or nlme.
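As a quick way to see which packages carry that priority, here is a minimal sketch. It assumes we look at the local R installation through `installed.packages()` rather than at the CRAN database used for the charts, so the result depends on what is installed on your machine.

```r
## Names of locally installed packages whose priority is 'recommended'
## (assumption: this inspects the local library, not the CRAN db)
rec <- unname(installed.packages(priority = "recommended")[, "Package"])
print(sort(rec))

## Check the packages named above against the local installation
wanted <- c("MASS", "lattice", "survival", "Matrix", "nlme")
print(setNames(wanted %in% rec, wanted))
```

On a minimal installation that omits the recommended packages, `rec` may simply be empty.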

The mvtnorm package is still out in front by a wide margin, but we can note that (cough, cough) our Rcpp package for seamless R and C++ integration is now tied for second with the coda package for MCMC analysis.

Also of note is the fact that CRAN keeps growing relentlessly and moved from 3969 packages to 3981 packages in the space of these few days…

Lastly, I have been asked about the code and/or data behind this. It is really pretty simple, as the main `data.frame` can be had from CRAN (where I also found the initial few lines to load it). After that, one only needs a little bit of subsetting as shown below. I look forward to seeing other people riff on this data set.

```r
#!/usr/bin/r
##
## Initial db download from http://developer.r-project.org/CRAN/Scripts/depends.R and adapted

require("tools")

## this function is essentially the same as R Core's from the URL
## http://developer.r-project.org/CRAN/Scripts/depends.R
getDB <- function() {
    contrib.url(getOption("repos")["CRAN"], "source") # trigger chooseCRANmirror() if required
    description <- sprintf("%s/web/packages/packages.rds",
                           getOption("repos")["CRAN"])
    con <- if(substring(description, 1L, 7L) == "file://") {
        file(description, "rb")
    } else {
        url(description, "rb")
    }
    on.exit(close(con))
    db <- readRDS(gzcon(con))
    rownames(db) <- db[,"Package"]
    db
}

db <- getDB()

## count packages: number of comma-separated entries per field, NA if the field is empty
getCounts <- function(db, col) {
    sapply(db[,col],
           function(s) { if (is.na(s)) NA else length(strsplit(s, ",")[[1]]) } )
}

## build a data.frame with the number of entries for reverse depends, reverse imports,
## reverse linkingto and reverse suggests; also keep Recommended status
ddall <- data.frame(pkg=db[,1],
                    RDepends=getCounts(db, "Reverse depends"),
                    RImports=getCounts(db, "Reverse imports"),
                    RLinkingTo=getCounts(db, "Reverse linking to"),
                    RSuggests=getCounts(db, "Reverse suggests"),
                    Recommended=db[,"Priority"]=="recommended"
                    )

## Subset to non-Recommended packages as in David Smith's follow-up post
dd <- subset(ddall, is.na(ddall[,"Recommended"]) | ddall[,"Recommended"] != TRUE)

labeltxt <- paste("Analysis as of", format(Sys.Date(), "%d %b %Y"),
                  "covering", nrow(db), "total CRAN packages")

cutOff <- 20
doPNG <- TRUE

if (doPNG) png("/tmp/CRAN_ReverseDepends.png", width=600, height=600)
z <- dd[head(order(dd[,2], decreasing=TRUE), cutOff),c(1,2)]
dotchart(z[,2], labels=z[,1], cex=1, pch=19,
         main="CRAN Packages sorted by Reverse Depends:",
         sub=paste("Limited to top", cutOff, "packages, excluding 'Recommended' ones shipped with R"),
         xlab=labeltxt)
if (doPNG) dev.off()

if (doPNG) png("/tmp/CRAN_ReverseImports.png", width=600, height=600)
z <- dd[head(order(dd[,3], decreasing=TRUE), cutOff),c(1,3)]
dotchart(z[,2], labels=z[,1], cex=1, pch=19,
         main="CRAN Packages sorted by Reverse Imports:",
         sub=paste("Limited to top", cutOff, "packages, excluding 'Recommended' ones shipped with R"),
         xlab=labeltxt)
if (doPNG) dev.off()

# no cutOff but rather a na.omit
if (doPNG) png("/tmp/CRAN_ReverseLinkingTo.png", width=600, height=600)
z <- na.omit(dd[head(order(dd[,4], decreasing=TRUE), 30),c(1,4)])
dotchart(z[,2], labels=z[,1], pch=19,
         main="CRAN Packages sorted by Reverse LinkingTo:",
         xlab=labeltxt)
if (doPNG) dev.off()
```
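For readers who want to try the counting and subsetting steps without downloading anything from CRAN, here is a small offline sketch. The matrix `toy`, its package names and the helper `countEntries()` are all made up for illustration; only the idea (split the comma-separated field, count the pieces, drop ‘Recommended’ packages, keep the top few) follows the script above.

```r
## Hypothetical stand-in for the CRAN db: a character matrix with invented packages
toy <- matrix(c("pkgA", "pkgB", "pkgC", "pkgD",      # Package
                "x, y, z", NA, "x", "u, v",          # Reverse depends
                NA, "recommended", NA, NA),          # Priority
              ncol = 3,
              dimnames = list(NULL, c("Package", "Reverse depends", "Priority")))
rownames(toy) <- toy[, "Package"]

## Count comma-separated entries per field; NA stays NA
## (illustrative helper, same idea as getCounts() in the script)
countEntries <- function(db, col) {
    vapply(db[, col],
           function(s) if (is.na(s)) NA_integer_ else length(strsplit(s, ",")[[1]]),
           integer(1))
}

toyDD <- data.frame(pkg = toy[, "Package"],
                    RDepends = countEntries(toy, "Reverse depends"),
                    Recommended = toy[, "Priority"] == "recommended")

## Keep rows that are not flagged as Recommended (an NA priority counts as "not")
toyDD <- subset(toyDD, is.na(Recommended) | !Recommended)

## Top two by reverse depends; order() places NA counts last by default
top <- toyDD[head(order(toyDD$RDepends, decreasing = TRUE), 2), c("pkg", "RDepends")]
print(top)
```

With this toy input, `pkgB` is dropped for being ‘Recommended’, and the two rows left in `top` are `pkgA` (three reverse depends) and `pkgD` (two).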
