#5: Easy package information
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Welcome to the fifth post in the recklessly rambling R rants series, or R4 for short.
The third post showed an easy way to follow R development by monitoring (curated) changes on the NEWS file for the development version r-devel. As a concrete example, I mentioned that it has shown a nice new function (tools::CRAN_package_db()) coming up in R 3.4.0. Today we will build on that.
Consider the following short snippet:
library(data.table)
getPkgInfo <- function() {
    if (exists("tools::CRAN_package_db")) {
        dat <- tools::CRAN_package_db()
    } else {
        tf <- tempfile()
        download.file("https://cloud.r-project.org/src/contrib/PACKAGES.rds", tf, quiet=TRUE)
        dat <- readRDS(tf)              # r-devel can now readRDS off a URL too
    }
    dat <- as.data.frame(dat)
    setDT(dat)
    dat
}
It defines a simple function getPkgInfo() as a wrapper around said new function from R 3.4.0, ie tools::CRAN_package_db(), and a fallback alternative using a tempfile (in the automagically cleaned R temp directory) and an explicit download and read of the underlying RDS file. As an aside, just this week the r-devel NEWS told us that such readRDS() operations can now read directly from URL connection. Very nice---as RDS is a fantastic file format when you are working in R.
Anyway, back to the RDS file! The snippet above returns a data.table object with as many rows as there are packages on CRAN, and basically all their (parsed !!) DESCRIPTION info and then some. A gold mine!
Consider this to see how many package have a dependency (in the sense of Depends, Imports or LinkingTo, but not Suggests because Suggests != Depends) on Rcpp:
R> dat <- getPkgInfo()
R> rcppRevDepInd <- as.integer(tools::dependsOnPkgs("Rcpp", recursive=FALSE, installed=dat))
R> length(rcppRevDepInd)
[1] 998
R>
So exciting---we will hit 1000 within days! But let's do some more analysis:
R> dat[ rcppRevDepInd, RcppRevDep := TRUE]  # set to TRUE for given set
R> dat[ RcppRevDep==TRUE, 1:2]
           Package Version
  1:      ABCoptim  0.14.0
  2: AbsFilterGSEA     1.5
  3:           acc   1.3.3
  4: accelerometry   2.2.5
  5:      acebayes   1.3.4
 ---                      
994:        yakmoR   0.1.1
995:  yCrypticRNAs  0.99.2
996:         yuima   1.5.9
997:           zic     0.9
998:       ziphsmm   1.0.4
R>
Here we index the reverse dependency using the vector we had just computed, and then that new variable to subset the data.table object. Given the aforementioned parsed information from all the DESCRIPTION files, we can learn more:
R> ## likely false entries
R> dat[ RcppRevDep==TRUE, ][NeedsCompilation!="yes", c(1:2,4)]
            Package Version                                                                         Depends
 1:         baitmet   1.0.0                                                           Rcpp, erah (>= 1.0.5)
 2:           bea.R   1.0.1                                                        R (>= 3.2.1), data.table
 3:            brms   1.6.0                     R (>= 3.2.0), Rcpp (>= 0.12.0), ggplot2 (>= 2.0.0), methods
 4: classifierplots   1.3.3                             R (>= 3.1), ggplot2 (>= 2.2), data.table (>= 1.10),
 5:           ctsem   2.3.1                                           R (>= 3.2.0), OpenMx (>= 2.3.0), Rcpp
 6:        DeLorean   1.2.4                                                  R (>= 3.0.2), Rcpp (>= 0.12.0)
 7:            erah   1.0.5                                                               R (>= 2.10), Rcpp
 8:             GxM     1.1                                                                              NA
 9:             hmi   0.6.3                                                                    R (>= 3.0.0)
10:        humarray     1.1 R (>= 3.2), NCmisc (>= 1.1.4), IRanges (>= 1.22.10),\nGenomicRanges (>= 1.16.4)
11:         iNextPD   0.3.2                                                                    R (>= 3.1.2)
12:          joinXL   1.0.1                                                                    R (>= 3.3.1)
13:            mafs   0.0.2                                                                              NA
14:            mlxR   3.1.0                                                           R (>= 3.0.1), ggplot2
15:    RmixmodCombi     1.0              R(>= 3.0.2), Rmixmod(>= 2.0.1), Rcpp(>= 0.8.0), methods,\ngraphics
16:             rrr   1.0.0                                                                    R (>= 3.2.0)
17:        UncerIn2     2.0                          R (>= 3.0.0), sp, RandomFields, automap, fields, gstat
R> 
There are a full seventeen packages which claim to depend on Rcpp while not having any compiled code of their own. That is likely false---but I keep them in my counts, however relunctantly. A CRAN-declared Depends: is a Depends:, after all.
Another nice thing to look at is the total number of package that declare that they need compilation:
R> ## number of packages with compiled code R> dat[ , .(N=.N), by=NeedsCompilation] NeedsCompilation N 1: no 7625 2: yes 2832 3: No 1 R>
Isn't that awesome? It is 2832 out of (currently) 10458, or about 27.1%. Just over one in four. Now the 998 for Rcpp look even better as they are about 35% of all such packages. In order words, a little over one third of all packages with compiled code (which may be legacy C, Fortran or C++) use Rcpp. Wow.
Before closing, one shoutout to Dirk Schumacher whose thankr which I made the center of the last post is now on CRAN. As a mighty fine and slim micropackage without external dependencies. Neat.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
