(This article was first published on **R Views**, and kindly contributed to R-bloggers.)

In a previous post, I showed a very simple example of using the R function `tools::CRAN_package_db()` to analyze information about CRAN packages. `CRAN_package_db()` extracts the metadata CRAN stores on all of its 12,000-plus packages and arranges it into a “database”, which is actually a complicated data frame in which some columns have vectors or lists as entries.

Running the function is simple, and it doesn’t take very long on my MacBook Air.

`p_db <- tools::CRAN_package_db()`

The following gives some insight into what’s contained in the data frame.

`dim(p_db)`

`## [1] 12635 65`

`matrix(names(p_db),ncol=2)`

```
## [,1] [,2]
## [1,] "Package" "Collate.windows"
## [2,] "Version" "Contact"
## [3,] "Priority" "Copyright"
## [4,] "Depends" "Date"
## [5,] "Imports" "Description"
## [6,] "LinkingTo" "Encoding"
## [7,] "Suggests" "KeepSource"
## [8,] "Enhances" "Language"
## [9,] "License" "LazyData"
## [10,] "License_is_FOSS" "LazyDataCompression"
## [11,] "License_restricts_use" "LazyLoad"
## [12,] "OS_type" "MailingList"
## [13,] "Archs" "Maintainer"
## [14,] "MD5sum" "Note"
## [15,] "NeedsCompilation" "Packaged"
## [16,] "Additional_repositories" "RdMacros"
## [17,] "Author" "SysDataCompression"
## [18,] "Authors@R" "SystemRequirements"
## [19,] "Biarch" "Title"
## [20,] "BugReports" "Type"
## [21,] "BuildKeepEmpty" "URL"
## [22,] "BuildManual" "VignetteBuilder"
## [23,] "BuildResaveData" "ZipData"
## [24,] "BuildVignettes" "Published"
## [25,] "Built" "Path"
## [26,] "ByteCompile" "X-CRAN-Comment"
## [27,] "Classification/ACM" "Reverse depends"
## [28,] "Classification/ACM-2012" "Reverse imports"
## [29,] "Classification/JEL" "Reverse linking to"
## [30,] "Classification/MSC" "Reverse suggests"
## [31,] "Classification/MSC-2010" "Reverse enhances"
## [32,] "Collate" "MD5sum"
## [33,] "Collate.unix" "Package"
```

Looking at a few rows and columns gives a feel for how complicated its structure is.

`p_db[1:10, c(1,2,4,5)]`

```
## Package Version Depends
## 1 A3 1.0.0 R (>= 2.15.0), xtable, pbapply
## 2 abbyyR 0.5.4 R (>= 3.2.0)
## 3 abc 2.1 R (>= 2.10), abc.data, nnet, quantreg, MASS, locfit
## 4 abc.data 1.0 R (>= 2.10)
## 5 ABC.RAP 0.9.0 R (>= 3.1.0)
## 6 ABCanalysis 1.2.1 R (>= 2.10)
## 7 abcdeFBA 0.4 Rglpk,rgl,corrplot,lattice,R (>= 2.10)
## 8 ABCoptim 0.15.0
## 9 ABCp2 1.2 MASS
## 10 abcrf 1.7 R(>= 3.1)
## Imports
## 1
## 2 httr, XML, curl, readr, plyr, progress
## 3
## 4
## 5 graphics, stats, utils
## 6 plotrix
## 7
## 8 Rcpp, graphics, stats, utils
## 9
## 10 readr, MASS, matrixStats, ranger, parallel, stringr, Rcpp (>= 0.11.2)
```
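One quick way to see the mixed structure for yourself is to check which columns of the data frame are list-columns rather than plain character vectors. This is a sketch, assuming the `p_db` object created above; which columns turn out to be lists may vary with the version of R and the state of CRAN.

```r
# Fetch the CRAN metadata (same call as above)
p_db <- tools::CRAN_package_db()

# Identify which columns are list-columns rather than atomic vectors
list_cols <- names(p_db)[vapply(p_db, is.list, logical(1))]
list_cols
```

In my experience it is the reverse-dependency columns that hold list entries, which is part of what makes the raw data frame awkward to work with directly.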

So, having spent a little time learning how vexing working with this data can be, I was delighted when I discovered Ioannis Kosmidis’ `cranly` package during my March “Top 40” review. `cranly` is a very impressive package, built along tidy principles, that is helpful for learning about individual packages, analyzing the structure of package and author relationships, and searching for packages.

```
library(cranly)
library(tidyverse)
```

```
## ── Attaching packages ──────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.5
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
```

```
## ── Conflicts ─────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
```

The first really impressive feature is a “one button” clean function that does an amazing job of getting the data in shape to work with. In my preliminary work, I struggled just to get the author data clean. In the approach that I took, getting rid of text like `[aut, cre]` to get a count of authors took more regular expression work than I wanted to deal with. But `clean_CRAN_db()` does a good job of cleaning up the whole database. Note that the helper function `clean_up_author` has a considerable number of hard-coded text strings that must have taken hours to get right.

`package_db <- clean_CRAN_db(p_db)`
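As a sketch of the author-count calculation I originally fought with: assuming the cleaned database stores each package’s authors as a list-column named `author` (one character vector of names per package, as in recent versions of `cranly`), the count reduces to a single call to `lengths()` with no regular expressions at all.

```r
library(cranly)

# Clean the raw CRAN metadata (p_db from tools::CRAN_package_db() above)
p_db <- tools::CRAN_package_db()
package_db <- clean_CRAN_db(p_db)

# Assumption: 'author' is a list-column of author-name vectors
n_authors <- lengths(package_db$author)
summary(n_authors)
```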

Once you have the clean data, it is easy to run some pretty interesting analyses. This first example, straight out of the package vignette, builds the network of package relationships based on which packages import which, and then plots a summary for the top 20 most imported packages.

```
package_network <- build_network(package_db)
package_summaries <- summary(package_network)
plot(package_summaries, according_to = "n_imported_by", top = 20)
```

There is also a built-in function to compute the importance or relevance of a package using the page rank algorithm.

`plot(package_summaries, according_to = "page_rank", top = 20)`

The `build_network()` function also offers the opportunity to investigate the collaboration of package authors by building a network from the authors’ perspective.

`author_network <- build_network(object = package_db, perspective = "author")`

Here, we look at J.J. Allaire’s network. Setting `exact = FALSE` means that the algorithm does not require an exact match on the author’s name.

`plot(author_network, author = "JJ Allaire", exact = FALSE)`

It is also possible to study individual packages. Here, I plot the very simple dependency tree for the time series package `xts`. There is a very good argument to be made that the simpler the dependency tree, the more stable and reliable the package.

```
xts_tree <- build_dependence_tree(package_network, "xts")
plot(xts_tree)
```

As a final example, consider how the `package_with()` function might be used to search for Bayesian packages by looking for “Bayes” or “MCMC” in the package name. I don’t believe that this exhausts the possibilities of `cranly`, but it should be clear that the package is a very useful tool for looking into the mysteries of CRAN.

```
Bayesian_packages <- package_with(package_network, name = c("Bayes", "MCMC"))
plot(package_network, package = Bayesian_packages, legend=FALSE)
```
