Visualizing the R packages galaxy

[This article was first published on Piece of K » R_EN, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The idea of this post is to create a kind of map of the R ecosystem showing dependencies relationships between packages. This is actually quite simple to quickly generate a graph object like that, since all the data we need can be obtained with one call to the function available.packages(). There are a few additional steps to clean the data and arrange them, but with the following code everyone can generate a graph of dependencies in less than 5s.

First, we load the igraph package and we retrieve the data from our current repository.

library(igraph)
dat <- available.packages()

For each package, we produce a character string with all its dependencies (Imports and Depends fields) separated with commas.

dat.imports <- paste(dat[, "Imports"], dat[, "Depends"], sep = ", ")
dat.imports <- as.list(dat.imports)
dat.imports <- lapply(dat.imports, function(x) gsub("\(([^\)]+)\)", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub("n", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub(" ", "", x))

Next step, we split the strings and we use the stack function to get the complete list of edges of the graph.

dat.imports <- sapply(dat.imports, function(x) strsplit(x, split = ","))
dat.imports <- lapply(dat.imports, function(x) x[!x %in% c("NA", "R")])
names(dat.imports) <- rownames(dat)
dat.imports <- stack(dat.imports)
dat.imports <- as.matrix(dat.imports)
dat.imports <- dat.imports[-which(dat.imports[, 1]==""), ]

Finally we create the graph with the list of edges. Here, I select the largest connected component because there are many isolated vertices which will make the graph harder to represent.

g <- graph.edgelist(dat.imports)
g <- decompose.graph(g)
g <- g[[which(sapply(g, vcount) == max(sapply(g, vcount)))]]

Now that we have the graph we can compute many graph-theory related statistics but here I just want to visualize these data. Plotting a large graph like that with igraph is possible but slow and tedious. I prefer to export the graph and open it in Gephi, a free and open-source software for network visualization.
The figure below was created with Gephi. Nodes are sized according to their number of incoming edges. This is pretty useless and uninformative, but still I like it. This looks like a sky map and it is great to feel that we, users and developers are part of this huge R galaxy.

R packages network

Click on the picture to see the HD version

facebooktwitter

To leave a comment for the author, please follow the link and comment on their blog: Piece of K » R_EN.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)