Visualizing the R packages galaxy

March 23, 2016

(This article was first published on Piece of K » R_EN, and kindly contributed to R-bloggers)

The idea of this post is to create a kind of map of the R ecosystem showing dependencies relationships between packages. This is actually quite simple to quickly generate a graph object like that, since all the data we need can be obtained with one call to the function available.packages(). There are a few additional steps to clean the data and arrange them, but with the following code everyone can generate a graph of dependencies in less than 5s.

First, we load the igraph package and we retrieve the data from our current repository.

dat <- available.packages()

For each package, we produce a character string with all its dependencies (Imports and Depends fields) separated with commas.

dat.imports <- paste(dat[, "Imports"], dat[, "Depends"], sep = ", ")
dat.imports <- as.list(dat.imports)
dat.imports <- lapply(dat.imports, function(x) gsub("\(([^\)]+)\)", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub("n", "", x))
dat.imports <- lapply(dat.imports, function(x) gsub(" ", "", x))

Next step, we split the strings and we use the stack function to get the complete list of edges of the graph.

dat.imports <- sapply(dat.imports, function(x) strsplit(x, split = ","))
dat.imports <- lapply(dat.imports, function(x) x[!x %in% c("NA", "R")])
names(dat.imports) <- rownames(dat)
dat.imports <- stack(dat.imports)
dat.imports <- as.matrix(dat.imports)
dat.imports <- dat.imports[-which(dat.imports[, 1]==""), ]

Finally we create the graph with the list of edges. Here, I select the largest connected component because there are many isolated vertices which will make the graph harder to represent.

g <- graph.edgelist(dat.imports)
g <- decompose.graph(g)
g <- g[[which(sapply(g, vcount) == max(sapply(g, vcount)))]]

Now that we have the graph we can compute many graph-theory related statistics but here I just want to visualize these data. Plotting a large graph like that with igraph is possible but slow and tedious. I prefer to export the graph and open it in Gephi, a free and open-source software for network visualization.
The figure below was created with Gephi. Nodes are sized according to their number of incoming edges. This is pretty useless and uninformative, but still I like it. This looks like a sky map and it is great to feel that we, users and developers are part of this huge R galaxy.

R packages network

Click on the picture to see the HD version


To leave a comment for the author, please follow the link and comment on their blog: Piece of K » R_EN. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)