Visualizing rOpenSci collaboration

[This article was first published on Recology - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We (rOpenSci) have been writing code for R packages for a couple years, so it is time to take a look back at the data. What data you ask? The commits data from GitHub ~ data that records who did what and when.

Using the Github commits API we can gather data on who commited code to a Github repository, and when they did it. Then we can visualize this hitorical record.


Install some functions for interacting with the Github API via R

install_github('sandbox', 'ropensci') 

library(sandbox)
library(httr)
library(ggplot2)
library(scales)
library(reshape2)
library(bipartite)
library(doMC)
library(plyr)
library(ggthemes)
library(picante)

# And authenticate - pops open a page in your default browser, then tells 
# you authentication was successful
github_auth()

Get all repos for an organization, here ropensci of course

ropensci_repos <- github_allrepos(userorg = "ropensci")

Get commits broken down in to additions and deletions, though below we just collapse them to all commits

registerDoMC(cores = 4)
github_commits_safe <- plyr::failwith(NULL, github_commits)
out <- llply(ropensci_repos, function(x) github_commits_safe("ropensci", x, 
    since = "2009-01-01T", limit = 500), .parallel = TRUE)
names(out) <- ropensci_repos
out2 <- compact(out)
outdf <- ldply(out2)

Plot commits by date and repo

outdf_subset <- outdf[!outdf$.id %in% c("citeulike", "challenge", "docs", "ropensci-book", 
    "usecases", "textmine", "usgs", "ropenscitoolkit", "neotoma", "rEWDB", "rgauges", 
    "rodash", "ropensci.github.com", "ROAuth"), ]
outdf_subset$.id <- tolower(outdf_subset$.id)
outdf_subset <- ddply(outdf_subset, .(.id, date), summarise, value = sum(value))

mindates <- llply(unique(outdf_subset$.id), function(x) min(outdf_subset[outdf_subset$.id == 
    x, "date"]))
names(mindates) <- unique(outdf_subset$.id)
mindates <- sort(do.call(c, mindates))
outdf_subset$.id <- factor(outdf_subset$.id, levels = names(mindates))

ggplot(outdf_subset, aes(date, value, fill = .id)) + 
    geom_bar(stat = "identity", width = 0.5) + 
    geom_rangeframe(sides = "b", colour = "grey") + 
    theme_bw(base_size = 9) + 
    scale_x_date(labels = date_format("%Y"), breaks = date_breaks("year")) + 
    scale_y_log10() + 
    facet_grid(.id ~ .) + 
    labs(x = "", y = "") + 
    theme(axis.text.y = element_blank(), 
        axis.text.x = element_text(colour = "black"), 
        axis.ticks.y = element_blank(), 
        strip.text.y = element_text(angle = 0, size = 8, ), 
        strip.background = element_rect(size = 0), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        legend.text = element_text(size = 8), 
        legend.position = "none", 
        panel.border = element_blank())

center

The plot above plots the sum of additions+deletions, and is sorted by the first commit date of reach repo, with the first being treebase, which wraps the Treebase API, and the most recent being rwbclimate, which wraps the World Blank climate data API.

You can see that some repos have recieved commits more or less consistently over their life time, while others have seen a little development here and there.


w

In addition, there are quite a few people that have committed code now to rOpenSci repos, calling for a network vizualization of course.

outdf_network <- droplevels(outdf[!outdf$.id %in% c("citeulike", "challenge", 
    "docs", "ropensci-book", "usecases", "textmine", "usgs", "ropenscitoolkit", 
    "retriever", "rodash", "ropensci.github.com", "ROAuth", "rgauges", "sandbox", 
    "rfna", "rmetadata", "rhindawi", "rpmc", "rpensoft", "ritis"), ])
casted <- dcast(outdf_network, .id + date + name ~ variable, fun.aggregate = length, 
    value.var = "value")
names(casted)[1] <- "repo"
casted2 <- ddply(casted, .(repo, name), summarise, commits = sum(additions))
casted2 <- data.frame(repo = casted2$repo, weight = casted2$commits, name = casted2$name)
mat <- sample2matrix(casted2)
plotweb(sortweb(mat, sort.order = "dec"), method = "normal", text.rot = 90, 
    adj.high = c(-0.3, 0), adj.low = c(1, -0.3), y.width.low = 0.05, y.width.high = 0.05, 
    ybig = 0.09, labsize = 0.7)

center

The plot above shows repos on one side and contributors on the other. Some folks (the core rOpenSci team: cboettig, karthikram, emhart, and schamberlain) have committed quite a lot to many packages. We also have amny awesome contributors to our packages (some contributors and repos have been removed for clarity).

rOpenSci is truly a collaborative effort to develop tools for open science, so thanks to all our contributors - keep on forking, pull requesting, and commiting.

To leave a comment for the author, please follow the link and comment on their blog: Recology - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)