The Social Dynamics of the R Core Team

[This article was first published on John Myles White » Statistics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers of the language. Inspired by these claims, I decided to look into this issue quantitatively by measuring the quantity of commits to R’s SVN repository that were made by each of the R Core developers. I wanted to know whether a small group of developers were overwhelmingly responsible for changes to R or whether all of the members of R Core had contributed equally. To follow along with what I did, you can grab the data and analysis scripts from GitHub.

First, I downloaded the R Core team’s SVN logs from http://developer.r-project.org/. I then used a simple regex to parse the SVN logs to count commits coming from each core committer.

After that, I tabulated the number of commits from each developer, pooling across the years 2003-2012 for which I had logs. You can see the results below, sorted by total commits in decreasing order:

Committer Total Number of Commits
ripley 22730
maechler 3605
hornik 3602
murdoch 1978
pd 1781
apache 658
jmc 599
luke 576
urbaneks 414
iacus 382
murrell 324
leisch 274
tlumley 153
rgentlem 141
root 87
duncan 81
bates 76
falcon 45
deepayan 40
plummer 28
ligges 24
martyn 20
ihaka 14

After that, I tried to visualize evolving trends over the years. First, I visualized the number of commits per developer per year:


Commits

And then I visualized the evenness of contributions from different developers by measuring the entropy of the distribution of commits on a yearly basis:


Entropy

There seems to be some weak evidence that the community is either finding consensus more difficult and tending towards a single leader who makes final decisions or that some developers are progressively dropping out because of the difficulty of achieving consensus. There is unambiguous evidence that a single developer makes the overwhelming majority of commits to R’s SVN repo.

I leave it to others to understand what all of this means for R and for programming language communities in general.

To leave a comment for the author, please follow the link and comment on their blog: John Myles White » Statistics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)