(This article was first published on **John Myles White » Statistics**, and kindly contributed to R-bloggers)

Recently, a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers. Inspired by these claims, I decided to look into the issue quantitatively by counting the commits that each R Core developer made to R’s SVN repository. I wanted to know whether a small group of developers was overwhelmingly responsible for changes to R or whether all of the members of R Core had contributed equally. To follow along with what I did, you can grab the data and analysis scripts from GitHub.

First, I downloaded the R Core team’s SVN logs from http://developer.r-project.org/. I then used a simple regex to parse the SVN logs to count commits coming from each core committer.
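As a rough illustration of the parsing step, here is a minimal Python sketch. It is not the script from the GitHub repository. It assumes the logs use the default `svn log` header layout (`r<revision> | <author> | <date> | <n> lines`); the names `HEADER`, `count_commits`, and `SAMPLE_LOG` are hypothetical, and the sample entries are made up.

```python
import re
from collections import Counter

# Assumed header layout of default `svn log` output, for example:
# r103 | ripley | 2012-01-15 09:30:12 +0000 (Sun, 15 Jan 2012) | 2 lines
HEADER = re.compile(r"^r(\d+) \| ([^|]+?) \| (\d{4})-\d{2}-\d{2} ")


def count_commits(log_text):
    """Count commits per committer in svn-log-style text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Group 2 is the committer's user name.
            counts[match.group(2)] += 1
    return counts


# Made-up entries for illustration only; not taken from the real logs.
SAMPLE_LOG = """\
------------------------------------------------------------------------
r101 | ripley | 2011-03-02 08:15:00 +0000 (Wed, 02 Mar 2011) | 1 line

Fix typo in docs
------------------------------------------------------------------------
r102 | maechler | 2011-03-02 10:40:00 +0000 (Wed, 02 Mar 2011) | 1 line

Tweak tests
------------------------------------------------------------------------
r103 | ripley | 2012-01-15 09:30:12 +0000 (Sun, 15 Jan 2012) | 2 lines

Update NEWS
and a second line
"""

if __name__ == "__main__":
    # Print committers in decreasing order of commit count.
    for name, n in count_commits(SAMPLE_LOG).most_common():
        print(name, n)
```

Matching only the header line keeps the regex simple: commit messages are skipped because they do not start with a revision marker followed by the pipe-separated fields.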

I then tabulated the number of commits from each developer, pooling across 2003-2012, the years for which I had logs. You can see the results below, sorted by total commits in decreasing order:

Committer | Total Number of Commits |
---|---|
ripley | 22730 |
maechler | 3605 |
hornik | 3602 |
murdoch | 1978 |
pd | 1781 |
apache | 658 |
jmc | 599 |
luke | 576 |
urbaneks | 414 |
iacus | 382 |
murrell | 324 |
leisch | 274 |
tlumley | 153 |
rgentlem | 141 |
root | 87 |
duncan | 81 |
bates | 76 |
falcon | 45 |
deepayan | 40 |
plummer | 28 |
ligges | 24 |
martyn | 20 |
ihaka | 14 |
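To put these totals in proportion, the counts can be converted into shares of all commits. The sketch below copies the numbers from the table above; the names `COMMITS` and `commit_shares` are hypothetical. On these figures the top committer accounts for roughly 60% of the pooled commits.

```python
# Totals copied from the table above (pooled over 2003-2012).
COMMITS = {
    "ripley": 22730, "maechler": 3605, "hornik": 3602, "murdoch": 1978,
    "pd": 1781, "apache": 658, "jmc": 599, "luke": 576, "urbaneks": 414,
    "iacus": 382, "murrell": 324, "leisch": 274, "tlumley": 153,
    "rgentlem": 141, "root": 87, "duncan": 81, "bates": 76, "falcon": 45,
    "deepayan": 40, "plummer": 28, "ligges": 24, "martyn": 20, "ihaka": 14,
}


def commit_shares(counts):
    """Return each committer's fraction of the total commit count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    shares = commit_shares(COMMITS)
    # Show the five largest shares as percentages.
    top_five = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:5]
    for name, share in top_five:
        print(f"{name:10s} {share:6.1%}")
```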

After that, I tried to visualize evolving trends over the years. First, I visualized the number of commits per developer per year:
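The per-year breakdown behind such a plot could be built along these lines. This is a sketch under the assumption that each log entry starts with a default `svn log` header carrying an ISO-style date; `commits_by_year` and the sample headers are hypothetical and not taken from the real logs.

```python
import re
from collections import Counter, defaultdict

# Assumed header layout: r<rev> | <author> | <YYYY-MM-DD ...> | <n> lines
HEADER = re.compile(r"^r\d+ \| ([^|]+?) \| (\d{4})-\d{2}-\d{2} ")


def commits_by_year(log_text):
    """Map each year to a Counter of commits per committer."""
    by_year = defaultdict(Counter)
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            author, year = match.group(1), int(match.group(2))
            by_year[year][author] += 1
    return by_year


# Made-up header lines for illustration only.
SAMPLE_HEADERS = """\
r101 | ripley | 2011-03-02 08:15:00 +0000 (Wed, 02 Mar 2011) | 1 line
r102 | maechler | 2011-03-02 10:40:00 +0000 (Wed, 02 Mar 2011) | 1 line
r103 | ripley | 2012-01-15 09:30:12 +0000 (Sun, 15 Jan 2012) | 2 lines
"""

if __name__ == "__main__":
    table = commits_by_year(SAMPLE_HEADERS)
    for year in sorted(table):
        print(year, dict(table[year]))
```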

And then I visualized the evenness of contributions from different developers by measuring the entropy of the distribution of commits on a yearly basis:
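For readers unfamiliar with the measure, the sketch below shows one common reading of it: the Shannon entropy of each year's commit shares. The post does not say which entropy or logarithm base was used, so base 2 (bits) is an assumption, and `shannon_entropy` is a hypothetical name. Higher values mean commits are spread more evenly across developers; a value near zero means one developer dominates.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution implied by raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so skip them to avoid log(0).
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # Made-up yearly counts for four developers, for illustration only.
    even_year = [25, 25, 25, 25]    # perfectly even contributions
    skewed_year = [97, 1, 1, 1]     # one dominant committer
    print(shannon_entropy(even_year))
    print(shannon_entropy(skewed_year))
```

Computing this once per year, using that year's per-developer counts, gives a single number per year that can be plotted as a trend.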

There seems to be some weak evidence either that the community is finding consensus more difficult and tending towards a single leader who makes final decisions, or that some developers are progressively dropping out because of the difficulty of achieving consensus. There is unambiguous evidence that a single developer (ripley, with 22730 of the commits tabulated above) makes the majority of commits to R’s SVN repo.

I leave it to others to understand what all of this means for R and for programming language communities in general.
