How to Become an Efficient and Collaborative R Programmer

December 12, 2011
(This article was first published on Yihui Xie, and kindly contributed to R-bloggers)

I am tempted to add the subtitle "Why R-Forge Must Die" (thinking of Barry Rowlingson's talk earlier this year). I have been a GitHub user for two years, mainly under Hadley's influence. By now I even feel a little addicted to GitHub (its slogan is "social coding"), because it makes collaboration really convenient and makes me more productive.

As some readers may know, I started a new package, knitr, last month, and this time I decided to try the power of social networks like Google+ and Twitter (something I used to stay away from); so far I'm pretty satisfied with the results. I call GitHub the "Facebook" of programmers, and it is also a powerful way to connect programmers with users. Here are a few features that I think R programmers may want to try (I use knitr as an example):

1. Browse R code online: I hate checking out a whole package just to read its source code, and R-Forge is clumsy in this respect (following in the footsteps of SourceForge); the code on GitHub is syntax-highlighted and easy to thumb through. Besides, you can browse each commit and see what was changed (the differences are highlighted).
2. Issues: instead of keeping your own TODO list, which is often forgotten, both you and your users can create issues and discuss them there; when you have a fix, you can write a commit message like fixed #46 in Git and issue 46 will be closed automatically. To me this is a super cool little feature. What is more, the issue will carry a reference to the commit that fixed it, so you can come back in the future and see how it was fixed. Currently knitr has 50 issues in total, 24 of them from users.
3. Inline comments: you can discuss code directly next to the lines in question; this is a super super cool feature. Ramnath recently started contributing to the code theme feature after he forked my repository, and the original author (i.e. me) can go to the fork and check the changes; each change can be commented on, e.g. we had quite a few discussions in this commit. It feels as if you were sitting next to another programmer, pointing at the code with a pen and saying "I like this and you may need to revise that, ...". In comparison, the traditional way of collaborating is usually to email patches back and forth, which is far less straightforward. When Ramnath and I feel the work is mature, he can simply send me a pull request, and all the changes can be merged back into my repository. Another example: I saw the blog post by Songpants the other day and suggested he move the work to GitHub so that I could make suggestions right next to the R code, and now the code is happily sitting on GitHub (and so are my comments).
4. Wiki: a wiki makes it easy to set up a documentation page quickly; I have not done it for knitr yet, but I did it for the formatR package. It looks better than R's documentation, right? Again, other people can collaborate with you in editing the wiki pages. The other way to improve your documentation is to write vignettes in Sweave, which usually takes a lot of effort (wrestling with LaTeX) unless you use LyX+knitr like me; I feel vignettes are easy to make that way, but that is another story.
5. Stay tuned to a package: you can watch a repository (use the button in the top-right corner) so that updates to the package show up in your dashboard; alternatively, you can follow a GitHub user just as you follow people on Twitter.
6. GitHub Pages: this is probably the coolest feature; you can use another branch (called gh-pages) in your Git repository to build a website based on Jekyll. I made the website for knitr this way: I really want knitr to be a beautiful package, so everything about it has to be beautiful, which ruled out R documentation. Of course, Hadley is a pioneer in documenting R packages with websites. In the future, I may want to develop a package based on knitr which turns R documentation into a website automatically (with examples parsed and evaluated, and plots inserted), so you can host it on GitHub or somewhere else. This is only an idea at the moment; feel free to contact me if you are interested.
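
To make the issue-closing trick from item 2 a bit more concrete, here is a rough Python sketch of how a commit message might be scanned for closing references. It is only an approximation for illustration: the regular expression and the function name issues_closed_by are my own inventions, not GitHub's implementation, and it assumes the closing keywords are close, fix, and resolve plus their inflected forms.

```python
import re

# Approximate pattern for closing references such as "fixed #46".
# Assumption: the keywords are close/fix/resolve and their inflections;
# GitHub's real matching rules may differ in detail.
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(message):
    """Return the issue numbers a commit message appears to close."""
    return [int(number) for number in CLOSING_REF.findall(message)]


if __name__ == "__main__":
    for msg in ("fixed #46", "tidy up docs, see #12", "Closes #3 and fixes #7"):
        print(msg, "->", issues_closed_by(msg))
```

A message that merely mentions an issue (such as see #12) is not treated as closing it in this sketch; only a reference that directly follows one of the keywords counts.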

I cannot say I'm already an efficient R programmer, but GitHub did make me much more efficient.