Tools for Hacking R: Git + Subversion

August 24, 2010

(This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers)

In an earlier post, I discussed how to use Subversion to download, edit, and generate a patch against R’s source code. Since most of us can’t commit our code changes back to R’s repository, we need some other way to store and maintain our patch until it is incorporated into R. Of course, our changes may never be incorporated, but we still ought to have a record of our work!

The biggest problem in maintaining a patch is ensuring compatibility with upstream changes. In other words, once we’ve written a patch, we need to ensure that subsequent changes to R’s main development branch don’t conflict with our changes. The Git version control software can help us here.

Git is similar in purpose to Subversion; it’s used to track changes to source code. Git has features that make it easy to maintain a patch against a larger project. In contrast with Subversion, a complete Git repository is designed to be stored locally. In addition, Git is often distributed with tools that make it easy to interact with Subversion repositories.

This is a blog post, so let’s just see an example. We first need to install the Git and Git-Subversion packages. In Debian GNU/Linux or Ubuntu, we can use aptitude:

$ aptitude install git-core git-svn

We can then use git svn to download and initialize a Git repository from the R Subversion repository:

$ git svn clone -r52760 R-patch

This command tells Git to download R’s Subversion repository at revision 52760 and use it to initialize a Git repository locally in the directory R-patch. The -r argument here is critical. If the revision is not provided, the entire revision history is downloaded from the Subversion repository (all ~53k revisions)! It’s also important to select a revision that is current, because when the Git repository is updated, all subsequent revisions are downloaded.

Now that we have a local Git repository in R-patch, we can modify this code and keep track of our changes under the normal Git conventions. Say we want to increase the number of available R connections. We can modify src/main/connections.c such that the resulting diff is:

$ git diff
diff --git a/src/main/connections.c b/src/main/connections.c
index ee01a9d..7fa73b9 100644
--- a/src/main/connections.c
+++ b/src/main/connections.c
@@ -60,7 +60,7 @@ typedef long long int _lli_t;
   extern UImode  CharacterMode;

-#define NCONNECTIONS 128 /* snow needs one per slave node */
+#define NCONNECTIONS 256 /* snow needs one per slave node */
 #define NSINKS 21

 static Rconnection Connections[NCONNECTIONS];

and commit our changes locally with something like:

$ git commit -a -m"increase available connections"
[master d8e4b62] increase available connections
 1 files changed, 1 insertions(+), 1 deletions(-)

Now that we have a patch against revision 52760, we need to ensure that subsequent changes in the Subversion trunk don’t conflict with our code. Git-Subversion has a command for exactly this: git svn rebase. The command ‘unwinds’ our local work, applies the changes from the Subversion trunk, and then ‘replays’ our work on top of those changes. If there are conflicts, Git issues a notification and marks the areas in each file where a conflict occurs. At this point the rebase operation is incomplete, and we must resolve the conflicting code manually. Once all conflicts are resolved and the affected files are staged with git add, git rebase --continue completes the rebase operation, and our patch maintenance is complete.
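The conflict case is easy to rehearse without touching R’s repository. What follows is only a minimal sketch under some assumptions: a throwaway plain-Git repository stands in for the Subversion trunk, so git rebase trunk plays the role of git svn rebase, and the branch names (trunk, patch), the one-line connections.c, and the demo identity are all hypothetical.

```shell
#!/bin/sh
# Sketch only: a throwaway plain-Git repository stands in for the Subversion
# trunk, so `git rebase trunk` is used where the post uses `git svn rebase`.
# Branch names, file contents, and the identity below are hypothetical.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/trunk    # name the first branch "trunk"

echo '#define NCONNECTIONS 128' > connections.c
git add connections.c
git commit -q -m "upstream: initial revision"

# Our local patch lives on its own branch.
git checkout -q -b patch
echo '#define NCONNECTIONS 256' > connections.c
git commit -q -a -m "increase available connections"

# Meanwhile, upstream edits the very same line.
git checkout -q trunk
echo '#define NCONNECTIONS 200' > connections.c
git commit -q -a -m "upstream: change connection limit"

# Replaying our commit on top of upstream now stops with a conflict.
git checkout -q patch
if git rebase trunk >/dev/null 2>&1; then
    echo "rebase finished without conflicts"
else
    # Git has written <<<<<<< / ======= / >>>>>>> markers into the file;
    # print how many conflict regions there are.
    grep -c '^<<<<<<<' connections.c
    # Resolve by hand (here: keep our value), stage the file, and continue.
    echo '#define NCONNECTIONS 256' > connections.c
    git add connections.c
    GIT_EDITOR=true git rebase --continue >/dev/null
fi

# Our commit should now sit on top of the upstream history.
git log --format=%s
```

Setting GIT_EDITOR=true simply keeps the sketch non-interactive, since some Git versions open an editor for the commit message when the rebase continues.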

To illustrate with our R-patch repository:

$ git svn rebase
	M	src/main/deparse.c
r52761 = 9d0f32ca4cd8067f1ec5407b40af5c0a21cee5b4 (refs/remotes/git-svn)
	M	src/library/base/man/strptime.Rd
	M	src/main/datetime.c
	M	doc/NEWS.Rd

<snipped for blog post>

r52795 = b7c88c3bc39bf679ed8609111a3390b218823120 (refs/remotes/git-svn)
	M	doc/NEWS.Rd
r52796 = be0b53290415a43d0aa0fab2245553ce2d9e455f (refs/remotes/git-svn)
First, rewinding head to replay your work on top of it...
Applying: increase available connections
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...

Clearly, our local modifications did not result in a conflict, and so we have successfully maintained this trivial patch. In addition, our local commit is now at the top of the Git commit log, just after the latest Subversion commit by Peter Dalgaard, of the R core team:

$ git log
commit 295b642df92af768c3cd0813d6b3593a00061617
Author: Matt Shotwell <[email protected]>
Date:   Mon Aug 23 21:34:34 2010 -0400

    increase available connections

commit be0b53290415a43d0aa0fab2245553ce2d9e455f
Author: pd <[email protected]>
Date:   Mon Aug 23 21:13:50 2010 +0000


    git-svn-id: http:[email protected] 00db46b3-68df-0310-9c12-caf00c1e9a41

<snipped for blog post>

We can generate a new patch file against the latest (Subversion trunk) revision using git diff, and specifying only the revision(s) we had committed locally:

$ git diff be0b532..
diff --git a/src/main/connections.c b/src/main/connections.c
index a06d01d..7402552 100644
--- a/src/main/connections.c
+++ b/src/main/connections.c
@@ -60,7 +60,7 @@ typedef long long int _lli_t;
   extern UImode  CharacterMode;

-#define NCONNECTIONS 128 /* snow needs one per slave node */
+#define NCONNECTIONS 256 /* snow needs one per slave node */
 #define NSINKS 21

 static Rconnection Connections[NCONNECTIONS];

where be0b532 is the (partial) Git hash of the latest Subversion trunk revision, and be0b532.. selects the commits since this revision, i.e., our local changes.
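To keep the patch as a file, the same range can be redirected to disk and then checked against a pristine copy of the upstream revision. Here is a minimal sketch, again assuming a throwaway plain-Git repository rather than R’s; the variable names (base, patch), the one-line connections.c, and the demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch only: a throwaway repository; $base plays the role of be0b532,
# the hash of the latest upstream revision. All names are hypothetical.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
patch=$(mktemp)          # the patch file is kept outside the repository
cd "$demo"
git init -q

echo '#define NCONNECTIONS 128' > connections.c
git add connections.c
git commit -q -m "upstream revision"
base=$(git rev-parse --short HEAD)

echo '#define NCONNECTIONS 256' > connections.c
git commit -q -a -m "increase available connections"

# Everything committed after $base, written to the patch file.
git diff "$base".. > "$patch"

# Check the patch against a pristine checkout of the upstream revision.
git checkout -q "$base"
git apply --check "$patch"   # exits non-zero if the patch does not apply
echo "patch applies cleanly to $base"
```

Because git apply --check only tests whether the patch would apply and changes nothing, it is a cheap way to confirm the file is still good before sending it anywhere.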
