(a) installing Git using the Eclipse plugin EGit, (b) uploading repositories to GitHub, and (c) links to resources on Git, Git and LaTeX, and Git and R. The focus is on version control for people working on R, Sweave, and LaTeX projects.
Version control works really well with R, Sweave, and LaTeX projects.
Benefits of Version Control
There are many benefits to version control for the data analyst. Version control allows you to:
- Rewind a project or a file to a previous state, which in turn encourages experimentation
- Ensure there is a record of changes
- Facilitate collaboration
- Facilitate backup
- Show changes between versions of a file
- Facilitate code sharing and reproducibility
- and much more; see this question on Stack Overflow for further discussion.
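To make the "show changes" benefit concrete: Git reports changes as line-based diffs (e.g., via `git diff`). The following is a minimal Python sketch of the same idea using the standard library's `difflib.unified_diff`; it is only an illustration of what a diff looks like, not how Git computes one, and the file names and R snippets are made up for the example.

```python
import difflib


def show_changes(old_text, new_text,
                 old_name="analysis.R (old)", new_name="analysis.R (new)"):
    """Return a unified diff between two versions of a text file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # splitlines() already removed the newlines
        )
    )


# Two hypothetical versions of a small R script
old_version = "x <- rnorm(100)\nsummary(x)\nhist(x)\n"
new_version = "x <- rnorm(100)\nsummary(x)\nboxplot(x)\n"

for line in show_changes(old_version, new_version):
    print(line)
```

Unchanged lines are prefixed with a space, removed lines with `-` (here `-hist(x)`), and added lines with `+` (here `+boxplot(x)`), which is the same convention Git uses.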
I also found that adopting version control brought several conceptual benefits. It encouraged greater consideration of:
- the distinction between source and derived files
- the nature of dependencies:
  - dependencies between elements of code
  - dependencies between files within a project
  - dependencies with files and programs external to the repository
- the nature of a repository and how repositories should be divided
- the nature of committing and documenting changes and project milestones
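One practical consequence of the source/derived distinction is deciding what to commit. The following is a minimal Python sketch under stated assumptions: the set of "derived" extensions is a hypothetical choice for a Sweave project, and a file is treated as derived only if an `.Rnw` file with the same stem exists. The helper `split_source_and_derived` is illustrative, not part of Git or Sweave; the derived list it prints could serve as a starting point for a `.gitignore` file.

```python
from pathlib import PurePosixPath

# Hypothetical choice: extensions that Sweave and LaTeX typically generate
# from an .Rnw file. Adjust for your own toolchain.
DERIVED_EXTENSIONS = {".tex", ".pdf", ".aux", ".log", ".toc", ".out"}


def split_source_and_derived(filenames):
    """Classify project files as source or derived.

    A file counts as derived only if an .Rnw file with the same stem exists
    and its extension is one Sweave/LaTeX would produce; everything else
    (e.g. a hand-written notes.tex with no .Rnw sibling) is treated as source.
    """
    paths = [PurePosixPath(name) for name in filenames]
    rnw_stems = {p.with_suffix("") for p in paths if p.suffix == ".Rnw"}
    source, derived = [], []
    for p in paths:
        if p.suffix in DERIVED_EXTENSIONS and p.with_suffix("") in rnw_stems:
            derived.append(str(p))
        else:
            source.append(str(p))
    return source, derived


files = ["report.Rnw", "report.tex", "report.pdf", "report.log",
         "notes.tex", "analysis.R", "data.csv"]
source, derived = split_source_and_derived(files)
# Derived files are candidates for .gitignore; only sources get committed.
print("\n".join(derived))
```

Here `report.tex`, `report.pdf`, and `report.log` are classed as derived because `report.Rnw` exists, while `notes.tex` stays a source file. Whether to commit derived files (e.g., a final PDF) is a project-level judgement.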
Choosing a Version Control System and Workflow
There are many version control systems (see Plastic SCM for a discussion). I've chosen to use Git for the following reasons:
- Git works well with Eclipse and Windows, using EGit and many other tools
- Git is one of the most popular version control systems
- Git repositories can be uploaded to GitHub
- Git has good documentation and support material
- Experts who know a lot more about version control than I do use Git (e.g., Hadley Wickham), and Git was created by Linus Torvalds.
Finally, the bigger difference is between using a version control system and not using one at all; the choice of system matters less.
EGit is a Git plugin for Eclipse. I use Eclipse and StatET to write R code and Sweave documents. I found EGit a particularly easy tool for getting started with Git and version control. The documentation is straightforward and the interface is easily integrated into my Eclipse workflow.
Getting Started with EGit and Git in Eclipse
There are many ways to interact with Git.
Installing EGit in Eclipse involves using the update manager. Vogella.de has a tutorial.
To get started with your first Git repository in Eclipse, check out the EGit User Guide. When I was first getting started, I used a simple R project rather than a Java Hello World application.
GitHub is one of several sites for sharing Git repositories (for example, see Hadley Wickham's baby names analysis, or my own example of using Sweave to write Multiple Choice Questions). It also has many useful social networking features.
Uploading a repository to GitHub from Eclipse
- Set up a free account on https://github.com/
- Work through the tutorial on creating a repository at GitHub
While the above tutorial briefly mentions SSH Configuration, it does not go into detail. When setting up my SSH key, I did the following:
- In Eclipse, go to Window -- Preferences -- General -- Network Connections -- SSH2 -- Key Management
- Click on Generate DSA Key
- Type in a passphrase (i.e., a long and robust password)
- Click Save Private Key (I saved it to a new folder under my user account)
- Go to this new folder and open "id_dsa.pub" as a plain text file and copy the contents of the file to the clipboard.
- Go to github.com -- Account Settings -- SSH Public Keys and click Add another public key.
- Paste the public key into the box and give it a name
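A common mistake at the copy-and-paste step is grabbing the private key file instead of `id_dsa.pub`. The following is a minimal Python sketch of a sanity check, assuming the standard OpenSSH public key line format `<key-type> <base64-blob> [comment]`, where the decoded blob starts with a 4-byte big-endian length followed by the same key-type string. The helper `check_public_key_line` is hypothetical (it is not part of Eclipse, EGit, or GitHub) and only checks the key type, not the remaining key fields.

```python
import base64
import binascii
import struct


def check_public_key_line(line):
    """Return the key type of an OpenSSH public key line, or raise ValueError."""
    parts = line.strip().split(None, 2)  # type, blob, optional comment
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("key body is not valid base64") from exc
    if len(blob) < 4:
        raise ValueError("key body is too short")
    # The blob begins with a uint32 length, then the key-type string itself.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded_type != key_type:
        raise ValueError(f"type mismatch: {key_type!r} vs {embedded_type!r}")
    return key_type
```

Usage: read the contents of `id_dsa.pub` into a string and pass it in; a valid DSA public key line returns `"ssh-dss"`, whereas the first line of a private key file (`-----BEGIN DSA PRIVATE KEY-----`) raises `ValueError`.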
Gists provide a quick way to get started with GitHub. Gists are useful for storing and sharing snippets of code, and the result can be embedded into blog posts. To get R syntax highlighting, give the file name a ".r" file extension (e.g., "test.r") (thanks to Hadley Wickham).
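Gists can also be created programmatically. The following is a minimal Python sketch that only builds the JSON body for GitHub's REST endpoint for creating a gist (`POST /gists`, with `description`, `public`, and a `files` object mapping each file name to its `content`). Actually sending the request requires an authenticated call to the GitHub API, which is not shown. The helper name `build_gist_payload` and the rule that rejects non-".r" names are hypothetical choices made to enforce the file-extension tip above.

```python
import json


def build_gist_payload(filename, code, description="", public=True):
    """Build the JSON body for GitHub's 'create a gist' endpoint (POST /gists).

    GitHub picks syntax highlighting from the file extension, so an R
    snippet should be named with a .r extension (e.g. 'test.r').
    """
    if not filename.lower().endswith(".r"):
        # Hypothetical policy: refuse names that would not be highlighted as R.
        raise ValueError("use a .r extension so GitHub highlights the file as R")
    body = {
        "description": description,
        "public": public,
        "files": {filename: {"content": code}},
    }
    return json.dumps(body)


print(build_gist_payload("test.r", "x <- 1:10\nmean(x)\n", "A small R example"))
```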
Interesting R GitHub repositories
Good examples of people sharing R projects on GitHub include:
Git and LaTeX
- Discussion on the Academic Productivity Blog on the topic of version control and LaTeX
- TeX.SE has several questions on LaTeX and Git
- Article on tools for collaborative writing of scientific LaTeX documents
- vc package on CTAN supports Git
Git and R
- Benefits of version control for the solo data analyst
- Revision control and statistics
- Thoughts on version control by Kieran Healy
- Revision Control, Workflow, and R