Reproducible research is a topic that people like to talk about these days. Thinking about reproducible research and learning the important tools is what improved my work more than anything. Not in a sense that my results got better. More in a sense that my feeling about the work got better and my analyses got easier to understand for future me.
So in the following I would like to give a list of things that are helping future Heidi most:
- Literate Programming (Sweave, knitr, RMarkdown, Roxygen)
- Make (Makefiles)
- Version control (SVN, Git)
This post is not meant as an instruction to how to use the tools. Introductions are linked above. I want to tell how I am connecting the tools for projects.
How to use Literate Programming, Version Control and Makefiles together
As a statistician, before shipping a report, talk or paper usually many hours are spent with the cleaning and exploration of the data. In order to keep track of what I already did, I like to produce a PDF of all tables, figures and other analyses that I produce. Or at least those that are in any way usefull. That I do using knitr. With knitr I can also include my thoughts
I document (almost) all functions that I write for a project with Roxygen2. It is a fast and easy way to keep track of what your function does and what parameters are supposed to mean. And if you decide to make an R-package out of the code, it can easily be transformed into .Rd files.
To keep track of the changes and in order to be able to go back to old versions of my code or my latex files, I use SVN (or occasionally git). This is super nice when you work alone. It is even better when you work with other people. With SVN I control and save all my R-, Rnw-, tex- and other files that are important to keep in case my computer breaks or my office burns down. In this way it is also possible for me to work on different machines without any hassle. I just checkout the repository to the given machine.
Imagine you write a paper or a report. You change something in the data cleaning
code or in your analyses. Wouldn’t it be awesome to just do a single command and
all codes that depend on your change are run and your paper/report automatically
updated? It is possible by defining a simple text file (Makefile) and then just
make all. At least when you work in Linux.
The gold standard
My personal goal is
- To make my work so understandable that in some years I will still understand what I did or even that I could quit a project and someone else could keep working on it without major problems.
- To be able to go back to old thoughts and track changes that anyone did on the project.
- To have a “flow” in my projects where I can change things somewhere in the middle of my code and with a single command (make) I can update everything that is neccesary.
- If the worst happens and all the machines I work on brake, I want to be able to recreate everything without loosing much time.