Reproducible research is a topic that people like to talk about these days.
Thinking about reproducible research and learning its important
tools has improved my work more than anything else.
Not in the sense that my results got better, but in the sense that I feel
better about my work and my analyses became easier to understand for
future me.
So in the following I list the things that help
future Heidi the most:
- Literate Programming (Sweave, knitr, RMarkdown, Roxygen)
- Make (Makefiles)
- Version control (SVN, Git)
This post is not meant as an instruction on how to use the tools.
Introductions are linked above. Instead, I want to show how I connect the tools.
How to use Literate Programming, Version Control and Makefiles together
As a statistician, I usually spend many hours cleaning and exploring the data
before I ship a report, talk or paper. To keep track of what I have already done,
I like to produce a PDF of all the tables, figures and other analyses I create,
or at least of those that are in any way useful. I do this with knitr. With
knitr I can also write down my thoughts as text right next to the code and its results.
I document (almost) all functions that I write for a project with Roxygen2. It is
a fast and easy way to keep track of what a function does and what its parameters
mean. And if you decide to turn the code into an R package, the Roxygen comments
can easily be converted into .Rd files.
To keep track of changes and to be able to go back to old versions of my code
or my LaTeX files, I use SVN (or occasionally Git). This is super nice when you
work alone, and even better when you work with other people.
I keep all my R, Rnw, tex and other important files under version control in SVN,
so nothing is lost if my computer breaks or my office burns down. This also
lets me work on different machines without any hassle: I just check out the
repository on the machine I want to use.
Imagine you are writing a paper or a report. You change something in the data-cleaning
code or in your analyses. Wouldn’t it be awesome to run a single command that reruns
all the code depending on your change and automatically updates your paper/report?
This is possible: you describe the dependencies in a simple text file (a Makefile)
and then just run make all. At least when you work on Linux.
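A minimal Makefile for such a project might look like the following sketch. The file names (clean_data.R, data/raw.csv, data/clean.rds, report.Rnw) are hypothetical placeholders, and the sketch assumes that clean_data.R writes data/clean.rds. Each rule names a target, the files it depends on, and the command that rebuilds it; make reruns a command only when one of the dependencies is newer than the target.

```make
# Minimal sketch; all file names are hypothetical placeholders.
# Recipe lines must be indented with a tab character.

.PHONY: all clean

all: report.pdf

# Rerun the cleaning script whenever it or the raw data changes
# (assumes clean_data.R reads data/raw.csv and writes data/clean.rds).
data/clean.rds: clean_data.R data/raw.csv
	Rscript clean_data.R

# Re-knit the report whenever the Rnw source or the cleaned data changes;
# knitr::knit() turns report.Rnw into report.tex.
report.tex: report.Rnw data/clean.rds
	Rscript -e "knitr::knit('report.Rnw')"

# Compile the LaTeX output into the final PDF.
report.pdf: report.tex
	pdflatex report.tex

# Remove the generated report files.
clean:
	rm -f report.tex report.pdf
```

After editing clean_data.R, running make all reruns the cleaning script, re-knits the report and rebuilds the PDF. Running it again without further changes does nothing, because every target is already up to date.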
The gold standard
My personal goals are:
- To make my work so understandable that in some years I will
still understand what I did, or even that I could leave a project and someone
else could keep working on it without major problems.
- To be able to go back to old thoughts and track the changes that anyone made
to the project.
- To have a “flow” in my projects where I can change things somewhere in the
middle of my code and, with a single command (make), update everything that depends on it.
- If the worst happens and all the machines I work on break, I want to be able
to recreate everything without losing much time.