My favorite tools for helping future me

(This article was first published on Heidi's stats blog - Rbloggers, and kindly contributed to R-bloggers)

Reproducible research is a topic that people like to talk about these days.
Thinking about reproducible research and learning the relevant
tools improved my work more than anything else.
Not in the sense that my results got better, but
in the sense that I feel better about my work and my analyses are
easier to understand for future me.

[Images: before, after]

So in the following I would like to list the things that help
future Heidi most: knitr, Roxygen, SVN and Make.

Of course, many other things influence how useful these tools are
for me. First of all, I use R for statistical analyses and run
Linux (Ubuntu) on my computer (and servers).

This post is not meant as instructions on how to use the tools.
Introductions are linked above. I want to describe how I connect the tools
in my projects.

How to use Literate Programming, Version Control and Makefiles together

knitr

As a statistician, I usually spend many hours cleaning and exploring the data
before shipping a report, talk or paper. In order to keep track of what
I already did, I like to produce a PDF of all tables, figures and other analyses
that I produce, or at least those that are in any way useful. I do that using
knitr. With knitr I can also include my thoughts.

Roxygen

I document (almost) all functions that I write for a project with Roxygen2. It is
a fast and easy way to keep track of what a function does and what its parameters
are supposed to mean. And if you decide to make an R package out of the code, the
comments can easily be transformed into .Rd files.
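
As a minimal sketch of what such documentation can look like, here is a small function with Roxygen comments. The function standardize and its arguments are hypothetical and only illustrate the comment style; they are not taken from any of my projects.

```r
#' Standardize a numeric vector
#'
#' Centers a numeric vector at its mean and scales it by its standard
#' deviation. Hypothetical helper, used only to illustrate Roxygen comments.
#'
#' @param x Numeric vector to standardize.
#' @param na.rm Logical; should missing values be ignored when computing
#'   the mean and the standard deviation?
#' @return A numeric vector of the same length as \code{x}.
#' @examples
#' standardize(c(1, 2, 3))
#' @export
standardize <- function(x, na.rm = TRUE) {
  # Fail early if the input is not numeric
  stopifnot(is.numeric(x))
  # Subtract the mean, then divide by the standard deviation
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}
```

If the code later moves into a package, running roxygen2::roxygenise() in the package directory should generate the corresponding .Rd file from these comments.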

SVN

To keep track of changes and to be able to go back to old
versions of my code or my LaTeX files, I use SVN (or occasionally git). This is
super nice when you work alone. It is even better when you work with other people.
With SVN I control and save all my R, Rnw, tex and other files that are
important to keep in case my computer breaks or my office burns down. This
also makes it possible for me to work on different machines without any hassle.
I just check out the repository on the given machine.

Make

Imagine you write a paper or a report. You change something in the data cleaning
code or in your analyses. Wouldn’t it be awesome to run a single command and
have all code that depends on your change rerun and your paper/report automatically
updated? This is possible by defining a simple text file (a Makefile) and then just
running make all. At least when you work on Linux.

The gold standard

My personal goal is

  1. To make my work so understandable that in some years I will
    still understand what I did, or even that I could quit a project and someone
    else could keep working on it without major problems.
  2. To be able to go back to old thoughts and track changes that anyone made
    to the project.
  3. To have a “flow” in my projects where I can change things somewhere in the
    middle of my code and with a single command (make) I can update everything that
    is necessary.
  4. If the worst happens and all the machines I work on break, to be able
    to recreate everything without losing much time.
