I’m happy to say that the represtools package is now on CRAN. I’m so giddy that I’m writing this post not that long after it was accepted. This means that binaries haven’t been built yet and it may not have been propagated to all of the various mirrors. Mostly I’m just happy that it made it through without any notes or angry e-mails from professors in Oxford.
So what does it do? It supports reproducible research. It’s heavily influenced by the writings of Christopher Gandrud, whose amazing book (now in its second edition!) may be purchased here. Starting from his wise teachings, over the past year or so, my workflow has morphed into something which relies on the following principles:
- Any and all of the steps in the research may be reproduced by another researcher. (This “other” researcher is usually me, months later.)
- Each analytic step is modular. This means that it may work in isolation, so long as precedent data exists.
- There are four major analytic steps: gather, cook, analyze and present.
- Every step uses rmarkdown for code and documentation.
- Data is exchanged using .rda files.
- The make program is used to ensure that data and content are (re)built when needed by any module.
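To make the last principle concrete, here is a minimal sketch in base R of the timestamp check that make applies: a target gets rebuilt when it is missing or older than something it depends on. The helper `needs_rebuild()` and the file names are hypothetical, made up for illustration; they are not part of represtools or of make itself.

```r
# Hypothetical helper: TRUE when `target` is missing or older than
# any of its prerequisites. This mimics make's rule, it is not make.
needs_rebuild <- function(target, prerequisites) {
  if (!file.exists(target)) {
    return(TRUE)
  }
  any(file.mtime(prerequisites) > file.mtime(target))
}

# Demo in a temporary directory with made-up file names.
work_dir <- tempfile("research_")
dir.create(work_dir)
raw_file    <- file.path(work_dir, "gathered.rda")
cooked_file <- file.path(work_dir, "cooked.rda")

raw <- data.frame(x = 1:3)
save(raw, file = raw_file)

# The cooked file does not exist yet, so it has to be built.
before_build <- needs_rebuild(cooked_file, raw_file)

cooked <- raw
save(cooked, file = cooked_file)

# Set explicit timestamps so the comparison does not depend on
# the clock resolution of the file system.
Sys.setFileTime(raw_file, Sys.time() - 60)
Sys.setFileTime(cooked_file, Sys.time())
after_build <- needs_rebuild(cooked_file, raw_file)

# Pretend the raw data changed after the cooked file was built.
Sys.setFileTime(raw_file, Sys.time() + 60)
after_change <- needs_rebuild(cooked_file, raw_file)
```

In this sketch the check is true before the first build, false right after it, and true again once the input is newer than the output. That is the whole trick: nothing downstream gets rebuilt unless something upstream has changed.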
The first thing that represtools does is build the directory structure to support this. Additionally, it provides a makefile which will rebuild any analysis component that is out of date. Finally, it (optionally) creates an RStudio project file which uses “make” as the target of the build process. There are functions which will add R Markdown templates in the appropriate directories. By default, each analysis component loads data from the prior step, does work and then saves data for the next step.
Gather -> Cook -> Analyze -> Present
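The load, work, save pattern looks roughly like the sketch below. It uses only base R `save()` and `load()`. The file names, the `claims` data frame and its values are invented for illustration, so don’t read them as the paths or objects that represtools itself creates.

```r
# Made-up file names in a temporary directory; not paths created by represtools.
data_dir <- tempfile("data_")
dir.create(data_dir)
gathered_file <- file.path(data_dir, "gathered.rda")
cooked_file   <- file.path(data_dir, "cooked.rda")

# Stand-in for the gather step: save raw data for the next step.
claims <- data.frame(
  year = c(2014, 2015, 2016),
  paid = c("100", "250", "175"),
  stringsAsFactors = FALSE
)
save(claims, file = gathered_file)
rm(claims)

# The cook step: load the prior output, do work, save for the next step.
load(gathered_file)                     # restores the object named `claims`
claims$paid <- as.numeric(claims$paid)  # example clean-up: text to numeric
save(claims, file = cooked_file)

# A later step can run in isolation so long as cooked.rda exists.
# Loading into its own environment keeps the workspace tidy.
step_env <- new.env()
loaded_names <- load(cooked_file, envir = step_env)
total_paid <- sum(step_env$claims$paid)
```

Because `load()` returns the names of the objects it restored, a step can check that it got what it expected before doing any work.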
I presented a version of this approach earlier this year at the Analytics>Forward unconference here in the Research Triangle Park. This package more or less formalizes everything that I presented there.
You can read more here: http://pirategrunt.com/represtools/. At present, it’s more or less the vignette that I wrote, pushed to Github and forgot to include in the source build. [sigh]
The project is on Github and I’m more than happy to hear comments, complaints and whatever. I get that this is a workflow that’s very specific to the way I work, but it feels fairly universal. Anything to push it in that direction would be awesome.
Coming soon on the beta (i.e. Github) is an RStudio add-in with a visual interface. I had gotten a start on this, but felt it wasn’t really essential for launch. In the meantime, though, you’ll get the Present templates in the new R markdown dialog box. That’s pretty cool, right?