My Approach To Reproducible Research

June 22, 2015
(This article was first published on manio, and kindly contributed to R-bloggers)

The goal is simple. During my research, I often need to run many different workloads, plot the results, and write up an analysis. I want to be able to:

  • find out which parameters I used to generate the results, so I can answer questions such as “why do I see an outlier in my figure?”
  • find the script that I used to generate a plot, so I can improve the figure for publication.
  • rerun the whole program and get the results, so I can produce new results by modifying the old run.

To do so, I use:

  • GitHub
  • R Markdown (RStudio)
  • Makefile.py
  • get_github_url.py
  • analyzer.r
  • download_github_private_file.py
  • source_private_github_file.r

https://github.com/junhe/reproducible-research-template

Here are some guidelines for myself.

Manage all code in one GitHub repository

Centralized management is easier. With Git, you have access to the full history of all your code.

Never type commands directly on the command line

If you type ./my-awesome-code parameter1 parameter2 at the prompt, two months later you will have no way to find out what parameter1 and parameter2 were.

Put ALL scripts in Makefile.py

If you put your parameters and everything else in a file like Makefile.py, and that file is tracked in Git, you will be able to find out what you did on which day. You don’t need to remember any parameters; you only need to run ./Makefile.py. For the same reason, don’t pass arguments to ./Makefile.py.
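As a rough illustration of this guideline (a sketch only, not the actual Makefile.py from the template repository), each experiment can be a function that spells out its own parameters. The function names, the flag style, the parameter values, and the DRY_RUN switch are all assumptions made up for this example.

```python
#!/usr/bin/env python3
"""Hypothetical Makefile.py: every run is described in this file, not typed at a prompt."""
import subprocess

# Assumption: a switch kept in the file rather than on the command line,
# so that even the decision to execute is recorded in version control.
DRY_RUN = True


def build_command(program, params):
    """Turn a program path and a dict of parameters into an argv list."""
    argv = [program]
    for name in sorted(params):  # sorted so the command is stable across runs
        argv += ["--" + name, str(params[name])]
    return argv


def exp_block_size():
    """Illustrative experiment; the parameter names and values are invented."""
    return build_command("./my-awesome-code", {"block-size": 4096, "threads": 8})


def main():
    # To run something else, edit this line and commit the change.
    argv = exp_block_size()
    print("command: " + " ".join(argv))
    if not DRY_RUN:
        subprocess.check_call(argv)


if __name__ == "__main__":
    main()
```

Because the experiment is chosen by editing main() rather than by passing arguments, committing the file before each run leaves a dated record of exactly what was executed.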

Use get_github_url.py to get the plotting script

Currently, get_github_url.py snapshots the current code and puts the following script on the Mac OS clipboard.

# this requires curl installed in your OS
library(devtools)
source_url("https://gist.github.com/junhe/1f7e41f4c2829486e46f/raw/source_private_github_file.r")
source_private_github_file("doraemon", "analysis/analyzer.r", "599060f45d97538b9dffda4b54ab88d1e7eff006")

If you paste the code above into an R Markdown document, it will source analysis/analyzer.r, which contains the plotting script, from the project “doraemon”.
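The idea behind such a helper can be sketched as follows. This is not the real get_github_url.py; it assumes that the long hexadecimal string in the snippet is a Git commit SHA, that the snapshot has already been committed and pushed, and that the clipboard is reached through the Mac OS pbcopy command. The function names are invented for this example.

```python
import subprocess

# URL of the loader script, taken from the snippet shown above.
LOADER_URL = ("https://gist.github.com/junhe/1f7e41f4c2829486e46f/raw/"
              "source_private_github_file.r")


def current_commit():
    """Return the SHA of the checked-out commit (needs git and a repository)."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"])
    return out.decode("ascii").strip()


def r_snippet(project, path, sha):
    """Build the R code that sources `path` from `project` at commit `sha`."""
    return "\n".join([
        "# this requires curl installed in your OS",
        "library(devtools)",
        'source_url("%s")' % LOADER_URL,
        'source_private_github_file("%s", "%s", "%s")' % (project, path, sha),
    ])


def copy_to_clipboard(text):
    """Put text on the Mac OS clipboard by piping it to pbcopy."""
    proc = subprocess.Popen(["pbcopy"], stdin=subprocess.PIPE)
    proc.communicate(text.encode("utf-8"))
```

Inside the repository, copy_to_clipboard(r_snippet("doraemon", "analysis/analyzer.r", current_commit())) would then produce a snippet of the same shape as the one above, pinned to whichever commit is checked out.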

Use the organized script analyzer.r to plot

This template makes it easier to have reusable plotting code.

Use R Markdown to integrate plots (as code chunks) and analysis text

This is literate programming. Code and analysis are together. This is the ultimate output of the project, where you can find insights.

Put R Markdown files in the GitHub repository

The GitHub repository, which will never be lost, is the central place where you will find everything you need to reproduce the results months or years later.
