A Partial Remedy to the Reproducibility Problem

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Several years ago, John Ionnidis jolted the scientific establishment with an article titled, “Why Most Published Research Findings Are False.” He had concerns about inattention to statistical power, multiple inference issues and so on. Most people had already been aware of all this, of course, but that conversation opened the floodgates, and many more issues were brought up, such as hidden lab-to-lab variability. In addition, there is the occasional revelation of outright fraud.

Many consider the field to be at a crisis point.

In the 2014 JSM, Phil Stark organized a last-minute session on the issue, including Marcia McNutt, former editor of Science and Yoav Benjamini of multiple inference methodology fame. The session attracted a standing-room-only crowd.

In this post, Reed Davis and I are releasing the prototype of an R package that we are writing, revisit, with the goal of partially remedying the statistical and data wrangling aspects of this problem. It is assumed that the authors of a study have supplied (possibly via carrots or sticks) not only the data but also the complete code for their analyses, from data cleaning up through formal statistical analysis.

There are two main aspects:

  • The package allows the user to “replay” the authors’ analysis, and most importantly, explore other alternate analyses that the authors may have overlooked. The various alternate analyses may be saved for sharing.
  • Warn of statistical   errors, such as: overreliance on p-values; need for multiple inference procedures; possible distortion due to outliers; etc.

The term user here could refer to several different situations:

  • The various authors of a study, collaborating and trying different analyses during the course of the study.
  • Reviewers of a paper submitted for publication on the results of the study.
  • Fellow scientists who wish to delve further into the study after it is published.

The package has text and GUI versions. The latter is currently implemented as an RStudio add-in.

The package is on my GitHub site, and has a fairly extensive README file introducing the goals and usage.


To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)