All Your Source Code Are Belong to… Nature?

February 28, 2012

(This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers)

The journal Nature recently published an interesting op-ed on the need to make source code available for scientific articles whose results depend on statistical computation.

The article hits on a point that is absolutely critical: statistical computing is difficult. Honest mistakes get made. A lot. The peer review process catches theoretical flaws and omitted bibliographic references, and it allows some criticism of the methods, limited by the amount of detail provided in the article itself. But an article could be free of all of those flaws and still draw completely false conclusions simply because of an error in the code. If that code was never reviewed or made public, the article would still be published.

A big concern here is transparency, as the authors state so well:

Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.

And of course, R and Sweave are mentioned as an elegant solution to this problem:

There are a number of tools that enable code, data and the text of the article that depends on them to be packaged up. Two examples here are Sweave associated with the programming language R and the text-processing systems LaTeX and LyX, and GenePattern-Word RRS, a system specific to genomic research [31]. Sweave allows text documents, figures, experimental data and computer programs to be combined in such a way that, for example, a change in a data file will result in the regeneration of all the research outputs.

Technology has changed the tools necessary to ensure rigor and replicability in science, but not the principle behind it. It is great to see a journal such as Nature making the case for this level of scrutiny to be applied to the computational routines used to derive results.
