Yesterday's New York Times includes a great article on the failure of some genetic tests for cancer detection, and on the flaws in the research that led to them. The article features quotes from Keith Baggerly of MD Anderson Cancer Center, and includes a photo of him and colleague Kevin Coombes in front of a page of R code.
The article highlights the importance of reproducible research: unless others have access to the data and code that back up the statistical conclusions in a paper, many such errors are likely to go undiscovered. I saw Keith give an amazing talk about reproducible research and data forensics at the BioConductor conference a couple of years ago (I'm pretty sure that's one of the slides from the talk in the image above).
The talk concerned a published article whose results seemed odd. Baggerly tried in vain to get hold of the source data to reproduce the analysis, but the article's authors didn't cooperate. So, in an impressive feat of data forensics, he managed to recreate the data by matching public sources to measurements taken from the printed graphs, and figured out that there were gross data errors in the article: labels transposed, data duplicated, that kind of thing. The conclusions were completely bunk, but the journal refused to print a correction, even though the errors meant actual patients were being enrolled in trials of inappropriate drugs.
I'm glad to see this important issue getting some wider media attention. Check out the Times article at the link below for the full story.
New York Times: How Bright Promise in Cancer Testing Fell Apart