R is indispensable, because it’s reproducible

August 31, 2010

Maria Wolters, self-styled "Science-Mum of two" and speech and language technology researcher, has a great blog post about the one tool she couldn't live without: R. Maria says R is her "favourite tool for analysing experimental results and modelling the resulting patterns of behaviour and preferences", and explains why:

R is a programming language for everything statistical. It’s free, it’s open source, and it’s being maintained by statisticians for statisticians. Its origin means that it is a pain to learn. It takes a while until one has cleared a path through the data structures, including the various conventions for extracting information from objects that store the results of painstaking statistical analyses, and I am still often baffled myself.

But the payoff is magnificent. Clear (modulo coding ability), open, replicable analyses. R is the ultimate in replicable research. If you give people your data set and your source code, they can repeat every single step of your reasoning. There are no paywalls, no limits of affordability, no packages that are indispensable for the analysis, but that your department hasn’t paid for.

This issue of "replicable analysis" is an important one. It is crucial to know that you can re-run your analysis at any time in the future (assuming you still have access to the same hardware, or at least a virtual instance of it) and verify the results, without worrying that the software is no longer available. It also means that third parties can reproduce your results where necessary. That this kind of reproducibility really is necessary to support good science is the topic Fritz Leisch covered in his excellent keynote speech at this year's useR! conference.

Speech and Science: The one tool I couldn't live without

