rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science.
So what is Open Science?
Open science is the practice of making various elements of scientific research — data & methods, code & software, and results & publications — readily accessible to anyone. While this has great potential for advancing research (in addition to education, public policy, & commercial innovation) as a whole, there are both technical and social challenges preventing this practice from being more widespread. Social challenges stem largely from the dichotomy between what is best for an individual researcher and what is best for the community. Technical challenges arise largely from issues of scale: putting free print copies of DNA sequencing data in a box in front of your office doesn’t scale as well as depositing those sequences on repositories like GeneBANK.
Why does open science need tools?
Our goal is to provide open-source tools to help address both these challenges. These are interesting times. The technology to facilitate the access and utilization of this data has never been better, yet it is only beginning to be employed. The internet — firmly in its second generation, the read-write Web 2.0 culture in which users generate content as readily as they consume it — has led to the explosion of mechanisms for sharing. Yet these tools are not widely leveraged in scientific communities .
Why R for open science?
R is an open-source statistical environment that can be used for not only statistics, but also for data acquisition, data manipulation, modeling, among other uses. R is increasingly being used by scientists across all disciplines and has overtaken popular scientific programming tools. Part of the reason behind R’s explosive growth is the ease with which the ever-growing userbase can add new functionality, a fact evidenced by 3,000+ currently available R packages. The R framework is ideal for open science because:
- The software is free.
- There is an extensive user community from which help is very quickly given, and
- The scientific workflow can be documented for fully reproducible research.
Open Access Literature
Published literature is still the most common repository of scientific information. While literature continues to expand exponentially, the amount any individual researcher can consume appears to be constant. Whether discovering what to read, identifying research trends, or summarizing existing work, scalable solutions require computational approaches. Mendeley, a citation tool, literature repository, and social network which has just cataloged its 100 millionth paper, has released a public API through which it challenges the scientific community to leverage this data to facilitate novel applications. PLoS, The Public Library of Science, has also joined the start-up with there own public API allowing access to the metadata and full text of all its publication through an accessible RESTful interface.
Access to scientific data is rapidly emerging as a central theme in the future of research . In our field of Ecology and Evolution, new requirements for data management plans from NSF, data deposition requirements of prominent journals, and the emergence of well-developed repositories like GeneBANK, Dryad, TreeBASE, DataONE, (KNB, NBII, GBIF) have opened a wealth of potential .
Building Bridges between Open Access Literature and Open Data
Despite these dramatic shifts, the bulk of scientific research has been slow to benefit from the transformation than they are to comply with the new requirements. To address these challenges, we have set about building bridges between the repositories and the open source R statistical environment. We are creating packages that allow access to these data repositories through a statistical programming environment that is already a familiar part of the workflow of many scientists. We hope that these bridges will not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers.
Keep up with updates on the rOpenSci project here and on twitter.
- K.R. Coombes, J. Wang, and K.A. Baggerly, “Microarrays: retracing steps”, Nature Medicine, vol. 13, 2007, pp. 1276-1277. DOI.
- . , “Challenges and Opportunities”, Science, vol. 331, 2011, pp. 692-693. DOI.
- O.J. Reichman, M.B. Jones, and M.P. Schildhauer, “Challenges and Opportunities of Open Data in Ecology”, Science, vol. 331, 2011, pp. 703-705. DOI.