As an ecologist working on climate change questions, I’ve always found it rather tedious to acquire and process climate data, especially at large spatiotemporal scales. Although many agencies provide free access to climate data, there is often some overhead (typically one to two days) before the data are made available for download via FTP. Next, one has to process the data to match the structure of the biological information. Some of these data are provided in one of many binary formats, which require additional processing. While individual scientists and labs have workflows to complete these disparate steps, the workflows are rarely included as part of a publication, which leaves out critical data provenance. Even when peer-reviewed articles include one-off scripts (and associated data), missing provenance information makes it difficult to reproduce the results [cite]10.1038/nm1107-1276b[/cite]. Workflow repositories are needed to address the larger issue. In the meantime, one way to address the problem would be to encapsulate the above-mentioned steps (data acquisition, format conversion and interpolation) in the code that is already included in supplementary materials.
On that note, I’m pretty excited by the announcement of a new R package called RNCEP in the current issue of Methods in Ecology and Evolution [cite]10.1111/j.2041-210X.2011.00138.x[/cite]. The package provides an interface to atmospheric data from the National Centers for Environmental Prediction (NCEP), specifically the NCEP/NCAR Reanalysis and NCEP/DOE Reanalysis II datasets. By encapsulating every step within R, from data acquisition and format conversion to interpolation and aggregation, the package provides a way to document an entire workflow as part of an article supplement. As more data repositories open up APIs, similar packages will go a long way towards promoting more open science.
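To make the "match the climate data to the biological information" step concrete, here is a minimal sketch of what interpolating gridded data to observation points involves. It is plain Python rather than R, and it neither uses nor imitates RNCEP's API. Everything in it is hypothetical and invented for illustration: the `interpolate` function, the toy 2 x 2 x 2 temperature grid, and the site names. It assumes a regular grid whose time, latitude and longitude axes are ascending and have at least two nodes each.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, w): index of the lower bracketing node and weight of the upper one."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} is outside the grid range {axis[0]}..{axis[-1]}")
    # Clamp so that x == axis[-1] still falls in the last interval.
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, w


def interpolate(grid, times, lats, lons, t, lat, lon):
    """Linear interpolation in time and space of grid[time][lat][lon]."""
    ti, tw = _bracket(times, t)
    yi, yw = _bracket(lats, lat)
    xi, xw = _bracket(lons, lon)
    value = 0.0
    # Weighted sum over the 8 grid nodes surrounding the query point.
    for dt, wt in ((0, 1 - tw), (1, tw)):
        for dy, wy in ((0, 1 - yw), (1, yw)):
            for dx, wx in ((0, 1 - xw), (1, xw)):
                value += wt * wy * wx * grid[ti + dt][yi + dy][xi + dx]
    return value


# Toy data (made up): air temperature in kelvin on a 2.5-degree grid, 6 hours apart.
TIMES = [0.0, 6.0]   # hours since the first time step
LATS = [50.0, 52.5]  # degrees north, ascending
LONS = [0.0, 2.5]    # degrees east, ascending
GRID = [
    [[280.0, 282.0], [284.0, 286.0]],  # t = 0 h
    [[281.0, 283.0], [285.0, 287.0]],  # t = 6 h
]

# Hypothetical biological observations, each with its own time and place.
observations = [
    {"site": "A", "t": 3.0, "lat": 51.25, "lon": 1.25},
    {"site": "B", "t": 6.0, "lat": 52.5, "lon": 0.0},
]

# Attach an interpolated temperature to every observation record.
matched = [
    dict(obs, air_temp_k=interpolate(GRID, TIMES, LATS, LONS,
                                     obs["t"], obs["lat"], obs["lon"]))
    for obs in observations
]

for row in matched:
    print(f"{row['site']}: {row['air_temp_k']:.1f} K")
# A: 283.5 K
# B: 285.0 K
```

The point of the sketch is less the arithmetic than the provenance: once this step lives in a script next to the download and format-conversion code, a reader of the supplement can see exactly how each climate value attached to an observation was derived.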