One step ahead in Bioinformatics using Package Repositories

[This article was first published on BioCode's Notes, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

About a year ago I published a post about in-house tools in research and how using this type of software may end up undermining the quality of a manuscript and the reproducibility of its results. While I can certainly relate to someone reluctant to release nasty code (i.e. not commented, not well-tested, not documented), I still think we must provide (as supporting information) all “in-house” tools that have been used to reach a result we intend to publish. This applies especially to manuscripts dealing with software packages, tools, etc. I am willing to cut some slack for journals such as Analytical Chemistry or Molecular & Cellular Proteomics, whose editorial staffs are (and rightly so) more concerned about quality issues involving raw data and experimental reproducibility. But journals like Bioinformatics, BMC Bioinformatics, several members of the Nature family and others at the forefront of bioinformatics should, methinks, be held to a higher standard. Some of these journals would greatly benefit from a review process that looks at submissions from the point of view of software production, moving bioinformatics and science in general one step forward in terms of reproducibility and software reusability. What do you think would happen if the following were checked during peer review?

  • Quality of the documentation, in terms of examples, use cases and in-code comments (functions, classes)
  • Availability of a complete set of unit tests (most programming languages offer packages that provide a complete environment for testing every component, such as classes and functions, of the tools and libraries written in them)
  • Reusability

Manuscripts should be sent for review not only to biologists or bioinformaticians with a background in biological sciences or chemistry, but also to researchers with solid skills in software production, who would then be able to perform a detailed analysis of the code, the quality of the documentation and the unit tests.
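
To make the first two checks a bit more concrete, here is a minimal sketch of what a reviewer might hope to find next to a small piece of in-house code. It is written in Python with the standard `unittest` module purely as an illustration; the function `gc_content` and its tests are hypothetical and do not come from any published tool.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical example function, used here only to illustrate
    documentation and unit tests.

    The input is case-insensitive and may contain A, C, G, T or N.
    A ValueError is raised for empty input or unexpected characters.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    invalid = set(upper) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return (upper.count("G") + upper.count("C")) / len(upper)


class GcContentTest(unittest.TestCase):
    """Each test documents one expected behaviour of gc_content."""

    def test_balanced_sequence(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_lowercase_input(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_is_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")

    def test_invalid_characters_are_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("ATXG")
```

Saved as a module, the tests can be run with `python -m unittest`. The point is not the function itself but that its behaviour, including how it fails, is written down in a form that a reviewer can execute.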

Another suggestion: tools and software should be available through package repositories (Maven, CPAN, CRAN, PyPI); a good example is the way the Journal of Statistical Software engages with CRAN. Such repositories make it easier to find, install and test different packages and their dependencies, furthering the advance of science and research.


The good news is that software package repositories already exist for most programming languages, and they keep growing (Figure 1).

Figure 1. Number of packages per repository in the last three years.

Of course I am not holding my breath hoping that these suggestions reach the ears of a friendly science journal editor. But there’s one thing we (bioinformaticians and developers reading this post) can all do now, which is to up the ante in our tool development practices. What do you think?

To leave a comment for the author, please follow the link and comment on their blog: BioCode's Notes.
