Why building R packages is good for you

[This article was first published on Timothée Poisot » R, and kindly contributed to R-bloggers.]

Basically every function you use in R is part of a package (often base or stats). Most of the advanced routines, such as the differential equation solvers in simecol, are brought to R in the form of Fortran or C code. It is not, however, necessary to learn any language other than R to contribute a package to the community.

There is a great post at Dang, another error on how to create an R-only R package (I believe these are a minority among R packages). Here I share two reasons why I think it is important to contribute even the simplest packages.

Developing R packages is a great idea if you need to share code among several people, or if you need to apply the same functions over and over again. In my lab, we use a Fluostar Reader spectrophotometer to measure bacterial growth. While it is a good and reliable piece of hardware, the results are exported to Excel (which is bad if you want to do an analysis more complicated than, basically, opening the .xls file).

We use this spectrophotometer to produce a huge amount of data. I have accumulated over a thousand runs myself over the last few months. Most of the time, I want to extract the growth rate of different bacteria, deal with the replicates, and format the results in a meaningful way. I am not the only one in the lab doing this, so I decided to give this whole ‘package creating’ thing a try. A couple of days later, I had a package that allowed us to perform most of our analyses (i.e. growth rate and population size, which is nearly all we do anyway).
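
As a rough illustration of the kind of function such a package might hold, here is a minimal sketch in R. It assumes that the growth rate is estimated as the slope of log optical density against time during exponential growth, which is a common convention but not necessarily the method used in our lab package. The function name, the column names, and the simulated readings are all hypothetical.

```r
# Hypothetical helper: exponential growth rate as the slope of log(OD) vs time.
# Assumes all readings are positive and fall within the exponential phase.
growth_rate <- function(time, od) {
  fit <- lm(log(od) ~ time)
  coef(fit)[["time"]]
}

# Simulated plate-reader export (made-up values): two strains, two replicates each.
runs <- expand.grid(time = 0:5, replicate = 1:2, strain = c("A", "B"),
                    stringsAsFactors = FALSE)
true_rate <- c(A = 0.4, B = 0.25)
runs$od <- 0.05 * exp(true_rate[runs$strain] * runs$time)

# One growth rate per strain and replicate...
per_run <- do.call(rbind, lapply(
  split(runs, list(runs$strain, runs$replicate)),
  function(d) data.frame(strain = d$strain[1], replicate = d$replicate[1],
                         rate = growth_rate(d$time, d$od))
))

# ...then averaged over replicates to give one row per strain.
rates <- aggregate(rate ~ strain, data = per_run, FUN = mean)
print(rates)
```

Once a helper like this lives in a package, everyone in the lab calls the same function on their own runs instead of keeping a private copy of the script.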

Developing the package saved us a considerable amount of time. First, the analyses are simple to do with the help of a step-by-step tutorial. Second, even with only a basic understanding of R, the students and technical staff of the lab can format the data and launch preliminary analyses on their own. Third, we all speak the same language, and anybody can work on anyone else's code, because we all use the same functions. And finally, the results are now reproducible, because we know precisely how they were obtained.

Reproducibility of research is one of the most important things to consider when deciding whether to provide a package to the community. In a (hopefully) upcoming paper on ecological measures, we developed new indicators and routines for trophic web generation, which were described in appendices. However, we also wrote an R package that we used for the analysis, and invited the referees to download it and try it for themselves.

I think that these aspects (better teamwork and easier reproducibility) are especially important to consider when working with R code. Sure, it is easier to skip the extra work of writing the documentation, checking the examples, and other associated tasks. But having things in a package also allows you to load your everyday functions in one line, which, in the long run, saves a lot of time. Not to mention accurate version control and other associated benefits.
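
To give an idea of how little is needed to get started, the sketch below uses package.skeleton(), which ships with R in the utils package, to generate the directory layout of an R-only package around a single function. The package name growthtools and the function growth_rate are hypothetical placeholders, and the skeleton is written to a temporary directory so that nothing is left behind.

```r
# Hypothetical function that the package will contain, kept in its own
# environment so that only this object ends up in the skeleton.
env <- new.env()
env$growth_rate <- function(time, od) {
  coef(lm(log(od) ~ time))[["time"]]
}

# package.skeleton() writes DESCRIPTION, NAMESPACE, and stub files under
# R/ and man/; the stubs still have to be edited by hand.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)
utils::package.skeleton(name = "growthtools", list = "growth_rate",
                        environment = env, path = pkg_parent)

# Inspect what was generated.
pkg_dir <- file.path(pkg_parent, "growthtools")
list.files(pkg_dir, recursive = TRUE)
```

After DESCRIPTION and the documentation stubs in man/ have been filled in, the package can be checked with R CMD check, installed with R CMD INSTALL, and from then on loaded in one line with library(growthtools).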

