Why building R packages is good for you

July 23, 2010
By Timothée Poisot

(This article was first published on Timothée Poisot » R, and kindly contributed to R-bloggers)

Basically every function you use in R is part of a package (often base or stats). Most of the advanced routines, such as the differential equation solvers in simecol, are brought to R as Fortran or C code. You do not, however, need to learn any language other than R to contribute a package to the community.

There is a great post on the blog "Dang, another error" about how to create R-only packages (I believe these are a minority among R packages). Here I give two reasons why I think it is important to contribute even the simplest packages.

Developing R packages is a great idea if you need to share code among several people, or if you routinely apply the same functions over and over again. In my lab, we use a Fluostar Reader spectrophotometer to measure bacterial growth. While it is a good and reliable piece of hardware, it exports its results to Excel, which is a problem as soon as you want to do any analysis more complicated than, basically, opening the .xls file.

We use this spectrophotometer to produce a huge amount of data: I alone have accumulated over a thousand runs in the last few months. Most of the time, I want to extract the growth rates of different bacteria, deal with the replicates, and format the results in a meaningful way. I am not the only one in the lab doing this, so I decided to give this whole ‘package creation’ thing a try. A couple of days later, I had a package that let us perform most of our analyses (i.e. growth rate and population size, which is nearly all we do anyway).

Developing the package saved us a considerable amount of time. First, the analyses are simple to run, with the help of a step-by-step tutorial. Second, even with only a basic understanding of R, students and technical staff in the lab can format the data and launch preliminary analyses on their own. Third, we all speak the same language: anybody can work on anyone else's code, because we all use the same functions. And finally, the results are now reproducible, because we know precisely how they were obtained.

Reproducibility of research is one of the most important things to consider when deciding to provide a package to the community. In a (hopefully) upcoming paper on ecological measures, we developed new indicators and routines for generating trophic webs, which are described in the appendices. But we also wrote an R package that we used for the analysis, and offered the referees the chance to download it and try it for themselves.

I think that these aspects (better teamwork and easier reproducibility) are especially important to consider when working with R code. Sure, it is easier to skip the extra work of writing the documentation, checking the examples, and the other associated tasks. But having things in a package also lets you load your everyday functions in one line, which saves a lot of time in the long run. Not to mention accurate version control and other associated benefits.
