Being one of two R experts at my current job, I figured I should be familiar with package development. Frankly, I’ve been procrastinating on this topic since I started using R in 2007 – I was doing just fine with source(), and the section of the R manual on package development fell into the category of TL;DR.
What finally drove me to learn the esoteric details of R packaging were the following two events at work:
- A coworker sent out an email announcing a new R analysis script that included a few algorithms I had written and passed on. It was accompanied by documentation in a README.txt file and employed console menus and user dialogs to ease use. Otherwise, details of what the code was doing were left to code comments.
- After a day out of the office and disconnected, I read an email thread between coworkers asking about the correct set of parameters to use in a function I wrote. Fortunately, they figured it out thanks to my excessive commenting habits.
Lesson learned: I no longer write R code just for myself, and using code comments as documentation just isn’t cutting it anymore. I needed an efficient way to:
- distribute code
- provide documentation that uses the in-built help system
Unlike MATLAB or Python, R does not have an effective way to provide simple documentation for code – functions, objects, etc. There is this post on Stack Overflow, but I expect such functionality to be built into the environment, not hacked on.
Long-winded introduction over, I finally dove in. Thankfully, my entry wasn’t too rough. A couple of months ago I read a couple of posts submitted to R-bloggers regarding “easy” package development using devtools and RStudio (my R IDE of choice).
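For a sense of what that workflow can look like, here is a minimal sketch rather than the exact steps I followed. It assumes roxygen2-style `#'` comments as the documentation source (devtools uses roxygen2 for this), and the package name `mypkg`, the function `doubling_time()`, and every DESCRIPTION value are hypothetical placeholders.

```r
# Hypothetical package name and location; a temp directory keeps this side-effect free.
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION file (all field values are placeholders).
writeLines(c(
  "Package: mypkg",
  "Title: Placeholder Title for an Example Package",
  "Version: 0.0.1",
  "Description: A placeholder description for this sketch.",
  "License: GPL-3",
  "Encoding: UTF-8"
), file.path(pkg_dir, "DESCRIPTION"))

# One function, documented with roxygen2-style comments (an assumption of this sketch).
writeLines(c(
  "#' Doubling time from a specific growth rate",
  "#'",
  "#' @param mu Specific growth rate (1/h).",
  "#' @return Doubling time in hours.",
  "#' @export",
  "doubling_time <- function(mu) log(2) / mu"
), file.path(pkg_dir, "R", "doubling_time.R"))

# Only runs if devtools is installed.
if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::document(pkg_dir)   # generates man/doubling_time.Rd and NAMESPACE
  # devtools::install(pkg_dir)  # uncomment to install; ?doubling_time then works
}
```

Once the package is installed and loaded, the comments above the function show up in the regular help system, which covers both needs listed earlier: distribution and built-in documentation.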
My problem to solve: get parameters from biological growth curves
My package: groan
Title: Utilities for biological GROwth curve ANalysis
Author: W. Lee Pang
Description: groan is a set of tools to assist in the analysis of biological
growth curves. It provides functions to smooth input data and extract key
parameters such as specific growth rate (mu), carrying capacity (A), and lag
Working with microorganisms, a common task is determining a culture’s specific growth rate – e.g. how many times the population will double in an hour. While not a hard task, it can be tedious if numerous cultures are involved, or if the underlying data is noisy.
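To make the quantity concrete, here is a minimal base-R sketch of the textbook calculation for a single clean curve. It is not how groan does it internally; the data are synthetic and the variable names are made up for illustration.

```r
# Synthetic exponential growth: a hypothetical culture that doubles every 0.5 h.
t_h <- seq(0, 4, by = 0.5)       # time points, hours
od  <- 0.01 * 2^(t_h / 0.5)      # optical density at each time point

# During exponential growth log(OD) is linear in time; the slope is mu.
fit <- lm(log(od) ~ t_h)
mu  <- unname(coef(fit)["t_h"])  # specific growth rate, 1/h

t_d <- log(2) / mu               # doubling time, hours
doublings_per_hour <- 1 / t_d    # how many times the population doubles per hour
```

Real curves have a lag phase, a plateau, and noise, so a single straight-line fit like this only works on a hand-picked exponential window. Automating that choice across many cultures is the tedious part.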
groan is essentially the R package I wish I had as a grad student and postdoc, but was too occupied otherwise to write.
Yes, the name groan is a pun:
- “grown” : as in yay the cells grew
- “groan” : as in ugh, I have to process yet another growth curve
Humor aside, it reduces a CSV of multiple growth curve data points into a table of growth parameters and a summary plot in under 10 lines of code.
From this …
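library(groan)
Y = read.csv('path/to/your/data.csv', stringsAsFactors=FALSE)
Y = groan.init(Y)                                      # prepare the raw data for groan
Y.s = groan.smooth(Y, adaptive=TRUE, method='loess')   # smooth the growth curves
U = groan.mu(Y.s)                                      # specific growth rate (mu) curves
U.s = groan.smooth(U, adaptive=TRUE, method='loess')   # smooth the mu curves
U.f = groan.fit(U.s, method='pulse')                   # fit the smoothed mu curves
stats = data.frame(mumax = max(U.f),
                   t.lag = groan.tlag(U.f),
                   gen   = groan.generations(U.f))     # table of growth parameters
plot(Y) # plot thumbnail grid of raw growth curves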
For more information, examples, or to test out the code yourself, head to the groan repository on GitHub.