Tips for R Package Creation

December 30, 2012
(This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers)

I’m being tortured by the mistakes of my past self. I think I’ve made most every mistake possible in creating a package, and I want to go back in time and tell year-ago me all I know now. But it seems require(timetravel) isn’t working on my machine. So instead I’ll share with other new package creators what I’ve learned along the way in a sort of tips list (Letterman style). To give context, I am documenting a package (qdap) after its functions are finished (bad idea), and I am lamenting all the mistakes, as this was my first package attempt and it’s a major undertaking.

Here are the things (riddled with helpful links) I wish I had known then that I know now:

  1. Start Small – It’s easier to learn to drive in a car than a dump truck.  I suggest making a small package first, even if it’s just for fun, to learn the process (a game, music player or fun visualization may be perfect for this).  This way you can refer back to this package often for “How did I do that?”
  2. Use git – GitHub, Bitbucket or some other git hosting service works well for uploading a repository to the cloud (Dropbox style), so you can back up your repo as well as share it and collaborate with others.  (here’s a clip about GitHub that’s slightly out of date but still good: LINK)  The issues tab is awesome for documenting bugs and requests.
  3. Use RStudio – When I first started on qdap, as a Windows user, the package creation process was painful.  RStudio makes your life so much better.  Here’s a video example of how quick it is to create a package with RStudio LINK 1 and a slightly out of date video of the interface between git and RStudio LINK 2.
  4. Become familiar with the “Writing R Extensions” manual – This is the rule book.  It’s like a club: if you don’t have the right look, you aren’t getting in. Nuff said.
  5. Steal – GitHub was designed for collaboration (aka stealing).  Find a trusted package developer and steal their format and design.  I personally steal from two places: Hadley Wickham’s GitHub and Dason Kurkiewicz’s GitHub.  All their files are there for easy sourcing.
  6. Document as you go – Trust me, documenting over time is easier than documenting at the end.
  7. Document with roxygen2 – roxygen2 is a less painful way to write documentation (I recommend actually writing an .Rd file, aka a documentation file, by hand once to feel the pain and appreciate roxygen2).  Here’s where stealing other people’s format is extremely useful; look at this Hadley .R file.  It’s nice when you’ve used roxygen2 to run roxygenize("path/to/repo") and the documentation is created.
  8. Use devtools – There are some great development tools in devtools (though many, if not most or all, are incorporated into RStudio).
  9. Use testthat – I didn’t get why this was useful until I started trying to make changes to my package at the end.  Ever pull a thread on a sweater and watch it make a big hole?  That’s what a change in a package can do, and testthat can help make sure the changes don’t make a big hole.
  10. Learn to debug – I had no clue how cool browser() was or how to use it when I started.  Here’s a nice video on R’s debugging tools: LINK.  Debugging stinks; debugging without tools really stinks.
  11. Reduce, Recycle, Reuse – Try to think “will I use this code chunk later?”  If the answer is yes, break it off as a function of its own and throw it in the package as an internal “helper” function.  This saves time and makes the code more readable.  Also try to make the code compact but as fast as possible.  Benchmarking and Rcpp can make the code faster.
  12. Make friends/find a learning community – The folks at talkstats.com and stackoverflow.com have been a tremendous help when I asked about the process and wanted feedback.  I wouldn’t know about most of the above things if it were not for these two learning places.
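
To make tips 6, 7 and 11 a bit more concrete, here is a minimal sketch of a roxygen2-documented function that leans on a small internal helper. The names word_count and strip_punct are hypothetical, made up for illustration (they are not taken from qdap); the #' tags used are ordinary roxygen2 tags.

```r
# Hypothetical internal helper: it is reused by other functions, so it
# gets its own function and is NOT exported from the package.

#' Strip punctuation from a character vector
#'
#' @param x A character vector.
#' @return \code{x} with punctuation characters removed.
#' @keywords internal
strip_punct <- function(x) {
  gsub("[[:punct:]]", "", x)
}

# Hypothetical exported function, documented as it is written.

#' Count the words in each string
#'
#' @param x A character vector.
#' @return An integer vector of word counts, one per element of \code{x}.
#' @examples
#' word_count(c("Hello, world!", "R is fun"))
#' @export
word_count <- function(x) {
  clean <- trimws(strip_punct(x))
  # Split on runs of whitespace and count the non-empty pieces
  pieces <- strsplit(clean, "[[:space:]]+")
  vapply(pieces, function(p) sum(nzchar(p)), integer(1))
}
```

Because the documentation sits right above the code, it is much harder to let the two drift apart; running roxygenize on the package directory then turns these comments into .Rd files.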

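The sweater-thread problem from tip 9 can be sketched with a small testthat test. The function add_one is a hypothetical stand-in for any function in your package; test_that(), expect_equal() and expect_error() are testthat functions. The sketch assumes testthat is installed and simply skips the test if it is not.

```r
# Hypothetical function under test; in a real package it would live in R/
add_one <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x + 1
}

# In a package, this block would sit in a file in the package's test directory
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add_one increments numeric input", {
    testthat::expect_equal(add_one(1), 2)
    testthat::expect_equal(add_one(c(0, 10)), c(1, 11))
    # Non-numeric input should fail loudly rather than return garbage
    testthat::expect_error(add_one("a"), "must be numeric")
  })
}
```

If a later change to add_one breaks any of these expectations, the test fails right away instead of the hole showing up somewhere else in the package.
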
Special thanks to Dason of talkstats.com for his patience in teaching and mentoring me through the package creation process.

