In preparation for an R Workgroup meeting, I started thinking about what would be my “Top 5 R Functions”. I ruled out the functions for basic mechanics (save, load, mean, etc.): they’re obviously critical, but every programming language has them, so there’s nothing especially “R” about them. I also ruled out the fancy statistical analysis functions like (g)lmer: most people (including me) start using R because they want to run those analyses, so it seemed a little redundant. I started using R because I wanted to do growth curve analysis, so it seems like a weak endorsement to say that I like R because it can do growth curve analysis. No, I like R because it makes (many) somewhat complex data operations really, really easy. Understanding how to take advantage of these R functions is what transformed my view of R from a purely functional one (I need to do analysis X and R has functions for doing analysis X) to that of an all-purpose tool that allows me to do data processing, management, analysis, and visualization extremely quickly and easily. So, here are the 5 functions that did that for me:
- subset() for making subsets of data (natch)
- merge() for combining data sets in a smart and easy way
- melt() for converting from wide to long data formats
- dcast() for converting from long to wide data formats, and for making summary tables
- ddply() for doing split-apply-combine operations, which cover a huge swath of the trickiest data operations
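
To give a flavor of how the five fit together, here is a minimal sketch rather than a full tutorial. It assumes the reshape2 package (for melt() and dcast()) and the plyr package (for ddply()) are installed. The `scores` and `groups` data frames and all of their column names are made up for illustration.

```r
# Assumes reshape2 and plyr are installed; all data below is hypothetical.
library(reshape2)  # provides melt() and dcast()
library(plyr)      # provides ddply()

# Wide-format toy data: one row per subject, one column per time point
scores <- data.frame(
  subject = c("s1", "s2", "s3", "s4"),
  t1 = c(10, 12, 9, 14),
  t2 = c(13, 15, 10, 18)
)
groups <- data.frame(
  subject = c("s1", "s2", "s3", "s4"),
  group = c("control", "control", "treatment", "treatment")
)

# subset(): keep only the rows that meet a condition
high_t1 <- subset(scores, t1 >= 10)

# merge(): combine two data sets by matching on a shared column
combined <- merge(scores, groups, by = "subject")

# melt(): wide to long, one row per subject-by-time observation
long <- melt(combined, id.vars = c("subject", "group"),
             variable.name = "time", value.name = "score")

# dcast(): long back to wide
wide <- dcast(long, subject + group ~ time, value.var = "score")

# dcast() with an aggregation function: a group-by-time summary table
group_means <- dcast(long, group ~ time, value.var = "score",
                     fun.aggregate = mean)

# ddply(): split by group, apply a summary to each piece, combine the results
gain <- ddply(combined, .(group), summarise, mean_gain = mean(t2 - t1))
```

The pattern I end up using most is the middle of that chain: merge() to bring the pieces together, melt() to get everything into long format, and then dcast() or ddply() to summarize.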
For anyone interested, I posted my R Workgroup notes on how to use these functions on RPubs. Side note: after a little configuration, I found it super easy to write these using knitr, “knit” them into a webpage, and post that page on RPubs.
Conspicuously missing from the above list is ggplot, which I think deserves a special lifetime achievement award for how it has transformed how I think about data exploration and data visualization. I’m planning that for the next R Workgroup meeting.