Hadley Wickham is hard at work, releasing packages that leverage the expressive power of R to keep easy things intuitive and to make hard things possible. Having already developed a number of packages that address other steps in the data processing and analysis pipeline, he chose to address the process by which R developers ingest, manipulate, and transform data. The resulting package, dplyr, is a narrowly focused library, designed to provide 90% of the functionality required in most data munging tasks. It may not solve every possible problem in data munging, but dplyr’s built-in primitives will reduce your need to drop to “base R” for most common tasks. dplyr builds on the magrittr package’s “pipe” operator (%>%) and is inspired by dataflow programming constructs. It is fast, it is powerful, and it has been the single largest productivity boost that many of us here at DataScience.LA have adopted in 2014. In this video, Hadley provides a very quick introduction to the dplyr package, describing the philosophy behind its design while providing a number of illustrative examples of the flexibility, expressiveness, and power of this tool.
Since dplyr is such an awesome, productivity-enhancing tool, we will also be releasing a series of useR! 2014 tutorial videos (a guided hands-on session on dplyr) over the next two weeks, so stay tuned. I hope you enjoy them!