Hadley Wickham, co-author (with Garrett Grolemund) of R for Data Science and RStudio's Chief Scientist, has focused much of his R package development on the un-sexy but critically important part of the data science process: data management. In the Tidy Tools Manifesto, he proposes four basic principles that any programming interface for handling data should follow:
- Reuse existing data structures.
- Compose simple functions with the pipe.
- Embrace functional programming.
- Design for humans.
Those principles are realized in a new collection of his R packages: the tidyverse. Now, with a single call to library(tidyverse) (after installing the package from CRAN), you can load into your R session a suite of tools that make managing data easier:
- readr, for importing data from files
- tibble, a modern iteration on data frames
- tidyr, functions to rearrange data for analysis
- dplyr, functions to filter, arrange, subset, modify and aggregate data frames
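As a rough sketch of how these pieces fit together, the snippet below installs and loads the tidyverse, round-trips R's built-in mtcars data through a temporary CSV file with readr, and then composes dplyr verbs with the pipe. The variable names (path, cars, mpg_by_cyl) are illustrative only, and the rownames argument to as_tibble() assumes a reasonably recent version of tibble rather than the one that shipped with tidyverse 1.0.0.

```r
# One-time setup (assumes access to a CRAN mirror):
# install.packages("tidyverse")
library(tidyverse)  # attaches readr, tibble, tidyr, dplyr and the other core packages

# tibble + readr: convert mtcars to a tibble, write it to a temporary CSV, then import it
path <- tempfile(fileext = ".csv")
write_csv(as_tibble(mtcars, rownames = "model"), path)
cars <- read_csv(path)  # returns a tibble and reports the guessed column types

# dplyr: compose simple functions with the pipe
mpg_by_cyl <- cars %>%
  filter(cyl > 4) %>%                        # keep 6- and 8-cylinder cars
  group_by(cyl) %>%                          # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  arrange(desc(mean_mpg))                    # most fuel-efficient group first

mpg_by_cyl
```

Each step takes a data frame and returns a new one, which is what lets the pipe chain them without intermediate variables.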
Installing the tidyverse package also installs for you (but doesn't automatically load) a raft of other packages to help you work with dates and times, strings, factors (with the new forcats package), and statistical models. It also installs packages for connecting to remote data sources and for reading a variety of data file formats.
In short, the tidyverse puts a complete suite of modern data-handling tools into your R session and provides an essential toolbox for any data scientist using R. (Also, it's a lot easier to add library(tidyverse) to the top of your script than the dozen or so library(…) calls previously required!) Hadley regularly updates these packages, and the provided tidyverse_update() function makes it easy to keep them current in your own R installation.
For more on tidyverse, check out Hadley's post on the RStudio blog, linked below.
RStudio Blog: tidyverse 1.0.0