Which R packages do you consider the most helpful and essential for undergrad stat ed? I ask in great part because it would help my local IT guru set up the way our network makes software available in our computer classrooms, but also just from curiosity.
Doug asked for a top 10 list, and a few people have already chimed in with great suggestions. I thought those not on the list might also have good ideas, so, with Doug’s permission, I’m reposting the question here.
Here is my top 10 (ok, 12) list:
(Links go to the vignettes or pages I find to be the quickest / most useful references for those packages; if you know of better resources, let me know and I’ll update.)
rmarkdown – for reproducible data analysis with literate programming; a great set of tools that students can use from day 1 in intro stats all the way through to writing their undergrad theses
dplyr – for most data manipulation tasks, with the added benefit of piping (via magrittr)
ggplot2 – easy faceting allows for graphing multivariate relationships more easily than with base R (lattice is also good for that, but IMO ggplot2 graphics look more modern and lattice has a much steeper learning curve)
openintro – or packages that come with the textbooks you use; great for pulling up any dataset from the text and building on it in class (a new version is coming soon to fully complement the 3rd edition of OpenIntro Statistics)
mosaic – for consistent syntax for functions used in intro stat
googlesheets – for loading data directly from Google spreadsheets
lubridate – if you ever need to work with any date fields
stringr – for text parsing and manipulation
rvest – for scraping data off the web
data.table – for loading large datasets, with a default of stringsAsFactors = FALSE
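To make the piping and faceting points concrete, here is a minimal sketch of the kind of thing students can do early on. It assumes dplyr (version 1.0.0 or later) and ggplot2 are installed, and it uses the built-in mtcars data purely for illustration; the `transmission` column and the object names are my own choices, not anything from the discussion.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; cyl and am are stored as numbers, so recode them
# into something more readable before summarising and plotting.
cars <- mtcars %>%
  mutate(
    cyl = factor(cyl),
    transmission = if_else(am == 1, "manual", "automatic")
  ) %>%
  select(mpg, wt, cyl, transmission)

# Piping: group-wise mean fuel efficiency, read top to bottom.
mpg_summary <- cars %>%
  group_by(cyl, transmission) %>%
  summarise(mean_mpg = mean(mpg), n = n(), .groups = "drop")

print(mpg_summary)

# Faceting: one panel per transmission type, colour by number of cylinders,
# so three variables besides mpg appear in a single figure.
p <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  facet_wrap(~ transmission) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

print(p)
```

The same pattern (recode, summarise, then facet) carries over to most textbook datasets, which is a large part of why these two packages pair so well in class.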
And the following suggestions from Randall Pruim complement this list nicely:
readxl – for reading Excel data
tidyr – for converting between wide and long formats and for the very useful
ggvis – ggplot2 “done right” and tuned for interactive graphics
htmlwidgets – this is actually a collection of packages for plots: see leaflet for maps and dygraphs for time series, for example
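For anyone setting these up on classroom machines, a rough sketch of a check for which of the packages named above are still missing might look like the following. It uses only base R; the helper name `missing_packages` and the vector `course_pkgs` are hypothetical, and the install line is left commented out because whether every package is currently available from your CRAN mirror is something to verify locally.

```r
# Packages named explicitly in the two lists above.
course_pkgs <- c(
  "rmarkdown", "dplyr", "ggplot2", "openintro", "mosaic", "googlesheets",
  "lubridate", "stringr", "rvest", "data.table",
  "readxl", "tidyr", "htmlwidgets", "leaflet", "dygraphs"
)

# Hypothetical helper: return the packages that are not yet installed
# in any library on this machine.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(course_pkgs)
print(to_install)

# Uncomment to install whatever is missing. Packages that are not
# available from the configured repository produce a warning.
# if (length(to_install) > 0) install.packages(to_install)
```

Running the check first, rather than calling `install.packages()` on the whole vector, avoids reinstalling packages that are already in place on a shared network library.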
Note that most of these packages are for data manipulation and visualization. Methods-specific packages that are useful / essential for a particular undergraduate program might depend on the focus of that program. Some packages that have come up so far in the discussion are:
This blog post is meant to provide a space for continuing this discussion, so I’ll ask the question one more time: Which R packages do you consider the most helpful and essential for undergrad stat ed? Please add your responses to the comments.