by Joseph Rickert
In my August Package Picks post, I explained that my selection criteria favor packages with vignettes. (I find skimming through a package’s vignettes to be an effective method of “grokking” what a package is all about.) I also questioned why a person would go to all of the trouble to develop a package and put it on CRAN without writing a vignette. Since writing that post, I have had the opportunity to speak with experienced package authors who argue, with some considerable authority, that the object documentation (what you get when you type “?foo” at the console) and the README file comprise a package’s most important documentation.
This is undoubtedly true, and self-evident once a person has decided to use the package. It is also true that the R Community is diligent: people pay attention, and really useful packages seem to get discovered rather quickly in spite of CRAN’s low signal-to-noise ratio. Nevertheless, CRAN is poised to blow through the 10,000 package milestone sometime soon. Package discovery and audition are hard work. As a potential user of a given package, I very much appreciate and benefit from the elaboration of a package’s capabilities and the extended examples found in well-done vignettes, and I suspect that others do, too.
Of the 174 packages submitted to CRAN in October, I have picked out 22 that I thought were particularly interesting, and listed them below in four categories: Data, Machine Learning, Miscellaneous, and Statistics.
All of these packages provide access to data, whether by bundling the data directly, through functions that fetch it from a remote source, or via an API.
- rnoaa v0.6.5: Provides a client for several NOAA data sources, including the NCDC Climate API. There are vignettes for NCDC data, air quality and weather data, sea ice data, storms and more.
- ptwikiwords v0.0.3: Contains a dataset with 15,000 randomly extracted words from the Portuguese Wikipedia.
- qualtRics v0.1: Provides functions to pull online Qualtrics survey data into R. There is a short vignette.
- QuantTools v0.5.1: Provides functions to download and organize historical financial market data from Yahoo, Google, Finam and IQFeed.
- sofa v0.2.0: Provides an interface to the CouchDB NoSQL database. Get started with an Introduction and a Query Table.
- ubeR v0.1.3: Provides R access to the Uber API.
The two packages listed here should be helpful in common machine learning workflows.
- prediction v0.1.1: prediction::prediction() provides an alternative to predict() that always returns a data frame.
- preText v0.4.4: Provides functions to assess the effects of different text preprocessing decisions on the inferences drawn from the resulting document-term matrices. There is a vignette to get you started.
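To see why a predict() alternative that always returns a data frame can be handy, here is a minimal base R sketch. It does not use the prediction package or reproduce its implementation; `predict_df()` is a hypothetical helper written only for this illustration, and the model and dataset are arbitrary choices.

```r
# Fit a small linear model on a built-in dataset.
fit <- lm(mpg ~ wt, data = mtcars)

# Base predict() returns a named numeric vector for an lm fit ...
p_vec <- predict(fit, newdata = mtcars[1:3, ])

# ... but a matrix once an interval is requested.
p_mat <- predict(fit, newdata = mtcars[1:3, ], interval = "confidence")

# Hypothetical helper (not from any package): always hand back a data frame
# that keeps the input rows next to the predicted values.
predict_df <- function(model, newdata, ...) {
  out <- predict(model, newdata = newdata, ...)
  if (is.matrix(out)) {
    out <- as.data.frame(out)
  } else {
    out <- data.frame(fit = as.numeric(out))
  }
  cbind(newdata, out)
}

str(predict_df(fit, mtcars[1:3, c("mpg", "wt")]))
```

Having one predictable return type means downstream code (plotting, joining, summarizing) does not need to branch on what predict() happened to return.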
The packages listed here cover a wide range of interests and capabilities: cryptography, flow charts, browser automation, discrete event simulation, and styling reports. This kind of diversity showcases R as a general-purpose programming language.
- gpg v0.4: Provides bindings to GnuPG for working with OpenPGP (RFC4880) cryptographic methods, and includes utilities for public key encryption, creating and verifying digital signatures, and managing your local keyring. The vignette provides a short introduction to GPG.
- poio v0.0-1: Provides functions to manipulate the .PO and .POT files that R packages use to store translation messages, warnings, and errors. See Section 1.8 Internationalization of the Writing R Extensions manual for details. This package is a work product of the R-Consortium-funded project for R Localization (RL10N).
- PRISMAstatement v1.0.1: Provides functions to plot PRISMA flow charts, which are used for systematic reviews and meta-analyses. The vignette provides a brief example.
- RSelenium v1.4.5: Provides R bindings for the Selenium 2.0 Web Browser Automation Software. There are several vignettes, including this one on RSelenium basics. You can use RSelenium to test your Shiny Apps.
- SpaDES v1.3.1: Provides functions to implement a variety of discrete event simulation models in R, including event-based models and agent-based models. The package also provides plotting methods optimized for speed and modularity. Vignettes include an introduction, a manual for building models and a manual for plotting.
- tint v0.0.3: Provides a template for creating HTML reports according to the style of Edward R. Tufte and Richard Feynman, but with an updated font choice. Look here for explanation and motivation.
I believe one of the real strengths of R is that, in addition to developing new methods and algorithms, statisticians continue to write packages that enhance or improve basic calculations. The package system encourages “kaizen”, or continuous improvement.
- diagis v0.1.0: Provides functions to compute weighted means and sample covariances of multivariate samples, along with diagnostic plots. The motivation for the package was to provide summary statistics for the weighted MCMC runs computed by functions in the bssm package. The vignette for diagis describes the math.
- glm.predict v2.3-0: Provides functions to calculate predictions with confidence intervals for glm models. When two models are predicted, the differences between the upper and lower values for the respective confidence intervals are also calculated. This link provides examples.
- GPrank v0.1.1: Implements a Gaussian process (GP)-based ranking method that can be used to rank multiple time series according to their temporal activity levels. One example application is when gene expression levels are measured over time and the objective is to identify the most active genes. The vignette provides additional examples.
- MCMCvis v0.6.3: Contains functions to visualize, manipulate, and summarize MCMC output from Bayesian models fit with JAGS, Stan, or other MCMC samplers. There is a vignette with examples.
- mhtboot v1.3.3: Provides a framework for testing multiple hypotheses based on the differences in the distributions of the p values for the null and alternative hypotheses. The vignette describes the math.
- oddsratio v0.3.1: Provides odds ratio calculations for GAM(M) and GLM(M) models. The plot from the tutorial shows odds ratio information superimposed on a smoothed GAM fit.
- rust v1.0.1: Uses the generalized ratio-of-uniforms (RU) method to simulate from univariate and low-dimensional multivariate continuous distributions. The detailed vignette explains the method and the math. The paper by Martin et al. provides context and extensions for the RU method.
- sf v0.2-2: Provides R support for simple features, a standardized way to encode spatial data. The vignette describes simple features and provides examples. This package is a product of the R-Consortium-funded project to develop Simple Features Access for R.
- Finally, a kind reader noticed that I missed at least one very useful package in my post about September’s new package submissions: anytime v0.1.0 converts variables of various types into POSIXct or Date objects. See anytime’s README file for examples.
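For readers who have not met the ratio-of-uniforms idea behind the rust entry above, here is a minimal base R sketch of the basic (non-generalized) version for a univariate density. It is not the rust package's code or API; `ru_sample()` and its arguments are hypothetical names chosen for this illustration, and the caller is assumed to supply valid bounding-box limits. The method draws (u, v) uniformly from a box, keeps the points with u <= sqrt(f(v/u)), and returns the ratios v/u, which then follow the target density.

```r
# Basic ratio-of-uniforms sampler for a density known up to a constant.
# f            : unnormalised density function (vectorised)
# u_max        : upper bound of sqrt(f(x)) over all x
# v_min, v_max : lower and upper bounds of x * sqrt(f(x)) over all x
# All names are illustrative assumptions, not taken from any package.
ru_sample <- function(n, f, u_max, v_min, v_max) {
  out <- numeric(0)
  while (length(out) < n) {
    m <- 2 * (n - length(out))            # propose in batches
    u <- runif(m, 0, u_max)
    v <- runif(m, v_min, v_max)
    x <- v / u
    keep <- u > 0 & u <= sqrt(f(x))       # accept points inside the RU region
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

# Example: standard normal, using f(x) = exp(-x^2 / 2).
# Here sqrt(f) is at most 1, and |x| * sqrt(f(x)) is at most sqrt(2 / e).
set.seed(1)
f_norm <- function(x) exp(-x^2 / 2)
b <- sqrt(2 / exp(1))
x <- ru_sample(10000, f_norm, u_max = 1, v_min = -b, v_max = b)
c(mean = mean(x), sd = sd(x))
```

The appeal of the method is that it needs only an unnormalized density and a bounding box; the vignette mentioned above covers the generalized form and how the bounds are found in practice.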
If you find that I have missed something important in one of my package review posts, please let me know.