Two hundred and twenty-four new packages were added to CRAN in July. Below are my picks for the “Top 40” packages arranged in seven categories: Machine Learning, Science, Statistics, Numerical Methods, Time Series, Utilities and Visualizations. Science and Numerical Methods are categories that I have not used before. The idea behind the Science category is to find a place for packages that appear to have been created with some particular scientific investigation or problem in mind. The Numerical Methods category is reserved for packages that, while they may be targeted to some general form of statistical analysis, emphasize numerical considerations and carefully constructed algorithms.
As always, my selections are heavily weighted by the availability of documentation beyond what is included in the package PDF. I rarely select packages that do not have a vignette or some other source of documentation about how the package can be used, for example, README files or a referenced URL. I almost never select “professional” packages, which I define as packages that are devoted to esoteric topics that either include no documentation beyond the PDF, or exclusively refer to papers that are protected by a paywall. While these packages usually comprise serious, valuable contributions to R, they also appear to have been written for very small audiences.
Finally, before listing this month’s Top 40, I would like to call attention to an awesome display of productivity by Kevin R. Coombes, who had fourteen packages on various topics accepted by CRAN in July: BimodalIndex, ClassComparison, ClassDiscovery, CrossValidate, GeneAlgo, IntegIRTy, Modeler, NameNeedle, oompaBase, oompaData, PreProcess, SIBERG, TailRank and Umpire.
The July 2017 Top 40
grf v0.9.3: Provides methods for non-parametric least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables).
keras v2.0.5: Implements an interface to Keras, a high-level neural networks API that runs on top of TensorFlow. There is an Overview of the Keras backend, and a number of vignettes including Keras Layers, Writing Custom Keras Layers, Keras Models, Using Pre-Trained Models, Sequential Models and more.
sgmcmc v0.1.0: Provides functions to implement stochastic gradient Markov chain Monte Carlo (SGMCMC) methods for user-specified models. TensorFlow is used to calculate the gradients. There is a Getting Started Guide and vignettes for Simulating from a Gaussian Mixture, a Multivariate Gaussian Mixture and Logistic Regression.
mcMST v1.0.0: Provides algorithms to approximate the Pareto-front of multi-criteria minimum spanning tree problems, along with a toolbox for generating multi-objective benchmark graph problems. There is an Introduction and a vignette on benchmarking optimization problems.
mize v0.1.1: Provides optimization algorithms, including conjugate gradient (CG), Broyden-Fletcher-Goldfarb-Shanno (BFGS), and the limited memory BFGS (L-BFGS) methods. There is an introduction and vignettes on Convergence, Metric MDS, and Stateful Optimization.
SuperGauss v1.0: Provides a fast C++ based algorithm for the evaluation of Gaussian time series, along with efficient implementations of the score and Hessian functions. The vignette shows an example of inference for the Hurst parameter.
noaastormevents v0.1.0: Allows users to explore and plot data from the National Oceanic and Atmospheric Administration (NOAA) Storm Events database for United States counties through R. There is an Overview and a vignette providing details.
diffpriv v0.4.2: Provides an implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006). There is a vignette on The Bernstein Mechanism and an Introduction.
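To give a flavor of what "privatizing a statistic" means, here is a minimal Python sketch of the classic Laplace mechanism applied to a sample mean. It is not the diffpriv API: the function names `laplace_noise` and `private_mean`, the clipping bounds, and the example values are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with location 0 and that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-differentially-private mean (illustrative only).

    Values are clipped to [lower, upper], so replacing one record can move
    the mean by at most (upper - lower) / n. That bound is the sensitivity.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return sum(clipped) / n + laplace_noise(scale, rng)


# Hypothetical data: proportions known to lie in [0, 1].
rng = random.Random(42)
data = [0.2, 0.4, 0.9, 0.5]
released = private_mean(data, lower=0.0, upper=1.0, epsilon=0.5, rng=rng)
```

The released value is the true mean plus noise whose scale grows as the privacy budget `epsilon` shrinks, which is the basic privacy versus accuracy trade-off.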
fence v1.0: Implements a new class of model-selection strategies for mixed-model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). The package points to several references in the literature including papers by Jiang et al. 2008, Jiang et al. 2010, Jiang et al. 2011, Nguyen et al. 2012, and Jiang 2014.
llogistic v1.0.0: Provides density, distribution, quantile and random generation functions for the L-Logistic distribution with parameters m (median) and
metaBMA v0.3.9: Provides functions to compute the posterior model probabilities for meta-analysis models assuming either fixed or random effects. See the paper by Gronau et al. and the vignette for details.
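The core arithmetic behind posterior model probabilities is simple to sketch. Assuming each candidate model comes with a log marginal likelihood and a prior probability, Bayes' rule gives each model's posterior weight. The Python function below is an illustration of that rule only, not metaBMA's interface, and the numbers in the usage example are made up.

```python
import math


def posterior_model_probs(log_marginals, priors=None):
    """Posterior probability of each model given log marginal likelihoods.

    Implements p(M_k | y) = prior_k * m_k / sum_j(prior_j * m_j),
    computed on the log scale for numerical stability.
    """
    k = len(log_marginals)
    if priors is None:
        priors = [1.0 / k] * k  # equal prior weight on every model
    log_w = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for a fixed-effects and a
# random-effects model.
probs = posterior_model_probs([-104.2, -102.9])
```

Subtracting the largest log weight before exponentiating avoids underflow when the marginal likelihoods are very small, which is typical for real data.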
MFKnockoffs v0.9: Provides functions to create model-free knockoffs, a general procedure for controlling the false discovery rate (FDR) when performing variable selection. There are vignettes on using the Model-Free Knockoff Filter (Basic and Advanced), and on Using the Filter with a Fixed Design Matrix.
msde v1.0: Implements an MCMC sampler for the posterior distribution of arbitrary, time-homogeneous, multivariate stochastic differential equation (SDE) models with possibly latent components. There is a vignette with Sample Models and another for Inference.
RBesT v1.2-3: Provides a tool set to support Bayesian evidence synthesis, including meta-analysis, prior derivation from historical data, operating characteristics, and analysis. There is an Introduction and vignettes on Customizing Plots, Normal Endpoints, and Robust Priors.
RcppTN v0.2-1: Provides R and C++ functions to generate random deviates from and calculate moments of a Truncated Normal distribution using the algorithm of Robert (1995). There is a vignette showing how to use the package, and one for Performance.
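For readers curious about the kind of algorithm involved, here is a small Python sketch of the translated-exponential rejection sampler described by Robert (1995) for a normal distribution truncated below. It covers only the one-sided case and is not RcppTN's code or interface; the function name `rtnorm_lower` and its arguments are assumptions for illustration.

```python
import math
import random


def rtnorm_lower(lower, mean=0.0, sd=1.0, rng=random):
    """Draw from Normal(mean, sd) truncated to [lower, infinity).

    Proposes from an exponential shifted to the truncation point and
    accepts with probability exp(-(z - alpha)^2 / 2), where alpha is
    the rate that maximizes the acceptance probability.
    """
    a = (lower - mean) / sd  # truncation point on the standard scale
    alpha = (a + math.sqrt(a * a + 4.0)) / 2.0
    while True:
        z = a + rng.expovariate(alpha)
        if rng.random() <= math.exp(-((z - alpha) ** 2) / 2.0):
            return mean + sd * z


# Usage: standard normal truncated to [2, infinity).
rng = random.Random(1)
draws = [rtnorm_lower(2.0, rng=rng) for _ in range(20000)]
```

This approach pays off when the truncation point is far in the tail, where naively sampling a normal and discarding values below the bound would reject nearly every draw.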
SMM v1.0: Provides functions to simulate and estimate multi-state discrete-time semi-Markov and Markov models. The implementation details are described in two papers by Barbu and Limnios (one and two), and in one paper by Trevezas and Limnios. The vignette also provides considerable detail.
treeDA v0.02: Provides functions to perform sparse discriminant analysis on a combination of node and leaf predictors, when the predictor variables are structured according to a tree. There is a vignette.
timetk v0.1.0: Implements a toolkit for working with time series, including functions to interrogate time series objects and tibbles, and coerce between time-based tibbles (‘tbl’) and ‘xts’, ‘zoo’, and ‘ts’. There is an Introduction and vignettes on Working with time series index, Making a Future Index, and Forecasting.
dataCompareR v0.1.0: Contains functions to compare two tabular data objects, with the specific intent of presenting the differences in a way that makes them easier to understand. The vignette shows how to use the package.
seplyr v0.1.4: Supplies standard evaluation adapter methods for important common dplyr methods that currently have a non-standard programming interface. There is an Introduction, as well as vignettes for Using seplyr with dplyr, the operator named map builder, and the operator rename_se.
vetr v0.1.0: Provides a declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not. There is a vignette on Alikeness and one on Trust, but Verify.
ggjoy v0.3.0: Joyplots provide a convenient way of visualizing changes in distributions over time or space. ggjoy enables the creation of such plots in ggplot2. There is an Introduction and a Gallery of examples.
loon v1.1.0: Provides an extensible toolkit for interactive data visualization and exploration. There are two vignettes containing examples: Visible minorities in Canadian cities and Smoothers and Bone Mineral Density.
tidygraph v1.0.0: A graph, while not “tidy” in itself, can be thought of as two tidy data frames describing node and edge data respectively. tidygraph provides functions to manipulate these virtual data frames using the dplyr package. Look here for some details.
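The "two tidy tables" view of a graph is easy to picture with a toy example. The Python sketch below is only an illustration of the data model, not tidygraph's API. The node and edge records, their field names, and the helper `with_degree` are all hypothetical.

```python
from collections import Counter

# Node table: one row per node.
nodes = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

# Edge table: one row per edge, referring to nodes by id.
edges = [
    {"from": 1, "to": 2, "weight": 0.5},
    {"from": 2, "to": 3, "weight": 1.5},
]


def with_degree(nodes, edges):
    """Return a copy of the node table with a `degree` column.

    The degree is computed from the edge table and joined back onto the
    node table by id, treating the graph as undirected.
    """
    degree = Counter()
    for e in edges:
        degree[e["from"]] += 1
        degree[e["to"]] += 1
    return [dict(n, degree=degree[n["id"]]) for n in nodes]


node_table = with_degree(nodes, edges)
```

Because each table is an ordinary rectangular data set, familiar verbs such as filtering, mutating, and joining apply to either one, which is the idea tidygraph builds on.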