by Joseph Rickert
Packages continue to flood into CRAN at a rate that challenges the sanity of anyone trying to keep up with what's new. So far this month, more than 190 packages have been added. Here is my view of what's interesting in this March madness.
The launch_tutorial() function from the RtutoR package by Anup Nair launches a Shiny-based interactive R tutorial that, so far, includes sections on basic operations on a data set, data manipulation, loops and functions, and basic model development. The following screen shows the page for selecting columns from a data set. Notice that the example code offers two different dplyr-based alternatives. The interface is far from perfect, but it's quite workable. Interactive tutorials launched directly from the command line may very well be the next generation of R documentation.
Using an R package to launch a Shiny application also looks like it may be a trend. The lavaan.shiny package by William Kyle Hamilton, also new this month, contains a single function to launch an interactive tutorial on latent variable analysis based on the lavaan package.
Time series aficionados will want to have a look at the dCovTS package from Pitsillou and Fokianos, which implements the distance covariance and correlation metrics for univariate and multivariate time series. These are relatively new metrics published by Z. Zhou in a 2012 paper in which he adapted the distance correlation metric developed by Székely et al. to measure non-linear dependence in time series. The following plots show the data, ACF, PACF and Auto-Distance Correlation Function (ADCF) for a time series of monthly deaths from bronchitis, emphysema and asthma for males in the UK between 1974 and 1979.
The ADCF plot produced by the function ADCFplot(mdeaths, method = "Wild", b = 100) uses the “Wild Bootstrap”, a relatively new re-sampling technique for stationary time series.
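To give a feel for what an auto-distance correlation measures, here is a minimal base R sketch that computes the sample distance correlation between a series and its own lags. The helpers dcor_sketch() and adcf_sketch() are hypothetical names introduced for illustration; they are not part of dCovTS, and the normalization used here is a simplification that may differ from the estimator the package implements. No bootstrap critical values are computed.

```r
# Sample distance correlation between two equal-length numeric vectors
# (V-statistic form). dcor_sketch is a hypothetical helper, not a dCovTS function.
dcor_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  # Double-centre the matrix of pairwise absolute differences
  centre <- function(v) {
    d <- as.matrix(dist(v))
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- centre(x)
  B <- centre(y)
  dcov2 <- max(mean(A * B), 0)   # guard against tiny negative rounding error
  dvar_x <- mean(A * A)
  dvar_y <- mean(B * B)
  if (dvar_x * dvar_y == 0) return(0)
  sqrt(dcov2 / sqrt(dvar_x * dvar_y))
}

# Distance correlation between x[t] and x[t + k] for k = 0, ..., max_lag.
# Assumption: each lag is normalised by the variances of the two subseries.
adcf_sketch <- function(x, max_lag = 12) {
  x <- as.numeric(x)
  n <- length(x)
  sapply(0:max_lag, function(k) dcor_sketch(x[1:(n - k)], x[(1 + k):n]))
}

# mdeaths ships with base R's datasets package
adcf <- adcf_sketch(mdeaths, max_lag = 12)
round(adcf, 3)
```

Unlike the ordinary ACF, these values are non-negative by construction, so the sketch only indicates the strength of dependence at each lag, not its sign.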
If you are working with generalized linear mixed models, you may be interested in two new packages that provide a few enhancements for lme4. glmmsr by Helen Ogden provides some alternatives to the Laplace method for approximating likelihood functions (the vignette does a good job of explaining the new alternatives), and GLMMRR from Fox, Klotzke and Veen fits GLMMs to binary, randomized response data and provides Cauchit, Log-log, Logistic and Probit link functions.
Machine learning enthusiasts may find a few new packages interesting. The MultivariateRandomForest package by Raziur Rahman contains functions to fit multivariate random forest models and make predictions. The hclust2() function in Gagolewski, Bartoszuk and Cena's genie package clusters data using the Gini index; it implements a hierarchical clustering technique that is billed as being outlier resistant. The package kmlShape by Genolini and Guichard contains functions to do hierarchical clustering on longitudinal data using the Fréchet distance to group trajectories. The following plot shows clusters identified for the artificial data generated in example 2 for the kmlShape() function.
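For readers who have not met the Fréchet distance before, the following base R sketch computes its discrete version between two trajectories using the standard dynamic programming recursion. The function frechet_sketch() is a hypothetical helper written for this illustration. It is not taken from kmlShape, whose own distance may include refinements (such as rescaling of the time axis) that are not reproduced here.

```r
# Discrete Frechet distance between two trajectories P and Q, each given as a
# two-column matrix of (time, value) points. Hypothetical helper, not kmlShape code.
frechet_sketch <- function(P, Q) {
  n <- nrow(P)
  m <- nrow(Q)
  # Euclidean distance between every point of P and every point of Q
  d <- sqrt(outer(P[, 1], Q[, 1], "-")^2 + outer(P[, 2], Q[, 2], "-")^2)
  # ca[i, j]: smallest achievable "leash length" to reach P[i, ] and Q[j, ]
  ca <- matrix(0, n, m)
  ca[1, 1] <- d[1, 1]
  if (n > 1) for (i in 2:n) ca[i, 1] <- max(ca[i - 1, 1], d[i, 1])
  if (m > 1) for (j in 2:m) ca[1, j] <- max(ca[1, j - 1], d[1, j])
  if (n > 1 && m > 1) {
    for (i in 2:n) {
      for (j in 2:m) {
        best_prev <- min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
        ca[i, j] <- max(best_prev, d[i, j])
      }
    }
  }
  ca[n, m]
}

# Two zig-zag trajectories; the second is the first shifted up by 2
P <- cbind(0:3, c(0, 1, 0, 1))
Q <- cbind(0:3, c(0, 1, 0, 1) + 2)
frechet_sketch(P, Q)
```

Because the measure respects the order of points along each curve, it compares the shapes of whole trajectories rather than just pointwise values, which is what makes it attractive for grouping longitudinal data.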
deepboost from Marcous and Sandbank provides an interface to Google's Deep Boosting algorithm as described in this paper by Cortes et al. It provides functions for training, evaluation, prediction and hyperparameter optimization using grid search and cross-validation.
The last package I'll mention today is rEDM from Ye, Clark and Deyle, which brings empirical dynamic modeling (EDM) to R. EDM uses time series data to reconstruct the state space of a dynamic system using Takens' Theorem (1981), which implies that the reconstruction can be accomplished using lags of a single time series in place of the unknown or unobserved variables. The vignette makes a nice case for why attractors and chaos belong in R.
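The idea of rebuilding a state space from lags is easy to try in base R, without rEDM itself. The sketch below simulates a chaotic logistic map (an illustrative choice of system, with parameter values picked for this example) and uses the base function embed() to build the lagged coordinates. It shows only the delay-embedding step and none of rEDM's forecasting machinery.

```r
# Simulate the logistic map x[t + 1] = r * x[t] * (1 - x[t]) in a chaotic regime.
# The values of n, r and the starting point are assumptions for illustration.
n <- 200
r <- 3.9
x <- numeric(n)
x[1] <- 0.4
for (t in 1:(n - 1)) x[t + 1] <- r * x[t] * (1 - x[t])

# Delay embedding with lag 1 in three dimensions:
# each row holds (x[t], x[t - 1], x[t - 2])
emb_dim <- 3
lagged <- embed(x, emb_dim)
colnames(lagged) <- c("x_t", "x_t1", "x_t2")
head(lagged)

# Plotting x[t] against x[t - 1] traces out the map's parabola
plot(lagged[, "x_t1"], lagged[, "x_t"], pch = 20,
     xlab = "x[t - 1]", ylab = "x[t]")
```

The raw series looks like noise, but the lagged coordinates fall on a smooth curve, which is the kind of hidden structure a delay embedding is meant to expose.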