November Package Picks
by Joseph Rickert
November was a prolific month for R developers: 189 new packages landed on CRAN. I have selected more than a quarter of them for this post, but I haven’t listed everything that is worth a look. My November 2016 picks are organized into six categories: Biotech (4 picks), Data (6 picks), Machine Learning (9 picks), Statistics (9 picks), Time Series (4 picks) and Utilities (20 picks). The relatively large number of Utilities packages listed seriously over-represents this category. However, I have included so many to emphasize the cumulative impact of developers working to improve the R ecosystem at a fairly low level. Also, I believe that these sorts of packages are relatively difficult to discover.
Biotech
The packages listed under this heading support analyses in biostatistics, genetics and medicine.
- esaddle v0.0.2: Provides functions for fitting the Extended Empirical Saddlepoint (EES) density. The vignette provides examples.
- incidence v1.0.1: Provides functions and classes to compute, manipulate and visualize incidence from dated events for defined time intervals. There are three vignettes including this overview.
- speaq2 v0.1.0: Provides wavelet-based tools for the analysis of NMR spectra. The vignette shows how to process a data set.
- starmie v0.1.2: Provides data structures and methods for manipulating the output of genetic population structuring algorithms. There is a Basic Usage vignette, as well as one on Admixture models.
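To make the notion of incidence concrete: it is a count of dated events per time interval. The base R sketch below bins a few invented onset dates into 7-day intervals. The function name incidence_counts and the dates are assumptions made up for illustration; this is not the API of the incidence package, which does considerably more.

```r
# Count dated events in fixed-width intervals (base R only).
# incidence_counts() is a hypothetical helper, not a function from 'incidence'.
incidence_counts <- function(dates, interval = 7) {
  stopifnot(inherits(dates, "Date"), length(dates) > 0, interval >= 1)
  first_day <- min(dates)
  # Zero-based index of the interval each event falls into
  bin <- as.integer(dates - first_day) %/% interval
  counts <- tabulate(bin + 1)
  data.frame(
    interval_start = first_day + interval * (seq_along(counts) - 1),
    cases = counts
  )
}

# Invented onset dates, for illustration only
onset <- as.Date(c("2016-11-01", "2016-11-02", "2016-11-02",
                   "2016-11-09", "2016-11-15", "2016-11-16"))
weekly <- incidence_counts(onset, interval = 7)
print(weekly)
```

Intervals here start at the earliest date in the data; aligning them to calendar weeks would need an extra step.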
Data
The packages here provide access to data through various methods.
- ALA4R v1.5.3: Provides an interface to the Atlas of Living Australia (ALA) that allows users to access and visualize data on Australian plants and animals. The vignette shows how to use it.
- BatchGetSymbols v1.0: Makes it easy to download a large amount of trade data from Yahoo or Google Finance. There is a brief vignette.
- elasticsearchr v0.1.0: Provides a lightweight interface to Elasticsearch, a NoSQL search engine and column store database. The vignette provides details.
- hansard v0.2.5: Provides functions for downloading data using the UK Parliament API. The vignette describes how to access information on individual members of parliament, briefings and more.
- isdparser v0.1.0: Provides tools for parsing NOAA Integrated Surface Database (ISD) files. The vignette gives the basics on reading and parsing the files.
- RBMRB v2.0: Provides an interface to the Biological Magnetic Resonance Data Bank (BMRB), along with tools for NMR images. Look here for documentation.
Machine Learning
The packages listed here are geared towards machine-learning applications.
- bcROCsurface v1.0-1: Offers functions to compute bias-corrected estimates of ROC curves. There is a guide.
- BiBitR v0.1.0: Wraps the Java BiBit biclustering algorithm for extracting bit-patterns from binary data sets. See the Bioinformatics paper for details.
- cleanNLP v0.24: Provides a Tidy Data model based on dplyr for converting a textual corpus into a set of normalized tables. The underlying NLP pipeline is based on Stanford’s CoreNLP library.
- ffstream v0.1-5: Provides an implementation of the adaptive forgetting factor algorithm for estimating the mean and variance of a data stream in order to detect multiple changepoints. The details are in the vignette.
- FTRLProximal v0.1.2: Implements the Follow The Regularized Leader Proximal (FTRL-Proximal) algorithm for online training of large-scale regression models using a mixture of L1 and L2 regularization.
- IDmining v1.0.0: Contains functions for mining large high-dimensional data sets using the Intrinsic Dimension technique. This paper describes the idea.
- mltools v0.1.0: Provides a collection of machine learning helper functions for exploratory analysis. The README file provides some details.
- OpenML v1.1: Provides an interface to the OpenML online machine-learning platform. The vignette provides an example of how to use it.
- rucrdtw v0.1.1: Provides R bindings for functions from the UCR Suite (Rakthanmanon et al. 2012), which enables ultrafast subsequence search under both Dynamic Time Warping and Euclidean Distance. The vignette shows how to use the package.
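As a rough illustration of the forgetting-factor idea mentioned for ffstream, the sketch below computes a running mean in which older observations are discounted by a fixed factor lambda. It is deliberately simpler than the package: the factor is fixed rather than adaptive, there is no changepoint detection, and ff_mean is a hypothetical helper written in base R, not a function from ffstream.

```r
# Running mean with a fixed forgetting factor (base R only).
# ff_mean() is a hypothetical helper, not part of 'ffstream'.
ff_mean <- function(x, lambda = 0.95) {
  stopifnot(is.numeric(x), lambda > 0, lambda <= 1)
  m <- 0  # discounted sum of the observations seen so far
  w <- 0  # discounted count of the observations seen so far
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    m <- lambda * m + x[i]
    w <- lambda * w + 1
    out[i] <- m / w
  }
  out
}

# Simulated stream whose mean jumps from 0 to 3 halfway through
set.seed(1)
stream <- c(rnorm(200, mean = 0), rnorm(200, mean = 3))
estimate <- ff_mean(stream, lambda = 0.95)
plot(estimate, type = "l", xlab = "observation", ylab = "estimated mean")
```

With lambda = 1 nothing is forgotten and the estimate reduces to the ordinary cumulative mean; smaller values track a shift in the stream faster at the cost of a noisier estimate.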
Statistics
The packages listed under this heading mostly offer algorithms to support statistical analyses. Notable are queuecomputer, which implements a discrete event simulation, and regtools, which could have also been listed under the Machine Learning heading.
- bayesplot v1.0.0: Provides plotting functions for posterior analysis, model checking, and MCMC diagnostics. There are vignettes for MCMC diagnostics, plotting MCMC draws, and graphical posterior checks.
- eMLEloglin v1.0.1: Provides functions for fitting log-linear models of sparse contingency tables. See the user manual for the math.
- POT v1.1-6: Implements functions to perform Peaks Over Threshold analysis, useful in Extreme Value Theory. The vignette explains the math.
- queuecomputer v0.5.1: Provides computationally efficient solutions for simulating queues with arbitrary arrival and service times. There is a vignette describing how to use the package and one showing how to simulate M/M/k queues.
- regtools v1.0.1: Provides novel tools for linear and nonlinear regression, and nonparametric regression and classification. The vignette contains examples.
- revdbayes v1.0.0: Provides functions for the Bayesian analysis of extreme value models. The vignette contains several interesting examples and references.
- slim v0.1.0: Provides functions to fit singular linear models to longitudinal data. The theory is described in this Biometrika paper, and the vignette provides examples.
- varband v0.9.0: Implements the variable banding procedure described in a paper by Yu and Bien for modeling local dependence and estimating precision matrices. The vignette shows how to use the package. A plot accompanying one of the examples shows the sparsity patterns of the true model and the sample covariance matrix.
- xyz v0.1: Implements an algorithm by Thanei, Meinshausen and Shah for finding strong interactions in almost linear time. The vignette contains an example.
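For readers unfamiliar with queue simulation, the following base R sketch shows the simplest case: a single server working first come, first served, where each customer starts service at the later of their arrival time and the previous departure. The function simulate_fifo_queue and the rates used are assumptions for illustration only; queuecomputer has its own interface and handles multiple servers, so treat this as a sketch of the idea rather than of the package.

```r
# Single-server, first-come-first-served queue (base R only).
# simulate_fifo_queue() is a hypothetical helper, not part of 'queuecomputer'.
simulate_fifo_queue <- function(arrivals, service) {
  stopifnot(length(arrivals) == length(service), !is.unsorted(arrivals))
  departures <- numeric(length(arrivals))
  free_at <- 0  # time at which the server next becomes idle
  for (i in seq_along(arrivals)) {
    start <- max(arrivals[i], free_at)  # wait if the server is busy
    departures[i] <- start + service[i]
    free_at <- departures[i]
  }
  departures
}

# An M/M/1 example: exponential inter-arrival and service times
set.seed(42)
n <- 1000
arrivals <- cumsum(rexp(n, rate = 1))   # arrival rate 1 per unit time
service <- rexp(n, rate = 1.25)         # service rate 1.25 per unit time
departures <- simulate_fifo_queue(arrivals, service)
waiting <- departures - arrivals - service
mean(waiting)
```

Because the arrival and service times are passed in as plain vectors, any distribution can be substituted for the exponential draws.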
Time Series
The packages listed here explicitly call out time series applications.
- GeomComb v1.0: Provides an eigenvector-based method for combining time series forecasts.
- ptest v1.0-8: Implements p-value computations for testing periodicity in short time series. The vignette provides examples and references.
- tsdisagg2 v0.1.0: Provides functions to disaggregate low frequency time series data to higher frequency series. The vignette describes the math and provides references.
- zoocat v0.2.0: Extends the zoo class and provides tools for manipulating multivariate time series data. The vignette contains an example.
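To give a flavor of what testing periodicity in a short series involves, here is a base R sketch of one classical approach, Fisher's g test, which compares the largest periodogram ordinate with the sum of all ordinates and has an exact p-value under Gaussian white noise. Whether ptest uses exactly this test is not stated above, so this should not be read as the package's method; fisher_g_test is a hypothetical helper and the series is simulated.

```r
# Fisher's g test for a single dominant periodicity (base R only).
# fisher_g_test() is a hypothetical helper, not a function from 'ptest'.
fisher_g_test <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 5)
  n <- length(x)
  m <- (n - 1) %/% 2  # number of Fourier frequencies below the Nyquist limit
  # Periodogram at the Fourier frequencies 1/n, 2/n, ..., m/n
  pgram <- Mod(fft(x - mean(x)))[2:(m + 1)]^2 / n
  g <- max(pgram) / sum(pgram)
  # Exact null distribution of g under Gaussian white noise (Fisher, 1929)
  j <- seq_len(floor(1 / g))
  p <- sum((-1)^(j - 1) * choose(m, j) * (1 - j * g)^(m - 1))
  list(g = g,
       p_value = min(max(p, 0), 1),
       frequency = which.max(pgram) / n)
}

# Simulated series of length 24 with a period of 12 plus noise
set.seed(7)
time_index <- 1:24
noisy <- sin(2 * pi * time_index / 12) + rnorm(24, sd = 0.5)
fisher_g_test(noisy)
```

The alternating sum in the p-value can become numerically unstable for long series, which is one reason this sketch is only suitable for short ones.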
Utilities
The packages listed here are a varied collection of convenience utilities, package extensions, gateways to other software, and low-level computing functions. Notable are flock and subprocess, which feel like systems-level programming.
- batchtools v0.9.0: As a successor of the packages ‘BatchJobs’ and ‘BatchExperiments’, this package provides a parallel implementation of the Map function for high-performance computing systems managed by schedulers such as ‘IBM Spectrum LSF’, ‘OpenLava’, and others. There are four vignettes, including this one on error handling.
- benchr v0.1.0: Provides infrastructure to accurately measure and compare the execution times of R expressions. Usage is described here.
- bindr v0.1: Provides an interface for creating active bindings where the bound function accepts additional arguments. Usage is described here.
- bytescircle v1.0: Shows statistics about bytes contained in a file as a circle graph of deviations from mean in sigma increments. A plot in the vignette shows byte values mapped as an Archimedean spiral, where each byte value is represented as a colored circle whose size indicates its deviation from the mean.
- crul v0.1.0: Implements a simple HTTP client for making HTTP requests. Look at the GitHub README file for information on where to start.
- datapasta v1.0.0: Provides three addins for copying and pasting tables and vectors from Excel, Jupyter, and websites into the RStudio editor. The vignette provides an example.
- debugme v1.0.1: Offers functions to specify debugging messages as special string constants, and control package debugging via environment variables. Look here for an example.
- errorizer v0.1.1: Creates “errorized” versions of existing R functions with enhanced capabilities for logging and error handling. The vignette provides an example and describes the limitations of the method.
- fauxpas v0.1.0: Provides methods for general-purpose HTTP error handling. Integrates with packages crul, curl and httr. Look here for crul and curl examples.
- flock v0.7: Nitty-gritty package that implements synchronization between R processes using file locks.
- ggforce v0.1.1: Offers new stats and geoms to be used with ggplot2. The vignette provides several examples.
- ggstance v0.3: Extends ggplot2 with flipped components and horizontal versions of stats and geoms. The package README file contains examples.
- naptime v1.2.0: Provides a “near drop-in” replacement for base::Sys.sleep() that allows for more control of delays. The vignette explains the why and how of napping.
- packagedocs v0.4.0: Should make writing package vignettes a little easier by providing functions for building websites of Package Documentation. See the quick start manual and package reference manual.
- rly v1.0.1: Another nitty-gritty package, it provides an R implementation of the parsing tools lex and yacc. See the README file for examples.
- spark.sas7bdat v1.0: Allows R users to read SAS data files into Spark. The vignette indicates how to read SAS data in parallel.
- startup v0.3.0: Provides new directories for specifying R’s startup configuration, which make it possible to keep private / secret variables separate from other environment variables. See the README file.
- subprocess v0.7.4: Brings systems-level programming to R, with the capability to create, interact with and control the life cycle of child processes. The vignette shows how.
- tesseract v1.2: Allows text to be extracted from an image. This is an OCR engine with Unicode (UTF-8) support that can recognize over 100 languages. See the README file.
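Several of the utilities above work by wrapping existing functions. As a minimal sketch of that pattern, the base R code below builds a version of a function that logs any error and returns NULL instead of stopping. The names with_error_log and logger are hypothetical and chosen for illustration; this is not the errorizer API, whose vignette describes what the package actually provides.

```r
# Wrap a function so that errors are logged rather than raised (base R only).
# with_error_log() is a hypothetical helper, not part of 'errorizer'.
with_error_log <- function(f, logger = function(msg) message("[error] ", msg)) {
  force(f)
  force(logger)
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        logger(conditionMessage(e))  # record what went wrong
        NULL                         # and return NULL instead of stopping
      }
    )
  }
}

safe_log <- with_error_log(log)
safe_log(100, base = 10)    # behaves like log() when the call succeeds
safe_log("not a number")    # logs the error message and returns NULL
```

Returning NULL on failure is a design choice made for this sketch; a caller that needs to distinguish a failure from a legitimate NULL result would want a different signal.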