Feb 2018: “Top 40” New Package Picks
Here are my picks for the “Top 40” packages of the 171 new packages that made it to CRAN (and stuck) in February, organized into the following categories: Computational Methods, Data, Finance, Machine Learning, Science, Statistics, Time Series, and Utilities.
Computational Methods
adnuts v1.0.0: Provides an implementation of the no-U-turn (NUTS) algorithm by Hoffman and Gelman (2014) for ADMB and TMB models. The vignette will get you started.
CholWishart v0.9.2: Provides functions to sample from the Cholesky factorization of a Wishart random variable, the inverse Wishart distribution and the Cholesky factorization of an inverse Wishart random variable. See the vignette for details.
particles v0.2.1: Provides functions to simulate particle movement in 2D space using the ideas behind the ‘d3-force’ JavaScript library. It implements all forces defined in d3-force, as well as others such as vector fields, traps, and attractors. The vignette explains how to use the package.
rosqp v0.1.0: Provides bindings to the OSQP solver, which can solve sparse convex quadratic programming problems with optional equality and inequality constraints.
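SolveLS v1.0: Implements iterative methods for solving systems of linear equations, including the stationary Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR), and Symmetric SOR (SSOR) methods, as well as non-stationary Krylov subspace methods. See this book for details.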
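To give a feel for the simplest of the methods SolveLS lists, here is a minimal sketch of Jacobi iteration in plain Python. It is not the package's R interface; the function name `jacobi`, its parameters, and the small test system are assumptions made for illustration.

```python
def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by Jacobi iteration.

    A is a square matrix given as a list of rows, b a list of numbers.
    Convergence is guaranteed when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n  # start from the zero vector
    for _ in range(max_iter):
        # Each component is updated using only values from the previous sweep.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once the largest change between sweeps falls below tol.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


# Hypothetical, strictly diagonally dominant 2x2 system.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 12.0]
solution = jacobi(A, b)
```

Gauss-Seidel differs only in using the freshly updated components within the same sweep, and SOR adds a relaxation weight on top of that.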
Data
Cluster.OBeu v1.2.1: Provides functions to estimate and return the needed parameters for visualizations designed for OpenBudgets data. There is a vignette for Using Cluster.OBeu with OpenCPU and one for Cluster analysis.
photobiologySun v0.4.0: Contains data for extraterrestrial solar spectral irradiance and ground-level solar spectral irradiance and irradiance. See Aphalo P. J. (2015) and the User Guide for more information.
SympluR v0.3.0: Provides functions to analyze data from the Healthcare Social Graph via access to the Symplur API. Look here for related research articles.
totalcensus v0.3.0: Allows users to download summary files from the Census Bureau and extract data – in particular, high resolution data at block, block group, and tract level – from decennial census and American Community Survey 1-year and 5-year estimates.
Finance
estudy2 v0.8.4: Implements event study models, including rate-of-return estimation and classical models. Tests include those proposed by Brown and Warner (1980, doi:10.1016/0304-405X(80)90002-1), Brown and Warner (1985, doi:10.1016/0304-405X(85)90042-X), Boehmer et al. (1991, doi:10.1016/0304-405X(91)90032-F), and more. The vignette provides an introduction.
Machine Learning
DALEX v0.1.1: Provides various explainers that help to understand the link between input variables and model output in machine learning models. See this website for explanations.
forestControl v0.1.1: Allows approximate false positive rate control in selection frequency for random forests using the methods described by Konukoglu and Ganz (2015).
kmed v0.0.1: Implements the distance-based k-medoids clustering algorithm from Park and Jun (2009). Cluster validation applies a bootstrap procedure that produces a heatmap with a flexible reordering matrix algorithm. There is a vignette.
lolR v1.0.1: Implements optimal low-rank projection algorithms to obtain a lower-dimensional representation of data before applying supervised learning techniques in situations where the dimensionality exceeds the sample size. There are several vignettes including: Class Conditional PCA, Low-Rank Canonical Correlation Analysis, and HDLSS Simulations.
projpred v0.7.0: Provides functions to perform projection predictive feature selection for generalized linear models; see, for example, Piironen and Vehtari (2017). The package is compatible with rstanarm. There is a Quick Start Guide.
RGF v1.0.1: Implements a wrapper for the Python package Regularized Greedy Forest. It also includes a multi-core implementation called FastRGF.
Science
cRegulome v0.1.1: Provides functions to build a SQLite database file of pre-calculated transcription factor/microRNA-gene correlations (co-expression) in cancer from the Cistrome and miRCancerdb databases. There is an Introduction and a Case Study.
CENFA v0.1.0: Provides tools for climate- and ecological-niche factor analysis of spatial data, including methods for visualization of spatial variability of species sensitivity, exposure, and vulnerability to climate change. See Hirzel et al. (2002) and Basille et al. (2008). The vignette introduces the package.
detectRUNS v0.9.5: Provides functions to detect runs of homozygosity and of heterozygosity in diploid genomes using the sliding windows (Purcell et al. (2007)) and consecutive runs (Marras et al. (2015)) methods. The vignette provides an overview.
Statistics
cosa v1.1.0: Implements a generalized constrained optimal sample allocation framework for two-group multilevel regression discontinuity studies and multilevel randomized trials with continuous outcomes. There is a short Tutorial.
DirectEffects v0.1: Provides functions to estimate the controlled direct effect of treatment fixing a potential mediator to a specific value. Implements the sequential g-estimation estimator described in Vansteelandt (2009) and Acharya et al. (2016). The vignette introduces the package.
dnr v0.3.2: Provides functions to fit temporal lag models to dynamic networks, built on top of the exponential random graph model (ERGM) framework. The vignette describes the method.
geozoning v1.0.0: Provides a zoning method and a numerical criterion for assessing zoning quality. There are vignettes on Geozoning Structures and Simulated Data.
GpGp v0.1.0: Provides functions for Gaussian process predictions and conditional simulations, along with covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres. The original approximation is due to Vecchia (1988), and the reordering and grouping methods are from Guinness (2018). The vignette contains an example using wind speed.
idealstan v0.2.7: Offers item-response theory (IRT) ideal-point scaling/dimension reduction methods that incorporate additional response categories and missing/censored values. Full and approximate Bayesian inference is done via the Stan engine. There is an Introduction and a vignette on Evaluating Models.
kdensity v1.0.0: Provides methods for univariate non-parametric density estimation with parametric starts and asymmetric kernels. See Chen (2000), Chen (1999), and Jones & Henderson (2007). There is a Tutorial.
NetLogoR v0.3.2: Provides functions to create agent-based models in R following the NetLogo framework. See Wilensky (1999). The NetLogo models Ants and Wolf-Sheep-Predation have been translated into R. See the Programming Guide and Data Dictionary.
riskyr v0.1.0: Provides functions to express risk-related information in terms of probabilities or frequencies to make the teaching and training of risk literacy more transparent. There is a User Guide and Quick Start Primer, along with vignettes on Data Formats, the Confusion Matrix and Metrics, and Functional Perspectives.
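The switch between probabilities and frequencies that riskyr is built around comes down to simple arithmetic. The sketch below is plain Python, not riskyr's own functions; the function name, the dictionary keys, and the numbers (a population of 1,000 with 1% prevalence, 80% sensitivity, and 90% specificity) are hypothetical values chosen for illustration.

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Translate three probabilities into counts for a population of size n."""
    condition_true = n * prevalence          # people who have the condition
    condition_false = n - condition_true     # people who do not
    hits = condition_true * sensitivity      # true positives
    misses = condition_true - hits           # false negatives
    correct_rejections = condition_false * specificity  # true negatives
    false_alarms = condition_false - correct_rejections  # false positives
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "correct_rejections": correct_rejections,
    }


# Hypothetical screening scenario (all values are assumptions).
freq = natural_frequencies(n=1000, prevalence=0.01,
                           sensitivity=0.80, specificity=0.90)

# Positive predictive value: share of positive results that are true positives.
ppv = freq["hits"] / (freq["hits"] + freq["false_alarms"])
```

In this made-up scenario about 8 of the roughly 107 positive results are true positives, which is far easier to grasp as a frequency than as a conditional probability.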
rsimsum v0.3.0: Provides functions to summarize results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modeled on the simsum user-written command in Stata. There is an Introduction and vignettes on Visualization, Simulating a simulation study, and rsimsum and the tidyverse.
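As a rough illustration of what a Monte Carlo standard error is, the sketch below computes the bias of an estimator across simulation replicates together with its Monte Carlo standard error, using the usual formula (the sample standard deviation of the estimates divided by the square root of the number of replicates). This is plain Python rather than rsimsum's interface; the function name and the five estimates are made up for the example.

```python
import math


def bias_with_mcse(estimates, true_value):
    """Return (bias, Monte Carlo standard error of the bias)."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    # Sample variance of the estimates across replicates (n - 1 denominator).
    sample_var = sum((e - mean_est) ** 2 for e in estimates) / (n - 1)
    # The standard error of the mean carries over directly to the bias.
    mcse = math.sqrt(sample_var / n)
    return bias, mcse


# Hypothetical point estimates from five simulation replicates.
estimates = [0.9, 1.1, 1.0, 1.2, 0.8]
bias, mcse = bias_with_mcse(estimates, true_value=1.0)
```

A real simulation study would use hundreds or thousands of replicates; the Monte Carlo standard error shrinks with the square root of that number.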
SimCorrMix v0.1.0: Provides functions to generate continuous (normal, non-normal, or mixture distributions), binary, ordinal, and count (regular or zero-inflated, Poisson or Negative Binomial) variables with a specified correlation matrix, or one continuous variable with a mixture distribution. This package can be used to simulate data sets that mimic real-world clinical or genetic data sets (i.e., plasmodes, as in Vaughan et al. (2009)). There are vignettes on Continuous Mixture Distributions, Expected Cumulants and Correlations for Continuous Mixture Variables, Comparison of Correlation Methods, Variable Types, and Overall Workflow for Generation of Correlated Data.
tree.bins v0.1.0: Allows users to recategorize factor variables through a decision tree method derived from the rpart() function of the rpart package. For details, see Hastie et al. (2009) and the vignette.
Time Series
segclust2d v0.1.0: Provides two methods for segmentation and joint segmentation/clustering of bivariate time-series. The segmentation method is a bivariate extension of Lavielle’s method available in adehabitatLT; see Lavielle (1999) and Lavielle (2005). The segmentation/clustering method is an extension of Picard et al. (2007). The vignette contains several examples.
tstools v0.3.6: Provides functions to plot official statistics time series with automatic legends, highlight windows, stacked bar charts with positive and negative contributions, and other options. It includes fast, data.table-backed time series I/O that allows the user to export and import long format, wide format, and transposed wide format data to and from various file types. See the vignette for details.
Utilities
codemetar v0.1.5: Provides utilities to generate, parse, and modify codemeta.json files automatically for R packages, as defined in the Codemeta Project. There is an Introduction to the Codemeta Project, and vignettes on Translating Between Data Formats, Validating JSON-LD, and Examples.
knitrProgressBar v1.1.0: Provides a progress bar similar to dplyr's that can write progress out to a variety of locations, including stdout(), stderr(), or a file() connection. There is an Example and a vignette for setting up.
msgpack v1.0: Implements a fast, C-based encoder and streaming decoder for the messagepack data format.
pmatch v0.1.3: Implements type constructions and pattern matching. See the README for details.
shinyalert v1.0: Provides functions to create pretty popup messages (modals) in Shiny that may contain text, images, OK/Cancel buttons, an input to get a response from the user, and many more customizable options.
trackr v0.7.5: Provides functions to automatically annotate R-based artifacts with relevant descriptive and provenance-related notes, and provides a back-end-agnostic storage and discoverability system for organizing, retrieving, and interrogating such artifacts. There is an Introduction and a vignette on Extending trackr.