Here are my “Top 40” picks from the two hundred or so new packages that stuck to CRAN in January, listed under seven categories: Data, Data Science, Science, Statistics, Time Series, Utilities, and Visualizations. (I say “stuck to” because I counted at least six packages that were accepted onto CRAN in January but removed within the month. Quick removal of packages from CRAN is something I have observed in recent months.)
While looking over the packages that I have listed under Data and Science, it struck me that in addition to being the world’s largest repository of statistical knowledge, CRAN is becoming a repository for practical, hard-won scientific knowledge.
elevatr v0.1.4: Provides access to several services offering elevation data, and returns the data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, Amazon Web Services Terrain Tiles, and the USGS Elevation Point Query Service. The vignette shows how to use the package.
fabricatr v0.2.0: Provides functions to simulate hierarchical and correlated data. There are several vignettes including a Getting Started guide and an Advanced Features guide, as well as introductions to Resampling and Generating Discrete Random Variables.
photobiologyFilters v0.4.4: Is a data-only package with spectral ‘transmittance’ data for frequently used filters and materials, including plastic sheets and films, optical glass and ordinary glass, and some labware. It complements the photobiology package. See this website and the vignette for details.
washdata v0.1.2: Provides access to the urban water and sanitation survey data set collected by Water and Sanitation for the Urban Poor (WSUP), with technical support from Valid International. There is a vignette.
CRPClustering v1.0: Provides a clustering method based on the Chinese restaurant process (Pitman, 1995), which does not require the number of clusters to be chosen in advance. Also provides functions to calculate the ambiguity of clusters as entropy (Yngvason, 1999). The vignette shows how to use the package.
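To give a feel for why the number of clusters need not be fixed up front, here is a minimal Python sketch of the Chinese restaurant process itself. It is not the CRPClustering API: the function names, the concentration parameter `alpha`, and the choice to measure entropy over cluster sizes are all assumptions made for illustration.

```python
import random
from collections import Counter
from math import log


def crp_assignments(n_customers, alpha, seed=0):
    """Seat customers one at a time and return their table (cluster) indices.

    alpha > 0 is the concentration parameter (hypothetical name): larger
    values make opening a new table more likely.
    """
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k
    seating = []
    for _ in range(n_customers):
        # An existing table is chosen with weight equal to its size,
        # a brand-new table with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        seating.append(k)
    return seating


def cluster_entropy(seating):
    """Shannon entropy (in nats) of the cluster-size distribution."""
    n = len(seating)
    return -sum((c / n) * log(c / n) for c in Counter(seating).values())


seating = crp_assignments(100, alpha=1.0)
print(len(set(seating)), "clusters, entropy", round(cluster_entropy(seating), 3))
```

The number of occupied tables is an outcome of the sampling rather than an input, which is the property the package description refers to.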
multiROC v1.0.0: Provides tools for multi-class classification problems by computing the area under the ROC curve via micro- and macro-averaging. The methodology is described in Van Asch (2013) and Pedregosa et al. (2011). See the vignette for a quick tour.
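The difference between the two averaging schemes is easy to lose in a one-line description, so here is a small Python sketch under stated assumptions. It is not the multiROC interface: the function names are hypothetical, classes are handled one-vs-rest, and the macro figure is taken as the plain mean of per-class AUCs (the convention used by scikit-learn's `roc_auc_score`), which may differ from an AUC computed from an averaged curve.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count one half. labels are 0/1; both classes must be present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, score_rows, classes):
    """Return (macro AUC, micro AUC, per-class AUCs) for one-vs-rest scoring.

    score_rows[i][j] is the score of observation i for classes[j].
    """
    per_class = {}
    pooled_labels, pooled_scores = [], []
    for j, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[j] for row in score_rows]
        per_class[c] = binary_auc(labels, scores)
        # Micro-averaging pools every (observation, class) decision.
        pooled_labels += labels
        pooled_scores += scores
    macro = sum(per_class.values()) / len(classes)
    micro = binary_auc(pooled_labels, pooled_scores)
    return macro, micro, per_class


y_true = ["a", "b", "c", "a", "b", "c"]
score_rows = [
    [0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6],
]
macro, micro, per_class = macro_micro_auc(y_true, score_rows, ["a", "b", "c"])
print(macro, micro, per_class)
```

Macro-averaging weights every class equally, while micro-averaging weights every individual decision equally, so the two can diverge when classes are imbalanced.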
annovarR v1.0.0: Provides functions and database resources to offer an integrated framework to annotate genetic variants from genome and transcriptome data. The wrapper functions unify the interface of many published annotation tools, such as VEP, ANNOVAR, vcfanno, and AnnotationDbi. There is an Introduction and a vignette on Databases.
pubh v0.1.7: Offers a toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. There is an Introduction and a Regression Example.
dirichletprocess v0.2.0: Enables the creation of Dirichlet process objects that can be used as infinite mixture models. Examples include density estimation, Poisson process intensity inference, hierarchical modelling, and clustering. See Teh, Y. W. (2011) and the vignette for details.
detpack v1.0.1: Enables density estimation for possibly large data sets and conditional/unconditional random number generation with distribution element trees. For details on distribution element trees, see Meyer (2016), Meyer (2017), and Meyer (2017).
KRIG v0.1.0: Provides functions for Kriging models and various methods for spatial statistics, including multivariate sensitivity analysis using reproducing kernel Hilbert spaces and computation of Sobol indexes. There are vignettes on Ordinary Kriging, Simple Kriging, Universal Kriging, and a worked example.
OpVar v1.0: Provides functions for modeling operational (value-at-)risk, including loss frequencies and loss severities with plain, mixed (Frigessi et al. (2012)) or spliced distributions using Maximum Likelihood estimation and Bayesian approaches (Ergashev et al. (2013)). The vignette shows some examples.
netrankr v0.2.0: Implements methods for centrality-related analyses of networks, focusing on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. See Schoch (2018). There are vignettes for benchmarks, centrality indices, indirect relations, neighborhood inclusion, partial centrality, positional dominance, probabilistic centrality, uniquely ranked graphs, and a use case.
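As a rough illustration of neighborhood inclusion, the Python sketch below records a pair (u, v) whenever every neighbor of u is also a neighbor of v or is v itself. This is an assumption-laden toy, not netrankr's implementation: the adjacency-dictionary format and the function name are hypothetical.

```python
def neighborhood_inclusion(adj):
    """Return ordered pairs (u, v), u != v, with N(u) a subset of N(v) plus v.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    pairs = set()
    for u in adj:
        for v in adj:
            if u != v and adj[u] <= adj[v] | {v}:
                pairs.add((u, v))
    return pairs


# A star graph: the hub dominates every leaf, and leaves dominate each other.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
pairs = neighborhood_inclusion(star)
print(sorted(pairs))
```

The result is only a partial ranking: pairs that appear in neither direction are left incomparable, which is what makes the assessment index-free.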
palmtree v0.9.0: Implements the PALM tree algorithm, an extension to the MOB algorithm (implemented in the partykit package), where some parameters are fixed across all groups. See Seibold et al. (2016) for details.
seminr v0.4.0: Implements a domain-specific language for building PLS structural equation models, allowing for the latest estimation methods for Consistent PLS as per Dijkstra & Henseler (2015), adjusted interactions as per Henseler & Chin (2010), and bootstrapping utilizing parallel processing as per Hair et al. (2017). There is a vignette.
santaR v1.0: Provides a graphical, automated pipeline for the analysis of short time series that has been designed to accommodate asynchronous time sampling, inter-individual variability, noisy measurements and large numbers of variables. There is a Getting Started Guide and vignettes on advanced command line functions, automated command line functions, plotting options, preparing input, selecting degrees of freedom, the theoretical background, and the GUI.
TSrepr v1.0.0: Provides methods for representations (e.g., dimensionality reduction, preprocessing, feature extraction) of time series. There is an Introduction to the Framework, a vignette on representations, and a Use Case.
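For readers new to the idea of a time series representation, here is a minimal Python sketch of one common dimensionality-reduction scheme, piecewise aggregate approximation (PAA), which replaces each segment of a series with its mean. The function name and the way uneven segments are split are assumptions for illustration and are not taken from the package.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each consecutive segment.

    Segment boundaries use integer division, so segment lengths differ by at
    most one when len(series) is not a multiple of n_segments.
    """
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    reduced = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        segment = series[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced


print(paa([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
```

A series of length six is reduced to three values here; downstream tasks such as clustering then work on the shorter vectors.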
TSstudio v0.1.1: Provides a set of interactive visualization tools for time series analysis supporting ts, mts, zoo and xts objects including visualization functions for forecasting model performance (forecasted vs. actual), time series interactive plots (single and multiple series), and seasonality plots. The vignette shows the features available.
arrangements v1.0.2: Provides fast generators and iterators for permutations, combinations and partitions, allowing users to generate arrangements in a memory-efficient manner. Benchmarks may be found here.
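The memory-efficiency point is about producing arrangements lazily instead of building the full list. The Python sketch below shows the same idea with the standard library's `itertools` plus a small hand-written partition generator; it is an analogy only, and none of these names come from the arrangements package.

```python
from itertools import combinations, islice, permutations


def partitions(n, max_part=None):
    """Lazily yield the integer partitions of n as non-increasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


# Only the first three of the 20! permutations are ever constructed.
first_three = list(islice(permutations(range(20)), 3))

# Counting combinations one at a time keeps memory use flat.
n_pairs = sum(1 for _ in combinations(range(100), 2))

print(first_three[0][:5], n_pairs, list(partitions(4)))
```

Because each arrangement is generated on demand, the caller can stop early or stream results into a reduction without holding them all in memory.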
rquery v0.3.1: Implements a query generator based on Edgar F. Codd’s relational algebra and operator names, which aims to improve the experience of using ‘SQL’ at big-data scale. There is a vignette on the Assignment Partitioner and one on Query Generation.