August 2019: “Top 40” R packages
Two hundred and twenty-seven new packages made it to CRAN in August. Quite a few were devoted to medical or genomic applications, and this is reflected in my “Top 40” selections, listed below in nine categories: Computational Methods, Data, Genomics, Machine Learning, Medicine and Pharma, Statistics, Time Series, Utilities, and Visualization.
Computational Methods
fmcmc v0.2-0: Provides a flexible Markov Chain Monte Carlo (MCMC) framework for implementing Metropolis-Hastings algorithms. There is a vignette on user-defined kernels and another on workflows.
Mercator v0.9.5: Defines the classes used to explore, cluster, and visualize distance matrices, especially those arising from binary data. See the vignette.
tdigest v0.3.0: Implements the t-Digest construction algorithm by Dunning et al. (2019), which uses a variant of one-dimensional k-means clustering to produce a very compact data structure that allows accurate estimation of quantiles.
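To make the Metropolis-Hastings idea behind the fmcmc entry concrete, here is a minimal random-walk sampler in base R. It is only a sketch of the general algorithm and does not use fmcmc; the function name `rw_metropolis` and its arguments are hypothetical and chosen for illustration.

```r
# Random-walk Metropolis sampler in base R (illustrative; not the fmcmc API).
# `log_target` is any function returning the log of a (possibly unnormalised) density.
rw_metropolis <- function(log_target, init, n_iter, step = 1) {
  draws <- numeric(n_iter)
  current <- init
  current_lp <- log_target(current)
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)  # symmetric Gaussian proposal
    proposal_lp <- log_target(proposal)
    # Accept with probability min(1, target(proposal) / target(current)).
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
    }
    draws[i] <- current  # a rejected proposal repeats the current state
  }
  draws
}

# Sample from a standard normal target and inspect the chain.
set.seed(2019)
draws <- rw_metropolis(function(x) dnorm(x, log = TRUE), init = 0, n_iter = 20000)
round(c(mean = mean(draws), sd = sd(draws)), 2)
```

With a standard normal target, the chain's mean and standard deviation should land close to 0 and 1. A framework such as fmcmc generalizes the proposal step, which is where user-defined kernels come in.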
Data
arcos v0.8.2: Implements a wrapper for the ARCOS API that returns raw and summarized data frames from the Drug Enforcement Administration’s Automation of Reports and Consolidated Orders System, a database that monitors controlled substances transactions between manufacturers and distributors. There are vignettes on annual-maps, county-analysis, and per-capita-pharmacies.
censusxy v0.1.2: Provides access to the U.S. Census Bureau’s API for batch geocoding American street addresses. See the vignette.
hdfqlr v0.6-1: Implements an interface to HDFql along with helper functions for reading data from and writing data to HDF5 files. For more information, see the reference manual and the vignettes on Benchmarks, Low-level API, and Quick Start.
nhdplusTools v0.3.8: Implements tools documented by the US Environmental Protection Agency for traversing and working with [National Hydrography Dataset Plus](https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus) (NHDPlus) data. There is an Introduction and vignettes on plotting and point indexing.
Genomics
getspres v0.1.0: Implements the SPRE (standardized predicted random-effects) statistics to explore heterogeneity in genetic association meta-analyses, as described by Magosi et al. (2019). Look here for a very brief overview and see the vignette for a Tutorial.
simGWAS v0.2.0-2: Provides functions to simulate output from a case-control genome-wide association study (GWAS) with a given causal model. See Fortune and Wallace (2019) for the science, and the vignette for a simulation walk-through.
viromeBrowser v1.0.0: Facilitates browsing virome sequencing using annotations in multiple fasta files, and allows users to select and export specific annotated sequences. The vignette shows how to use the package.
whoa v0.0.1: Provides functions to investigate the distribution of genotypes in genotype-by-sequencing (GBS) data where approximate Hardy-Weinberg equilibrium is expected, in order to assess rates of genotyping errors and the dependence of those rates on read depth. See Hendricks et al. (2018) for background and the vignette for a Tutorial.
Machine Learning
flashlight v0.2.0: Provides functions to examine black-box machine-learning models using permutation variable importance (Fisher et al. (2018)), ICE profiles, and partial dependence (Friedman J. H. (2001)). See the vignette.
imagefx v0.2.0: Provides functions to extract features from images for time-series analysis or machine-learning applications. There is a vignette for analysing video data and another for optical flow analysis.
rTorch v0.0.3: Provides an interface to the Python-based PyTorch machine-learning library. See the README for how to use the interface.
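The permutation variable importance mentioned in the flashlight entry can be sketched in a few lines of base R. This is an illustration of the general technique, not the flashlight API: the helper names `rmse` and `permutation_importance` are hypothetical, and the model is a plain `lm()` fit on the built-in `mtcars` data.

```r
# Permutation variable importance in base R (illustrative; not the flashlight API).
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

permutation_importance <- function(fit, data, response, features, n_repeats = 20) {
  baseline <- rmse(data[[response]], predict(fit, newdata = data))
  sapply(features, function(feature) {
    increases <- replicate(n_repeats, {
      shuffled <- data
      # Shuffling one column breaks its link to the response.
      shuffled[[feature]] <- sample(shuffled[[feature]])
      rmse(data[[response]], predict(fit, newdata = shuffled)) - baseline
    })
    mean(increases)  # average increase in RMSE over the repeats
  })
}

set.seed(2019)
fit <- lm(mpg ~ wt + hp, data = mtcars)
importance <- permutation_importance(fit, mtcars, "mpg", c("wt", "hp"))
round(importance, 2)
```

A larger increase in error after shuffling suggests that the model leans more heavily on that variable. Because the procedure only needs predictions, it works for any model with a `predict()` method.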
Medicine and Pharma
accept v0.7.0: Provides functions to allow clinicians to predict the rate and severity of future acute exacerbation in Chronic Obstructive Pulmonary Disease (COPD) patients, based on the clinical prediction model published in Adibi et al. (2019). The webapp shows the model.
DRAFT v0.3.0: Fits epidemic data to stochastic models with constant or time-dependent behavior. See Ben-Nun et al. (2019) for a case study, and the vignette for other examples.
getspres v0.1.0: Implements the SPRE (standardized predicted random-effects) statistics to explore heterogeneity in genetic association meta-analyses, as described by Magosi et al. (2019). Look here for a very brief overview and see the vignette for a Tutorial.
idmodelr v0.3.1: Implements a framework that includes simulation and visualization tools for exploring a range of infectious disease models. It is primarily intended as an educational resource. There are vignettes for Model details, Parameter details, and Other resources.
OncoBayes2 v0.4-4: Implements a Bayesian logistic regression model with optional EXchangeability-NonEXchangeability parameter modelling, and includes a safety model that can guide dose-escalation decisions for adaptive oncology Phase I dose-escalation trials involving an arbitrary number of drugs. See Neuenschwander et al. (2008) and Neuenschwander et al. (2016), and the vignette for examples.
PML v1.1: Implements a penalized multi-band learning algorithm to analyze circadian rhythms from accelerometer data. See the vignette for an example.
xgxr v1.0.2: Provides functions to support a structured approach for exploring PKPD data. There is an Overview, a PKPD Single Ascending Dose example, and a vignette on PK Exploration with nlmixr dataset for theophylline.
visit v2.1: Implements a Bayesian Phase I cancer vaccine trial that allows for the simultaneous evaluation of safety and immunogenicity outcomes in the context of vaccine studies. See Wang (2019) for the details of the trial design, and the package overview.
Statistics
baggr v0.1.0: Provides functions to fit and compare hierarchical Bayesian meta-analysis models with Stan. See the vignette.
BayesPostEst v0.0.1: Provides functions to generate and plot post-estimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC), including Precision-Recall curves (see Beger (2016)) and predicted probabilities, using the methods of Hanmer and Kalkan (2013) and King et al. (2000). The functions can be used with MCMC output generated by any Bayesian estimation tool, including JAGS, BUGS, MCMCpack, and Stan. See the Getting Started Guide.
cotram v0.1-0: Provides functions to implement count transformation models featuring parameters interpretable as discrete hazard ratios, odds ratios, reverse-time discrete hazard ratios, or transformed expectations. For the technical details, see Hothorn et al. (2018) and the vignette.
lax v1.0.0: Provides functions to adjust the standard errors for extreme-value models fitted with evd, evir, extRemes, fExtremes, ismev, POT, and texmex. See the vignette for an overview.
OwenQ v1.0.2: Implements the Owen Q-function for integer-valued degrees of freedom, which is useful for calculating the power of equivalence tests. There is a vignette on the Owen Cumulative Function and another on Validation.
Time Series
avar v0.1.0: Implements the Allan variance and the Allan variance linear regression estimator for latent time series models. For the theory, see Guerrier et al. (2016). The vignette contains examples.
feasts v0.1.1: Provides a collection of functions for producing decompositions, statistical summaries, and plots for analyzing tidy time series data. See the vignette for examples.
scorpeak v0.1.2: Provides functions for detecting peaks in time series based on the algorithms described in Girish Palshikar (2009). The vignette contains examples.
TSplotly v1.1.1: Provides functions to create interactive time series plots. See the vignette for an introduction to the package.
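For readers unfamiliar with the Allan variance named in the avar entry, here is a base R sketch of the textbook non-overlapping estimator: average the series in consecutive blocks of length n, then take half the mean squared difference between neighbouring block means. The function name `allan_variance` is hypothetical, and the avar package may well use a different (for example, overlapping) estimator.

```r
# Non-overlapping Allan variance at block size n (illustrative; not the avar API).
allan_variance <- function(x, n) {
  k <- floor(length(x) / n)  # number of complete blocks
  stopifnot(k >= 2)
  # matrix() fills column by column, so each column is one block of n consecutive values.
  block_means <- colMeans(matrix(x[seq_len(k * n)], nrow = n))
  sum(diff(block_means)^2) / (2 * (k - 1))
}

allan_variance(c(1, 2, 3, 4), n = 1)  # 0.5
allan_variance(c(1, 2, 3, 4), n = 2)  # 2
```

Computing this quantity over a range of block sizes and plotting it on log-log axes is the usual way to read off which noise processes dominate at which time scales.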
Utilities
arrow v0.14.1.1: Provides an interface to the Apache Arrow C++ library. Arrow is a cross-language development platform for in-memory data, which specifies a standardized language-independent columnar memory format.
butcher v0.1.0: Provides S3 generics to axe components of fitted-model objects to reduce the size of model objects saved to disk. There is an introduction and additional vignettes on adding-models-to-butcher and available-axe-methods.
mRpostman v0.2.0: Provides functions to make it easy to connect to your IMAP (Internet Message Access Protocol) server and execute commands, such as listing mailboxes and searching for and fetching messages, in a tidy way. See the vignette for details.
pins v0.1.2: Provides a way to “pin” remote resources into a local cache to work offline, improve speed, and avoid recomputing. Resources can be anything from CSV, JSON, or image files to arbitrary R objects. There is abundant documentation, including a Getting Started Guide and several vignettes: Extending Board, Using GitHub Boards, Using Kaggle Boards, Using RStudio Connect Boards, Understanding Boards, Using Website Boards, Extending Pins, and Using Pins in RStudio.
pmdplyr v0.3.0: Extends dplyr to provide a family of functions for manipulating panel data, including functions to manipulate data based on index variables. There are vignettes on dplyr Variants, Panel Tools, and Panel Maneuvers in dplyr.
tidycells v0.2.1: Provides utilities to read cells from complex tabular data and, using a heuristic method, assign those cells to a columnar or tidy format. See the vignette for an overview.
Visualization
ggpointdensity v0.1.1: Extends ggplot2 to provide a geom for point-density plots, which are a cross between 2D density plots and scatter plots. Look here for examples.
hpackedbubble v0.1.0: Provides a simple way to draw split-packed bubble charts based on Highcharts. See the vignette for examples.
SHAPforxgboost v0.0.2: Provides functions to aid in visual data investigations using SHAP (SHapley Additive exPlanation) visualization plots for XGBoost. Look here for examples.
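One simple way to think about the point-density plots in the ggpointdensity entry is to give each point a density value equal to the number of other points within some radius, and then map that value to colour in an ordinary scatter plot. The base R sketch below illustrates that idea only; it is an assumption about the general approach rather than the package's implementation, and the name `neighbour_counts` is hypothetical.

```r
# Count, for every point, how many other points lie within `radius`
# (illustrative; not the ggpointdensity API).
neighbour_counts <- function(x, y, radius) {
  distances <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  # Each point is at distance 0 from itself, so subtract 1 to exclude it.
  unname(rowSums(distances <= radius) - 1)
}

x <- c(0, 0.1, 5)
y <- c(0, 0, 5)
counts <- neighbour_counts(x, y, radius = 1)
counts  # 1 1 0
```

The first two points are neighbours of each other and the third is isolated. This brute-force version builds the full distance matrix, so it is only practical for small data sets.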