Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One hundred seventy-six new packages made it to CRAN in May. Here are my Top 40 picks in eighteen categories: Climate Science, Computational Methods, Data, Decision Analysis, Ecology, Epidemiology, Finance, Genomics, Machine Learning, Medicine, Networks, Phylogenetics, Programming, Statistics, Time Series, Topological Data Analysis, Utilities, and Visualization.
Climate Science
forcis v1.0.1: Provides an interface to the FORCIS Working Group database on global foraminifera distribution described in Chaabane et al. (2024). There are six vignettes, including Getting Started and Data Visualization.
Computational Methods
dspline v1.0.2: Provides tools for computations with discrete splines, a class of univariate piecewise polynomial functions that are analogous to splines but for which smoothness is defined via divided differences rather than derivatives. Tools include discrete differentiation and integration and various matrix computations. See Tibshirani (2020) for the theory and README for an example.
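For readers who want to see what a divided difference is, here is a minimal sketch in Python of the classical Newton recursion f[x_i, ..., x_{i+k}] = (f[x_{i+1}, ..., x_{i+k}] − f[x_i, ..., x_{i+k−1}]) / (x_{i+k} − x_i) on an unevenly spaced grid. It does not use dspline and does not reproduce the scaling conventions of the discrete derivatives in Tibshirani (2020); it only shows the building block. For a cubic, every third-order divided difference equals the leading coefficient and every fourth-order one vanishes, just as the third derivative of a cubic is constant and the fourth is zero.

```python
def divided_differences(x, y, order):
    """Return the order-th divided differences f[x_i, ..., x_{i+order}] for all valid i."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if order >= len(x):
        raise ValueError("need more points than the order")
    diffs = list(y)  # order 0: the function values themselves
    for k in range(1, order + 1):
        # f[x_i..x_{i+k}] = (f[x_{i+1}..x_{i+k}] - f[x_i..x_{i+k-1}]) / (x_{i+k} - x_i)
        diffs = [(diffs[i + 1] - diffs[i]) / (x[i + k] - x[i])
                 for i in range(len(diffs) - 1)]
    return diffs


x = [0.0, 0.5, 1.5, 2.0, 3.5, 4.0]   # unevenly spaced design points
y = [v ** 3 for v in x]              # samples of f(x) = x^3
third = divided_differences(x, y, 3)
fourth = divided_differences(x, y, 4)
print(third)    # every entry is 1 up to rounding
print(fourth)   # every entry is 0 up to rounding
```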
RANSAC v0.1.0: Provides functions to fit both linear and non-linear models using the RANSAC (RANdom SAmple Consensus) algorithm, which is robust to outliers. See Fischler & Bolles (1981) for a description of the algorithm and the vignette for a brief introduction.
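To make the idea concrete, here is a minimal, package-independent sketch of RANSAC for a straight line, written in Python rather than R and not using the RANSAC package's functions. It repeatedly fits a line through a random minimal sample of two points, collects the points whose residual falls below a threshold (the consensus set), keeps the largest such set, and finally refits by ordinary least squares on it. The threshold, iteration count, seed, and helper names are illustrative assumptions.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def ransac_line(points, n_iter=200, threshold=1.0, seed=0):
    """Robustly fit y = a + b*x: keep the largest consensus set, then refit on it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # Minimal sample: two distinct points determine a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # slope undefined for a vertical pair
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        # Consensus set: points whose vertical residual is below the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a + b * x)) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no usable consensus set found")
    a, b = fit_line(best_inliers)  # final least-squares refit on the inliers only
    return a, b, best_inliers


clean = [(x, 2 * x + 1) for x in range(20)]                # points on y = 1 + 2x
outliers = [(3, 50), (7, -30), (10, 80), (15, -10), (18, 100)]
a, b, used = ransac_line(clean + outliers)
print(a, b, len(used))  # expected: 1.0 2.0 20
```

A plain least-squares fit on all 25 points would be pulled far off by the five gross outliers; the consensus step simply never lets them into the final fit.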
wex v0.1.0: Provides functions to compute the exact observation weights for the Kalman filter and smoother based on the method described in Koopman and Harvey (2003) and supports in-depth exploration of state-space models. See README for examples.
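The notion of observation weights is easiest to see in the simplest state-space model, the local level (random walk plus noise) model. Each filtered level is the previous prediction shrunk by (1 − K_t) plus K_t times the new observation, so it can be unrolled into an explicit weighted sum of the prior mean and past observations. The Python sketch below does exactly that for this one model. It is an illustration under that assumption, not wex's implementation of the general Koopman and Harvey (2003) method, and it covers only the filter, not the smoother. The large initial variance `p1` is an assumed stand-in for a diffuse prior.

```python
def local_level_filter(y, sigma_eps2, sigma_eta2, a1=0.0, p1=1e7):
    """Filtered levels a_{t|t} for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    a, p, filtered = a1, p1, []
    for obs in y:
        k = p / (p + sigma_eps2)          # Kalman gain K_t
        a = a + k * (obs - a)             # filtered level a_{t|t}
        filtered.append(a)
        p = p * (1.0 - k) + sigma_eta2    # next prediction variance P_{t+1}
    return filtered


def local_level_weights(n, sigma_eps2, sigma_eta2, p1=1e7):
    """For each t, return (weight on a1, [weights on y_1..y_t]) such that
    a_{t|t} = w_prior * a1 + sum_j w_j * y_j."""
    p, w_prior, w_obs, out = p1, 1.0, [], []
    for _ in range(n):
        k = p / (p + sigma_eps2)
        w_prior *= 1.0 - k                            # earlier terms shrink by (1 - K_t)
        w_obs = [w * (1.0 - k) for w in w_obs] + [k]  # y_t enters with weight K_t
        out.append((w_prior, w_obs))
        p = p * (1.0 - k) + sigma_eta2
    return out


y = [1.0, 2.5, 2.0, 3.5, 3.0]
a1 = 0.0
levels = local_level_filter(y, sigma_eps2=1.0, sigma_eta2=0.5, a1=a1)
weights = local_level_weights(len(y), sigma_eps2=1.0, sigma_eta2=0.5)
for t, (w_prior, w_obs) in enumerate(weights):
    rebuilt = w_prior * a1 + sum(w * obs for w, obs in zip(w_obs, y))
    print(t + 1, round(levels[t], 6), round(rebuilt, 6), [round(w, 3) for w in w_obs])
```

Because every step is an affine update, the weights on the prior mean and the observations always sum to one, and recent observations receive the largest weights.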
Data
CardioDataSets v0.1.0: Offers a diverse collection of datasets focused on cardiovascular and heart disease research, including heart failure, myocardial infarction, aortic dissection, transplant outcomes, cardiovascular risk factors, drug efficacy, and mortality trends. See the vignette.
NeuroDataSets v0.1.0: Offers a diverse collection of datasets focused on the brain, nervous system, and related disorders, including clinical, experimental, neuroimaging, behavioral, cognitive, and simulated data on conditions such as Parkinson’s disease, Alzheimer’s, epilepsy, schizophrenia, gliomas, and mental health. See the Introduction.
norSTR v0.2.1: Developed and maintained for use at the Department of Forensic Sciences, Oslo, Norway, the package provides allele frequency databases for 50 forensic short tandem repeat (STR) markers, covering Norway, Europe, Africa, South America, West Asia, Middle Asia, and East Asia. See README for an example.
Decision Analysis
aggreCAT v1.0.0: Implements mathematical aggregation methods for structured data elicitation, including those defined in Hanea et al. (2021), to inform decision-making. See the vignette.
Ecology
ecorisk v0.1.1: Implements a modular framework for ecosystem risk assessments, combining existing risk assessment approaches tailored to semi-quantitative and quantitative analyses. See the vignette.
fastei v0.0.7: Provides functions to estimate the probability matrix for the R×C ecological inference problem using the Expectation-Maximization algorithm, with four approximation methods for the E-step as well as an exact method. See Thraves et al. (2024) for background and the vignette for examples.
fireexposuR v1.1.0: Provides methods for computing and visualizing wildfire ignition exposure and directional vulnerability. See Beverly et al. (2010) and Beverly and Forbes (2023) for background and methodology and the Introduction to get started.
SeaGraphs v0.1.2: Provides functions to transform sea current data into connectivity data. Two files of horizontal and vertical current flows are transformed into connectivity data in the form of a shapefile network. See Nagkoulis et al. (2025) for an application and the vignette for an introduction.
QuAnTeTrack v0.1.0: Provides a structured workflow for analyzing trackway data that facilitates the assessment of paleoecological and paleoethological hypotheses, along with functions for data digitization, loading, exploratory analysis, statistical testing, simulation, similarity assessment, intersection detection, and clustering. See Alexander (1976) and Rohlf (2009) for background and the vignette for examples.
Epidemiology
DESA v1.0.0: Provides a framework for early epidemic detection through school absenteeism surveillance via three core methods: (1) simulation of epidemic spread and resulting school absenteeism patterns, (2) surveillance models that generate alerts based on absenteeism data, and (3) evaluation of alert timeliness and accuracy to optimize model parameters. See Vanderkruk et al. (2023) and Ward et al. (2019) for background on the methods, and README for examples.
dlmwwbe v0.1.0: Implements dynamic linear models outlined in Shumway and Stoffer (2025) for wastewater modeling. See the vignette.
Finance
TVMVP v1.0.4: Offers functions to estimate the time-dependent covariance matrix of returns for portfolio optimization, methods for determining the optimal number of factors to use in the covariance estimation, a hypothesis test for time-varying covariance, and functions for portfolio optimization and rolling-window evaluation. See Su and Wang (2017), Fan et al. (2024), and Chen et al. (2019) for background. There are two vignettes: Overview and Getting Started Guide, and Master's Thesis.
Genomics
doblin v0.1.1: Provides functions to quantify dominant clonal lineages from DNA barcoding time-series data along with functions to cluster barcode lineage trajectories and functions to identify persistent clonal lineages across time points. For more details, see Gagné-Leroux et al. (2024). The vignette steps through the Doblin pipeline.
GencoDymo2 v1.0.1: Provides helper functions to facilitate the analysis of genomic annotations from the GENCODE database, supporting both human and mouse genomes. This toolkit enables users to extract, filter, and analyze a wide range of annotation features, including genes, transcripts, exons, and introns, across different GENCODE releases. See the vignette.
HTGM3D v1.0: Provides tools for working with and visualizing the three gene ontologies based on biological process (BP), molecular function (MF), and cellular component (CC, i.e., subcellular localization) developed by the Gene Ontology (GO) Consortium. See Zeeberg et al. (2003) for background and the vignette for examples.
Machine Learning
cramR v0.1.0: Implements Cram, a general approach to simultaneous learning and evaluation using a generic machine learning algorithm. In a single pass of batched data, Cram uses all of the data to repeatedly train a machine learning algorithm and test its empirical performance. A cramming process begins by randomly dividing a dataset into T batches and defining a baseline rule. Cram trains an ML algorithm on the first batch of data, yielding an updated rule, and then evaluates the performance difference between these two rules using the remaining T − 1 batches. Details of the method are described in Jia et al. (2024) and Jia et al. (2025). There are eight vignettes, including Quick Start and Cram Bandit.
fairmetrics v1.0.3: Provides functions for computing fairness metrics for machine learning and statistical models, including confidence intervals for each metric. The package supports the evaluation of group-level fairness criteria commonly used in fairness research, particularly in healthcare. See Gao et al. (2024) for background and the vignette for an example.
spareg v1.0.0: Implements a framework combining variable screening and random projection techniques for fitting ensembles of predictive generalized linear models to high-dimensional data. See Parzer et al. (2024a) and Parzer et al. (2024b) for details and the vignette for package documentation.
Medicine
DTEBOP2 v1.0.3: Implements a Bayesian Optimal Phase II design (DTE-BOP2) for trials with delayed treatment effects, particularly relevant to immunotherapy studies where treatment benefits may emerge after a delay. The method incorporates uncertainty in the delay timepoint through a truncated gamma prior and supports two-arm trial designs. See the vignette.
Networks
netcutter v0.3.1: Implements the NetCutter algorithm described in Müller and Mancuso (2008) which identifies co-occurring terms in a list of containers. For example, it may be used to detect genes that co-occur across genomes. See the vignette.
rcoins v0.3.2: Provides functions to group lines that form naturally continuous lines in a spatial network. The algorithm is based on the Continuity in Street Networks (COINS) method from Tripathy et al. (2021), which identifies continuous “strokes” in the network as the line strings that maximize the angles between consecutive segments. See the vignette.
Phylogenetics
RRmorph v0.0.1: Provides a toolkit designed to investigate the effects of evolutionary rates and morphological convergence on phenotypes. See Melchionna et al. (2024) for details and the vignette for examples.
Programming
interprocess v1.3.0: Uses the Boost.Interprocess library to implement low-level operating system mechanisms for performing atomic operations on shared data structures, including mutexes, semaphores, and message queues. These interprocess communication tools can optionally block, with or without a timeout. See README for an example.
Statistics
cutpoint v1.0.0: Provides functions to estimate cutpoints of a metric or ordinal-scaled variable in the multivariable context of survival data or time-to-event data and visualize the cutpoint estimation process using contour plots, index plots, and spline plots. See Govindarajulu and Tarpey (2022) for the theory and README for examples.
densityratio v0.2.1: Implements multiple non-parametric density ratio techniques, including unconstrained least-squares importance fitting, the Kullback-Leibler importance estimation procedure, spectral density ratio estimation, and more for comparing probability distributions. See Sugiyama et al. (2012) for an overview of density ratio estimation. There are three vignettes, including density ratio and High dimensional two sample testing.
ExtendedLaplace v0.1.6: Provides computational tools for working with the Extended Laplace distribution, including the probability density function, cumulative distribution function, quantile function, and random variate generation. See Saah & Kozubowski (2025) for the theory and the vignette for examples.
glmmsel v1.0.2: Provides functions to fit sparse generalized linear mixed models with
sanba v0.0.1: Provides functions to fit Bayesian nested mixture models based on shared atoms as described in Denti et al. (2023) and D’Angelo and Denti (2024). See README for examples.
sfclust v1.0.1: Implements Bayesian clustering of spatial regions with similar functional shapes using spanning trees and latent Gaussian models. The algorithm is based on Zhong et al. (2024). There are three vignettes, including Getting started and Additional features.
survregVB v0.0.1: Implements Bayesian inference in accelerated failure time (AFT) models for right-censored survival times assuming a log-logistic distribution. See the two 2024 papers by Xian et al. for background and the vignette for examples.
Time Series
gglinedensity v0.2.0: Implements the DenseLine algorithm, which normalizes time series by the arc length to compute accurate densities. See Moritz and Fisher (2018) for background and README for examples.
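The point of this normalization is that a steep, wiggly series covers many more pixels than a flat one and would otherwise dominate the density. A simplified Python sketch of the idea follows. It reflects a simplified reading of Moritz and Fisher (2018), not gglinedensity's implementation: each series is rasterized onto an integer grid, and each series gets a total weight of 1 in every column, so steep segments are spread thin. The grid, the rasterization rule, and the function name are assumptions for illustration.

```python
def dense_lines(series, n_rows):
    """Simplified DenseLines-style density on an integer grid.

    Each series is a list of integer row indices in [0, n_rows), one per column.
    Assumed rasterization: in column x a series covers every row between its
    value at x and its value at x + 1 (inclusive); the last column covers only
    its own value. Each series' cells in a column share a total weight of 1.
    """
    n_cols = len(series[0])
    density = [[0.0] * n_cols for _ in range(n_rows)]
    for s in series:
        if len(s) != n_cols:
            raise ValueError("all series must have the same length")
        for x in range(n_cols):
            y0 = s[x]
            y1 = s[x + 1] if x + 1 < n_cols else s[x]
            rows = range(min(y0, y1), max(y0, y1) + 1)
            share = 1.0 / len(rows)   # column-wise normalization
            for r in rows:
                density[r][x] += share
    return density


flat = [2, 2, 2, 2]
steep = [0, 4, 0, 4]
grid = dense_lines([flat, steep], n_rows=5)
for row in reversed(grid):            # print with the top row first
    print(" ".join(f"{v:.1f}" for v in row))
```

Without the normalization the steep series would contribute five units per column against the flat series' one; with it, both contribute exactly one unit per column.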
Topological Data Analysis
phutil v0.0.1: Implements a class for hosting persistence data and provides functions to coerce existing data structures and functions to compute distances between persistence diagrams. See Bubenik et al. (2023) for a formal study of bottleneck and Wasserstein distances and the vignettes: The persistence class and Validation and Benchmark of Wasserstein Distance.
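To see what the bottleneck and Wasserstein distances measure, here is a brute-force Python sketch for tiny diagrams. It is not phutil's implementation. Each diagram is padded with "diagonal" slots so that any point can instead be matched to its nearest point on the diagonal, and points are compared with the L∞ metric, a common convention. The distance is the best matching's largest cost (bottleneck) or the q-th root of the summed q-th powers of the costs (q-Wasserstein). Enumerating permutations grows factorially, so this is only suitable for checking small examples.

```python
from itertools import permutations


def _cost(u, v):
    """L-infinity cost between two slots; None stands for a point on the diagonal."""
    if u is None and v is None:
        return 0.0                       # diagonal matched to diagonal is free
    if u is None:
        return (v[1] - v[0]) / 2.0       # L-infinity distance from v to the diagonal
    if v is None:
        return (u[1] - u[0]) / 2.0
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def diagram_distance(d1, d2, q=None):
    """Brute-force distance between two small persistence diagrams.

    Diagrams are lists of (birth, death) pairs. q=None gives the bottleneck
    distance; a number q >= 1 gives the q-Wasserstein distance.
    """
    # Pad so every point can alternatively be matched to the diagonal.
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)
    if not a:
        return 0.0                       # two empty diagrams
    best = float("inf")
    for perm in permutations(range(len(b))):
        costs = [_cost(a[i], b[j]) for i, j in enumerate(perm)]
        value = max(costs) if q is None else sum(c ** q for c in costs) ** (1.0 / q)
        best = min(best, value)
    return best


d1 = [(0.0, 2.0)]
d2 = [(0.0, 3.0)]
print(diagram_distance(d1, d2))        # bottleneck
print(diagram_distance(d1, d2, q=1))   # 1-Wasserstein
print(diagram_distance(d1, []))        # a lone point versus an empty diagram
```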
tdarec v0.1.0: Provides functions and tidyverse recipes for vectorizing Topological Data Analysis persistence diagrams. See Ali et al. (2000) for background and the vignette for examples.
Utilities
bitfield v0.6.1: Provides functions to capture the computational footprint of any model workflow or output by encoding computational decisions into sequences of bits, or bitfields, that are transformed into integer values. This allows storing information useful for documenting metadata, intermediate values that accrue along a workflow, or output metrics. See README.
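The general idea of packing several small decisions into a single integer can be sketched as follows. This is a generic bit-packing illustration in Python, not the bitfield package's encoding scheme; the three fields and their widths are hypothetical. Each field occupies a fixed number of bits, the first field sits in the lowest bits, and decoding masks and shifts the fields back out.

```python
def encode_fields(values, widths):
    """Pack small non-negative integers into one integer, first field in the lowest bits."""
    if len(values) != len(widths):
        raise ValueError("values and widths must have the same length")
    code, shift = 0, 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        code |= value << shift
        shift += width
    return code


def decode_fields(code, widths):
    """Inverse of encode_fields: peel fields off from the lowest bits upward."""
    values = []
    for width in widths:
        values.append(code & ((1 << width) - 1))  # keep the lowest `width` bits
        code >>= width
    return values


# Hypothetical workflow decisions: imputed? (1 bit), method id (2 bits), quality class (3 bits)
widths = [1, 2, 3]
decisions = [1, 2, 5]
code = encode_fields(decisions, widths)
print(code)                          # 45
print(decode_fields(code, widths))   # [1, 2, 5]
```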
trackopt v0.1.0: Provides a function to track the parameter values, gradient, and Hessian at each iteration of numerical optimizers, which is useful for analyzing optimization progress, diagnosing issues, and studying convergence behavior. See README for examples.
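Recording an optimizer's state at every iteration is easy to picture with a hand-written optimizer. The Python sketch below uses a hypothetical `newton_with_trace` helper, not trackopt's function. It minimizes a one-dimensional function by Newton's method and stores the iteration number, parameter value, gradient, and Hessian at each step, which is the kind of trace that makes convergence behavior visible. The example objective f(x) = e^x − 2x, with its minimum at ln 2, is chosen only for illustration.

```python
import math


def newton_with_trace(grad, hess, x0, tol=1e-10, max_iter=50):
    """1-D Newton minimization that records the state at every iteration."""
    x = x0
    trace = []
    for it in range(max_iter):
        g, h = grad(x), hess(x)
        trace.append({"iteration": it, "x": x, "gradient": g, "hessian": h})
        if abs(g) < tol:
            break              # converged: gradient is (numerically) zero
        x -= g / h             # Newton step
    return x, trace


# f(x) = exp(x) - 2x has gradient exp(x) - 2 and Hessian exp(x).
x_min, trace = newton_with_trace(lambda x: math.exp(x) - 2.0, math.exp, x0=0.0)
for step in trace:
    print(step["iteration"], round(step["x"], 8), f"{step['gradient']:.2e}")
```

The printed gradients shrink roughly quadratically once the iterates are near ln 2, which is the signature of Newton's method that such a trace makes easy to check.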
Visualization
gghexsize v0.1.0: Extends ggplot2 to create heatmaps with the size aesthetic to vary hexagon size.
ggpedigree v0.7.0: Provides plotting functions for visualizing pedigrees in behavior genetics and kinship research. Features include support for duplicated individuals, complex mating structures, integration with simulated pedigrees, and layout customization. See the vignettes Plotting pedigrees, Interactive Plotting and Visualizing Relatedness Matrices.
ggplot2 with geoms analogous to geom_col() and geom_bar() that allow for treemaps nested within each bar segment. Also provides geometries for subgroup bordering and text annotation. Look here for examples.