Blog Archives

Developing an R package from scratch with Travis continuous integration

July 20, 2019

This short tutorial provides a quick guide on how to develop an R package from scratch and how to use Travis CI for automatic builds on various R versions and automatic test coverage calculation. The resulting package can be found here: CIexamplePkg. A very nice general introduction can be found here: rOpenSci Packages: Development, Maintenance, and Peer Review. Some material is taken from...

Read more »

Measuring feature importance in k-means clustering and variants thereof

July 9, 2019

We present a novel approach for measuring feature importance in k-means clustering, or variants thereof, to increase the interpretability of clustering results. In supervised machine learning, feature importance is a widely used tool to ensure interpretability of complex models. We adapt this idea to unsupervised learning via partitional clustering. Our approach is model agnostic in that it only requires...
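One way to picture the idea is permutation importance, borrowed from supervised learning. The excerpt above is truncated, so the following Python sketch is an assumption about how such a measure could work and not the authors' method: it shuffles one feature at a time and records the share of points whose nearest centroid changes. The function names `nearest` and `permutation_importance` are hypothetical.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
    )


def permutation_importance(data, centroids, n_repeats=20, seed=0):
    """Share of points that switch cluster when one feature is shuffled.

    Assumption: the centroids come from an already fitted partitional
    clustering; only the "assign to nearest centroid" step is needed.
    """
    rng = random.Random(seed)
    base = [nearest(p, centroids) for p in data]
    scores = []
    for j in range(len(data[0])):
        changed = 0
        for _ in range(n_repeats):
            # Shuffle feature j across all points, keep other features fixed.
            col = [p[j] for p in data]
            rng.shuffle(col)
            for i, p in enumerate(data):
                q = list(p)
                q[j] = col[i]
                changed += nearest(q, centroids) != base[i]
        scores.append(changed / (n_repeats * len(data)))
    return scores
```

A feature whose shuffling never moves a point to another cluster gets a score of 0, while a feature that drives the partition gets a score closer to 1.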

Read more »

Benchmarking missing data strategies for k-means clustering

June 30, 2019

The goal is to compare a few algorithms for missing data imputation when used before k-means clustering is performed. For the latter we use the same algorithm as in ClustImpute to ensure that only the computation time of the imputation is compared. In a nutshell, we’ll see that ClustImpute scales like a random imputation and hence is much faster than...

Read more »

Introducing ClustImpute: A new approach for k-means clustering with built-in missing data imputation

June 19, 2019

We are happy to introduce a new k-means clustering algorithm that includes a powerful multiple missing data imputation at the computational cost of a few extra random imputations (benchmarks follow in a separate article). More precisely, the algorithm draws the missing values iteratively based on the current cluster assignment so that correlations are considered on this level (we assume a...
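The iterative draw described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the ClustImpute implementation: missing entries are marked with `None`, every column has at least one observed value, and each missing value is redrawn from the observed values of the same column within the point's current cluster. The function name `cluster_with_imputation` and its parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_with_imputation(rows, k, n_rounds=10, seed=0):
    """k-means steps interleaved with cluster-wise random imputation.

    rows: list of equal-length lists; None marks a missing value.
    Returns (labels, filled_data). The labels come from the last
    assignment step, which runs before the final redraw.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]

    # Start with a plain random imputation from each column's observed values.
    data = [
        [rng.choice(observed[j]) if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)

    for _ in range(n_rounds):
        # Assignment step: nearest centroid for every (imputed) point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data]
        # Update step: centroid = mean of its members (kept if cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Imputation step: redraw missing values from the same cluster.
        for i in range(len(rows)):
            for j in range(n_cols):
                if missing[i][j]:
                    pool = [
                        rows[m][j]
                        for m in range(len(rows))
                        if labels[m] == labels[i] and not missing[m][j]
                    ]
                    # Fall back to the whole column if the cluster has no
                    # observed value for this feature.
                    data[i][j] = rng.choice(pool or observed[j])
    return labels, data
```

Each round costs one extra pass over the missing entries, which is consistent in spirit with the excerpt's remark that the overhead is comparable to a few random imputations.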

Read more »
