Articles by John Mount

Working in CRAN’s World

February 28, 2022 | John Mount

Part of the deal of having a package up on CRAN is that, at any time, one may be sent an automated email like the following:

“Dear maintainer,
Please see the problems shown on URL.
Please correct before TODAY+14DAYS to safely retain your package on CRAN.
The CRAN Team”

If ... [Read more...]

How to Read Sourav Chatterjee’s Basic XICOR Definition

December 26, 2021 | John Mount

Introduction
Professor Sourav Chatterjee recently published a new coefficient of correlation called XICOR (refs: JASA, R package, arXiv, Hacker News, and a Python package by a different author). The basic formula (in the tie-free case) is: Take X and Y as n-vectors of observations of random variables. Compute the ranks r(i) ... [Read more...]
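For readers who want to try the tie-free definition directly, here is a minimal base-R sketch. It follows the formula in Chatterjee's paper: sort the pairs by X, let r_i be the rank of the i-th Y value in that order, and compute ξ_n = 1 − 3 Σ_{i=1}^{n−1} |r_{i+1} − r_i| / (n² − 1). The function name xicor_tie_free is hypothetical (it is not from the XICOR package), and the sketch assumes there are no ties in X or Y.

```r
# Hypothetical helper: tie-free XICOR (Chatterjee's xi_n) in base R.
# Assumes no ties in x or y; the general definition treats ties differently.
xicor_tie_free <- function(x, y) {
  stopifnot(length(x) == length(y), !anyDuplicated(x), !anyDuplicated(y))
  n <- length(x)
  # Sort the pairs by x, then rank the y values in that order
  r <- rank(y[order(x)])
  # xi_n = 1 - 3 * sum |r[i+1] - r[i]| / (n^2 - 1)
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

x <- 1:10
print(xicor_tie_free(x, x))   # [1] 0.7272727  (= 8/11)
print(xicor_tie_free(x, -x))  # [1] 0.7272727  (y is still a function of x)
```

Both calls give (n − 2)/(n + 1), the largest value the tie-free formula can reach for n = 10; note that a decreasing relationship scores the same as an increasing one, since XICOR measures how well Y is determined by X rather than the direction of the trend.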

Don’t Feel Guilty About Selecting Variables

May 30, 2020 | John Mount

We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables. If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read. “Data Science” is often criticized with ... [Read more...]

R Tip: How To Look Up Matrix Values Quickly

March 30, 2020 | John Mount

R is a powerful data science language because, like MATLAB, NumPy, and pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while ... [Read more...]
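The excerpt stops before the technique itself. As a general illustration of vectorized lookup (not necessarily the approach the full article takes), base R can fetch many matrix cells at once by indexing with a two-column matrix of (row, column) pairs. The matrix m and the lookup vectors rows and cols below are hypothetical example data.

```r
# Hypothetical example matrix with row and column names
m <- matrix(seq(10, 120, by = 10), nrow = 3, ncol = 4,
            dimnames = list(c("a", "b", "c"), c("w", "x", "y", "z")))

# Hypothetical (row, column) pairs to look up, one cell per pair
rows <- c("a", "c", "b")
cols <- c("z", "w", "x")

# Element-at-a-time lookup: one R-level function call per pair
slow <- vapply(seq_along(rows),
               function(k) m[rows[k], cols[k]],
               numeric(1))

# Vectorized lookup: index with a two-column matrix, one row per pair.
# Character indices are matched against the matrix's dimnames.
fast <- m[cbind(rows, cols)]

# The same lookup with integer positions instead of names
fast_num <- m[cbind(match(rows, rownames(m)), match(cols, colnames(m)))]

print(fast)  # [1] 100  30  50
```

The vectorized form hands all the pairs to R's indexing code in a single call, which typically matters once the number of lookups grows large.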

Re-Share: vtreat Data Preparation Documentation and Video

March 26, 2020 | John Mount

I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real-world data for predictive modeling tasks (classification, regression, and so on). In particular, it is very good at re-coding high-cardinality string-valued (or categorical) variables ... [Read more...]

Keep Calm and Use vtreat (in R and in Python)

March 12, 2020 | John Mount

A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off” article into R vtreat steps here. This R port demonstrates the fit/prepare notation that is new to R! We ... [Read more...]

Cross-Methods are a Leak/Variance Trade-Off

March 10, 2020 | John Mount

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work some exciting examples of when cross-methods (cross-validation, and also cross-frames) work, and when they do not work.
Abstract
Cross-methods ... [Read more...]

Nifty Upcoming Enhancements to unpack/to

February 22, 2020 | John Mount

We have some really nifty upcoming enhancements to wrapr unpack/to. One of the new notations is the use of := as an alternate assignment operator for unpack/to. This lets us write code like the following. First, let’s attach our package and set up some example data.

library(wrapr)
# ... [Read more...]

What is New For vtreat 1.5.2?

February 9, 2020 | John Mount

vtreat version 1.5.2 just became available from CRAN. We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great stable state for production use. One of the biggest improvements is documentation cleanup, and adapting the examples to ... [Read more...]