Articles by John Mount

Cross-Methods are a Leak/Variance Trade-Off

March 10, 2020 | John Mount

We have a new Win Vector data science article to share: "Cross-Methods are a Leak/Variance Trade-Off" by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work through some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not. Abstract: Cross-methods ... [Read more...]

Nifty Upcoming Enhancements to unpack/to

February 22, 2020 | John Mount

We have some really nifty upcoming enhancements to wrapr unpack/to. One of the new notations is the use of := as an alternate assignment operator for unpack/to. This lets us write code like the following. First let’s attach our package and set up some example data. library(wrapr) # ... [Read more...]

What is New For vtreat 1.5.2?

February 9, 2020 | John Mount

vtreat version 1.5.2 just became available from CRAN. We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great stable state for production use. One of the biggest improvements is documentation cleanup, and adapting the examples to ... [Read more...]

R Tip: Check What Repos You are Using

February 2, 2020 | John Mount

In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).” We recently became aware that for some users this isn’t complete advice. ... [Read more...]

Data re-Shaping in R and in Python

January 28, 2020 | John Mount

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version. This reflects our opinion on the “which is better for data science ... [Read more...]

wrapr 1.9.6 is now up on CRAN

January 26, 2020 | John Mount

wrapr 1.9.6 is now up on CRAN. Unfortunately, we usually forget to say this: a big thank you to the staff and volunteers at CRAN. As part of this release Nina Zumel has streamlined the unpack vignette, picking and recommending specific notations for the unpack method. We are looking forward to ... [Read more...]

Why we wrote wrapr to/unpack

January 22, 2020 | John Mount

One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit. We had recently back-ported a Python sklearn Pipeline step style interface from the Python vtreat to R (announcement here). But that doesn’t mean we are ... [Read more...]

unpack Your Values in R

January 20, 2020 | John Mount

I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking. The unpacking notation is made available if you install wrapr version 1.9.6 from GitHub: remotes::install_github("WinVector/wrapr") We will likely send this version to CRAN in a couple of weeks. ... [Read more...]

sklearn Pipe Step Interface for vtreat

January 14, 2020 | John Mount

We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_... [Read more...]

New vtreat Feature: Nested Model Bias Warning

January 11, 2020 | John Mount

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R and Python versions) incorporates a cross-frame method that allows ... [Read more...]

Introduction to Data Science in R, Free for 3 days

December 30, 2019 | John Mount

To celebrate the new year and the recent release of Practical Data Science with R 2nd Edition, we are offering a free coupon for our video course “Introduction to Data Science.” The following URL and code should get you permanent free access to the video course, if used between now ... [Read more...]

What is a Second Edition?

December 24, 2019 | John Mount

What is a second edition of a book to its authors? In some sense it is the book the authors thought they were writing the first time. With some good fortune a second edition can be much more than that. For our example: Nina and I received a lot ... [Read more...]

Why to try Practical Data Science with R, 2nd Edition

December 22, 2019 | John Mount

I thought we would try to express why somebody interested in using the R language (and package ecosystem) for supervised machine learning, data wrangling, analytics projects, and other data science topics should give Practical Data Science with R, 2nd Edition a try. Nina Zumel and I shared the book with ...
[Read more...]
