Articles by John Mount

Cross-Methods are a Leak/Variance Trade-Off

March 10, 2020 | John Mount

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off” by John Mount and Nina Zumel (Win Vector LLC), March 10, 2020. We work through some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not. Abstract: Cross-methods ... [Read more...]

Nifty Upcoming Enhancements to unpack/to

February 22, 2020 | John Mount

We have some really nifty upcoming enhancements to wrapr unpack/to. One of the new notations is the use of := as an alternate assignment operator for unpack/to. This lets us write code like the following. First, let’s attach our package and set up some example data. library(wrapr) # ... [Read more...]

What is New For vtreat 1.5.2?

February 9, 2020 | John Mount

vtreat version 1.5.2 just became available from CRAN. We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great, stable state for production use. One of the biggest improvements is a documentation cleanup, and adapting the examples to ... [Read more...]

New improved cdata instructional video

February 8, 2020 | John Mount

We have a new, improved version of the “how to design a cdata/data_algebra data transform” video up! The original article, the Python example, and the R example have all been updated to use the new video. Please check it out! [Read more...]

New Data Scientist Stickers

February 5, 2020 | John Mount

We have a new data scientist sticker! If you see Nina or John at a conference/MeetUp, please ask us for a sticker!

[Read more...]

wrapr Update: Removing Some Under-Used Functions and Classes

February 4, 2020 | John Mount

For the next version of the R package wrapr we are going to be removing a number of under-used functions/methods and classes. This update will likely happen in March 2020, and is the start of the wrapr 2.* series. Most of the items being removed are different abstractions for helping with ... [Read more...]

R Tip: Check What Repos You are Using

February 2, 2020 | John Mount

In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).” We recently became aware that for some users this isn’t complete advice. ... [Read more...]
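A quick way to see which repositories install.packages() and update.packages() will consult is to inspect the session's repos option. The sketch below uses only base R; the "@CRAN@" check reflects R's placeholder for "no mirror chosen yet", and pinning CRAN to https://cloud.r-project.org is our illustrative assumption, not necessarily the recommendation of the full post.

```r
# Show which package repositories this R session is currently configured to use.
repos <- getOption("repos")
print(repos)

# "@CRAN@" is the placeholder R uses when no CRAN mirror has been selected yet.
if ("CRAN" %in% names(repos) && identical(repos[["CRAN"]], "@CRAN@")) {
  message("No CRAN mirror selected yet.")
}

# Assumption for illustration: pin CRAN to the cloud mirror for this session only.
options(repos = c(CRAN = "https://cloud.r-project.org"))
print(getOption("repos"))
```

Putting the options() call in a startup file such as .Rprofile would make the choice persist across sessions.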

Data re-Shaping in R and in Python

January 28, 2020 | John Mount

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version. This reflects our opinion on the “which is better for data science ... [Read more...]

wrapr 1.9.6 is now up on CRAN

January 26, 2020 | John Mount

wrapr 1.9.6 is now up on CRAN. A big thank you to the staff and volunteers at CRAN; unfortunately, we usually forget to say this. As part of this release, Nina Zumel has streamlined the unpack vignette, picking and recommending specific notations for the unpack method. We are looking forward to ...

[Read more...]

Why we wrote wrapr to/unpack

January 22, 2020 | John Mount

One reason we are developing the wrapr to/unpack methods is that we wanted to spruce up the R vtreat interface a bit. We had recently back-ported a Python sklearn Pipeline step-style interface from the Python vtreat to R (announcement here). But that doesn’t mean we are ... [Read more...]

Using unpack to Manage Your R Environment

January 21, 2020 | John Mount

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example. # remotes::install_github("WinVector/wrapr") library(wrapr) a [Read more...]

unpack Your Values in R

January 20, 2020 | John Mount

I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking. The unpacking notation is available if you install wrapr version 1.9.6 from GitHub: remotes::install_github("WinVector/wrapr") We will likely send this version to CRAN in a couple of weeks. ... [Read more...]

sklearn Pipe Step Interface for vtreat

January 14, 2020 | John Mount

We’ve been experimenting with this for a while, and the next version of the R vtreat package will have a back-port of the Python vtreat package’s sklearn pipe step interface (in addition to the standard R interface). This means the user can easily express modeling intent by choosing between coder$fit_... [Read more...]

New vtreat Feature: Nested Model Bias Warning

January 11, 2020 | John Mount

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R and Python versions) incorporates a cross-frame method that allows ... [Read more...]
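To make the nested model bias concrete, here is a minimal base-R sketch, assuming a simple per-level mean-outcome encoding of one categorical variable. It is not vtreat’s implementation; it only contrasts a naive encoding (estimated on the same rows it is applied to) with a cross-frame style encoding (each row’s code estimated only from the other folds). The data are made-up pure noise, so any association between the encoded variable and the outcome is leakage.

```r
# Hypothetical toy data: a high-cardinality categorical x and a pure-noise outcome y.
set.seed(2020)
n <- 1000
d <- data.frame(
  x = sample(paste0("level_", 1:200), n, replace = TRUE),
  y = rnorm(n),
  stringsAsFactors = FALSE
)

# Per-level mean of y, returned as a plain named numeric vector.
level_means <- function(y, x) {
  m <- tapply(y, x, mean)
  setNames(as.vector(m), names(m))
}

# Naive encoding: level means estimated on the same rows they are applied to.
d$x_naive <- unname(level_means(d$y, d$x)[d$x])

# Cross-frame style encoding: each row's code uses only rows from other folds.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
d$x_cross <- NA_real_
for (i in seq_len(k)) {
  train <- fold != i
  means <- level_means(d$y[train], d$x[train])
  codes <- unname(means[d$x[!train]])
  # Levels unseen in the other folds fall back to the training grand mean.
  codes[is.na(codes)] <- mean(d$y[train])
  d$x_cross[!train] <- codes
}

# y is pure noise, so any correlation here comes from the encoding leaking y.
naive_cor <- cor(d$x_naive, d$y)  # clearly positive
cross_cor <- cor(d$x_cross, d$y)  # close to zero
print(c(naive = naive_cor, cross = cross_cor))
```

A downstream model trained on x_naive would "discover" a signal that does not exist; the cross-frame codes avoid that at the cost of somewhat noisier estimates.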

New Year’s Resolution 2020: Work on more R Data Science Projects

January 4, 2020 | John Mount

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also offering a one-month promotional price of $20.99. To get a permanent subscription to the course ...

[Read more...]

Manning Deal of the Day January 3, 2020: Half off Practical Data Science with R, Second Edition

January 2, 2020 | John Mount

Manning Deal of the Day January 3, 2020: Half off Practical Data Science with R, Second Edition. Use code dotd010320au at http://bit.ly/39vD1G4. Please share! [Read more...]

New Timings for a Grouped In-Place Aggregation Task

January 2, 2020 | John Mount

I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow. Roughly, the task was to add in some derived per-group ... [Read more...]
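We don’t know the exact derived quantities in the client’s workflow, so the following is only a hypothetical illustration of the general pattern: compute a per-group summary and write it back onto every row of its group, keeping the original rows and their order. It uses base R’s ave(); the column names and data are made up.

```r
# Hypothetical example data: several rows per group.
d <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 5, 4),
  stringsAsFactors = FALSE
)

# Grouped in-place aggregation: the per-group max is written back onto
# every row of that group, so the row count and order are unchanged.
d$max_value <- ave(d$value, d$group, FUN = max)

# A derived per-row quantity built from the per-group summary.
d$frac_of_max <- d$value / d$max_value
print(d)
```

ave() is just one way to express this operation; which approach is fastest at scale is exactly the kind of question timings like these are meant to answer.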

Introduction to Data Science in R, Free for 3 days

December 30, 2019 | John Mount

To celebrate the new year and the recent release of Practical Data Science with R 2nd Edition, we are offering a free coupon for our video course “Introduction to Data Science.” The following URL and code should get you permanent free access to the video course, if used between now ... [Read more...]

What is a Second Edition?

December 24, 2019 | John Mount

What is a second edition of a book to its authors? In some sense, it is the book the authors thought they were writing the first time. With some good fortune, a second edition can be much more than that. For our example: Nina and I received a lot ... [Read more...]

Why to try Practical Data Science with R, 2nd Edition

December 22, 2019 | John Mount

I thought we would try to express why somebody interested in using the R language (and package ecosystem) for supervised machine learning, data wrangling, analytics projects, and other data science topics should give Practical Data Science with R, 2nd Edition a try. Nina Zumel and I shared the book with ...

[Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
