Articles by John Mount

The R community is awesome (and fast)

August 30, 2016 | John Mount

Recently I whined/whinged or generally complained about a few sharp edges in some powerful R systems. In each case I was treated very politely, listened to, and actually got fixes back in a very short timeframe from volunteers. That is really great and probably one of the many reasons ... [Read more...]

vtreat 0.5.27 released on CRAN

August 19, 2016 | John Mount

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.27 has been released on CRAN. vtreat is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) Very roughly vtreat accepts an arbitrary “from the wild” ... [Read more...]

My criticism of R numeric summary

August 18, 2016 | John Mount

My criticism of R's numeric summary() method is: it is unfaithful to numeric arguments (due to bad default behavior) and frankly it should be considered unreliable. It is likely the way it is for historic and compatibility reasons, but in my opinion it does not currently represent a desirable set ... [Read more...]

The Win-Vector parallel computing in R series

August 16, 2016 | John Mount

With our recent publication of “Can you nest parallel operations in R?” we now have a nice series of “how to speed up statistical computations in R” that moves from application, to larger/cloud application, and then to details. For your convenience here they are in order: A gentle introduction ... [Read more...]

Can you nest parallel operations in R?

August 15, 2016 | John Mount

When we teach parallel programming in R we start with the basic use of parallel (please see here for example). This is, in our opinion, a necessary step before getting into clever notation and wrapping such as doParallel and foreach. Only then do the students have a sufficiently explicit interface ... [Read more...]

The magrittr monad

August 6, 2016 | John Mount

Monads are a formal theory of composition where programmers get to invoke some very abstract mathematics (category theory) to argue the minutiae of annotating, scheduling, sequencing operations, and side effects. On the positive side the monad axioms are a guarantee that related ways of writing code are in fact substitutable ... [Read more...]

A budget of classifier evaluation measures

July 21, 2016 | John Mount

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move project to project. We prefer “AUC” ... [Read more...]

vtreat version 0.5.26 released on CRAN

July 12, 2016 | John Mount

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.26 has been released on CRAN. ‘vtreat’ is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) ‘vtreat’ is an R package that incorporates a number ... [Read more...]

y-aware scaling in context

June 22, 2016 | John Mount

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure ... [Read more...]

Free e-book: Exploring Data Science

June 8, 2016 | John Mount

We are pleased to announce a new free e-book from Manning Publications: Exploring Data Science. Exploring Data Science is a collection of five chapters hand picked by John Mount and Nina Zumel, introducing you to various areas in data science and explaining which methodologies work best for each. Exploring Data ... [Read more...]

Using geom_step

June 3, 2016 | John Mount

geom_step is an interesting geom supplied by the R package ggplot2. It is an appropriate rendering option for financial market data and we will show how and why to use it in this article. Let’s take a simple example of plotting market data. In this case we are ... [Read more...]

A demonstration of vtreat data preparation

June 1, 2016 | John Mount

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show ... [Read more...]

On ranger respect.unordered.factors

May 30, 2016 | John Mount

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest ... [Read more...]

Installing WVPlots and “knitting R markdown”

May 20, 2016 | John Mount

Some readers have been having a bit of trouble using devtools to install WVPlots. I thought I would write a note with a few instructions to help. These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten ... [Read more...]

Coming up: principal components analysis

May 7, 2016 | John Mount

Just a “heads-up.” I’ve been editing a two-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt’s The Cambridge Dictionary of Statistics). The series is ... [Read more...]

vtreat cross frames

May 5, 2016 | John Mount

John Mount, Nina Zumel, 2016-05-05. As a follow-on to “On Nested Models” we work R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat. Consider the following data frame. The outcome only depends on the “good” variables, not on the (high degree of ... [Read more...]

On Nested Models

April 26, 2016 | John Mount

We have recently been working on and presenting on nested modeling issues. These are situations where the output of one trained machine learning model is part of the input of a later model or procedure. I am now of the opinion that correct treatment of nested models is one of ... [Read more...]

Improved vtreat documentation

April 17, 2016 | John Mount

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here). vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically justifiable manner. Even with modern machine learning techniques (random forests, ... [Read more...]
