Blog Archives

Relative error distributions, without the heavy tail theatrics

September 19, 2016
By

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal World.” The article emphasizes that if you are dealing with such quantities you are already seeing effects of relative error distributions (so it isn’t an exotic … Continue...

Read more »

Did she know we were writing a book?

September 3, 2016
By
Did she know we were writing a book?

Writing a book is a sacrifice. It takes a lot of time, represents a lot of missed opportunities, and does not (directly) pay very well. If you do a good job it may pay back in good-will, but producing a serious book is a great challenge. Nina Zumel and I definitely troubled over possibilities for … Continue...

Read more »

Variables can synergize, even in a linear model

September 1, 2016
By

Introduction Suppose we have the task of predicting an outcome y given a number of variables v1,..,vk. We often want to “prune variables” or build models with fewer than all the variables. This can be to speed up modeling, decrease the cost of producing future data, improve robustness, improve explain-ability, even reduce over-fit, and improve … Continue...

Read more »

The R community is awesome (and fast)

August 30, 2016
By

Recently I whined/whinged or generally complained about a few sharp edges in some powerful R systems. In each case I was treated very politely, listened to, and actually got fixes back in a very short timeframe from volunteers. That is really great and probably one of the many reasons R is a great ecosystem. Please … Continue...

Read more »

vtreat 0.5.27 released on CRAN

August 19, 2016
By

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.27 has been released on CRAN. vtreat is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) Very roughly vtreat accepts an arbitrary “from the wild” data frame (with different column types, … Continue...

Read more »

My criticism of R numeric summary

August 18, 2016
By
My criticism of R numeric summary

My criticism of R‘s numeric summary() method is: it unfaithful to numeric arguments (due to bad default behavior) and frankly it should be considered unreliable. It is likely the way it is for historic and compatibility reasons, but in my opinion it does not currently represent a desirable set of tradeoffs. summary() likely represents good … Continue...

Read more »

The Win-Vector parallel computing in R series

August 16, 2016
By

With our recent publication of “Can you nest parallel operations in R?” we now have a nice series of “how to speed up statistical computations in R” that moves from application, to larger/cloud application, and then to details. For your convenience here they are in order: A gentle introduction to parallel computing in R Running … Continue...

Read more »

Can you nest parallel operations in R?

August 15, 2016
By

When we teach parallel programming in R we start with the basic use of parallel (please see here for example). This is, in our opinion, a necessary step before getting into clever notation and wrapping such as doParallel and foreach. Only then do the students have a sufficiently explicit interface to frame important questions about … Continue...

Read more »

The magrittr monad

August 6, 2016
By
The magrittr monad

Monads are a formal theory of composition where programmers get to invoke some very abstract mathematics (category theory) to argue the minutia of annotating, scheduling, sequencing operations, and side effects. On the positive side the monad axioms are a guarantee that related ways of writing code are in fact substitutable and equivalent; so you want … Continue...

Read more »

A budget of classifier evaluation measures

July 21, 2016
By
A budget of classifier evaluation measures

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move project to project. We prefer “AUC” early in a project (when you … Continue...

Read more »

Sponsors

Mango solutions



plotly webpage

dominolab webpage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training

datasociety

http://www.eoda.de







ODSC

ODSC

CRC R books series





Six Sigma Online Training









Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)