Blog Archives

Unprincipled Component Analysis

February 10, 2014

As a data scientist I have seen variations of principal component analysis and factor analysis blindly misapplied and abused so often that I have come to think of the technique as unprincipled component analysis. PCA is a good technique, often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely)...

Bad Bayes: an example of why you need hold-out testing

February 1, 2014

We demonstrate a dataset that causes many good machine learning algorithms to overfit horribly. The example is designed to imitate a common situation found in predictive analytic natural language processing. In this type of application you are often building a model using many rare text features. The rare text features are often nearly unique k-grams...

Use standard deviation (not mad about MAD)

January 19, 2014

Nassim Nicholas Taleb recently wrote an article advocating abandoning standard deviation in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that...

Generalized linear models for predicting rates

January 1, 2014

I often need to build a predictive model that estimates rates. The example of our age is ad click-through rates (how often a viewer clicks on an ad, estimated as a function of the features of the ad and the viewer). Another timely example is estimating default rates of mortgages or credit cards. You...

Sample size and power for rare events

December 3, 2013

We have written a bit on sample size for common events. We would like to extend this analysis to rare events. In web marketing and a lot of other applications, you are trying to estimate the probability of an event (like conversion) where the probability is fairly low (say 5% to 0.5%). In this case...

Practical Data Science with R: Manning Deal of the Day November 19th 2013

November 19, 2013

Manning Deal of the Day November 19: Half off Practical Data Science with R. Use code dotd1119au at www.manning.com/zumel/.

Practical Data Science with R October 2013 update

October 26, 2013

A quick status update on our upcoming book “Practical Data Science with R” by Nina Zumel and John Mount. We are really happy with how the book is coming out. We were able to cover most everything we hoped to. Part 1 (especially chapter 3) is already being used in courses, and has some very...

Practical Data Science with R, deal of the day Aug 1 2013

July 31, 2013

Deal of the Day August 1: Half off my book Practical Data Science with R. Use code dotd0801au at www.manning.com/zumel/.

What is “Practical Data Science with R”?

June 22, 2013

A bit about our upcoming book “Practical Data Science with R”. Nina and I share our current draft of the front matter from the book, a description that will help you decide if this is the book for you (we hope that it is). Or this could be the book that helps explain...

Big News! “Practical Data Science with R” MEAP launched!

May 15, 2013

Nina Zumel and I (John Mount) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the book goes into...
