Blog Archives

Save 45% on Practical Data Science with R (expires May 21, 2014)

May 16, 2014

Please share this generous deal from Manning Publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, forward, and share!

R has some sharp corners

May 15, 2014

R is definitely our first-choice, go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, a different programming language, better data source integration, or something else). The advantages of R are numerous: Single integrated work environment. Powerful unified scripting/programming...

A clear picture of power and significance in A/B tests

May 3, 2014

A/B tests are one of the simplest reliable experimental designs. “Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior.” (“Practical guide to controlled experiments on the web: listen to your customers not to the HIPPO,” Ron Kohavi, Randal M Henne, and Dan Sommerfield, Proceedings...)

A bit of the agenda of Practical Data Science with R

May 1, 2014

The goal of Zumel/Mount: Practical Data Science with R is to teach, through guided practice, the skills of a data scientist. We define a data scientist as the person who organizes client input, data, infrastructure, statistics, mathematics and machine learning to deploy useful predictive models into production. Our plan to teach is to: Order the...

Old tails: a crude power law fit on ebook sales

April 18, 2014

We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Recently Hugh Howey shared some e-book sales data spidered from Amazon.com: The 50k Report. The data is largely a single scrape of statistics about various anonymized books. Howey’s analysis tries to break sales down by declared category and...

You don’t need to understand pointers to program using R

April 1, 2014

R is a statistical analysis package based on writing short scripts or programs (versus being based on GUIs like spreadsheets or directed workflow editors). I say “writing short scripts” because R’s programming language (itself called S) is a bit of an oddity that you really wouldn’t be using except it gives you access to superior...

Some statistics about the book

March 4, 2014

The release date for Zumel and Mount’s “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. Formal work on “Practical Data Science with R” started in October of 2012. We had always felt the Win-Vector blog represented practice and research for such...

One day discount on Practical Data Science with R

February 21, 2014

Please forward and share this discount offer for our upcoming book. Manning Deal of the Day February 22: Half off Practical Data Science with R. Use code dotd022214au at www.manning.com/zumel/.

The gap between data mining and predictive models

February 20, 2014

The Facebook data science blog shared some fun data explorations this Valentine’s Day in Carlos Greg Diuk’s “The Formation of Love”. They are rightly receiving positive interest in and positive reviews of their work (for example Robinson Meyer’s Atlantic article). The finding is also a great opportunity to discuss the gap between cool data mining...

Unprincipled Component Analysis

February 10, 2014

As a data scientist I have seen variations of principal component analysis and factor analysis blindly misapplied and abused so often that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely)...
