Blog Archives

R minitip: don’t use data.matrix when you mean model.matrix

June 10, 2014

A quick R mini-tip: don’t use data.matrix when you mean model.matrix. If you do, you may lose (without noticing) a lot of your model’s explanatory power, because factors end up poorly encoded. For some modeling tasks you end up having to prepare a special expanded data matrix before calling a given machine learning algorithm. For example ...
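The encoding difference behind this tip can be sketched on a toy data frame. This is only an illustrative sketch: the data frame `d` and its columns `color` and `size` are made up for the example and are not taken from the post.

```r
# Toy data (hypothetical): one categorical and one numeric column.
d <- data.frame(
  color = factor(c("red", "green", "blue", "green")),
  size  = c(1.5, 2.0, 0.5, 3.0)
)

# data.matrix() replaces each factor value with its internal integer code,
# which imposes an arbitrary ordering on the levels (blue = 1, green = 2, red = 3).
m_codes <- data.matrix(d)

# model.matrix() expands the factor into indicator (dummy) columns,
# one per non-reference level, next to an intercept column.
m_indicators <- model.matrix(~ color + size, data = d)

print(m_codes)
print(m_indicators)
```

A linear model fit on `m_codes` would have to treat `color` as a single numeric slope, while `m_indicators` lets each level have its own effect.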

R style tip: prefer functions that return data frames

June 6, 2014

While following up on Nina Zumel’s excellent Trimming the Fat from glm() Models in R, I got to thinking about code style in R. And I realized: you can make your code much prettier by designing more of your functions to return data.frames. That may seem needlessly heavy-weight, but it has a lot of down-stream ...
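One way to picture the style being described is a helper that always returns a one-row data.frame. This is a minimal sketch, not the post's own code: the function name `summarize_column` is hypothetical, and the built-in `mtcars` data set is used only as convenient sample input.

```r
# Hypothetical helper: summarize one numeric vector as a one-row data.frame.
summarize_column <- function(name, values) {
  data.frame(
    column = name,
    mean   = mean(values),
    sd     = sd(values),
    stringsAsFactors = FALSE
  )
}

# Every call returns a data.frame with the same columns,
# so the per-column results stack directly with rbind().
results <- do.call(
  rbind,
  lapply(names(mtcars)[1:3], function(nm) summarize_column(nm, mtcars[[nm]]))
)

print(results)
```

Because `results` is itself a data.frame, it can be filtered, sorted, or plotted with the same tools as any other data.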

How does Practical Data Science with R stand out?

June 2, 2014

There are a lot of good books on statistics, machine learning, analytics, and R. So it is valid to ask: how does Practical Data Science with R stand out? Why should a data scientist or an aspiring data scientist buy it? We admit, it isn’t the only book we own. Some relevant books from the ...

Save 45% on Practical Data Science with R (expires May 21, 2014)

May 16, 2014

Please share this generous deal from Manning Publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, forward and share!

R has some sharp corners

May 15, 2014

R is definitely our first choice go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, different programming language, better data source integration, or something else). The advantages of R are numerous: Single integrated work environment. Powerful unified scripting/programming ...

A clear picture of power and significance in A/B tests

May 3, 2014

A/B tests are one of the simplest reliable experimental designs. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. “Practical guide to controlled experiments on the web: listen to your customers not to the HIPPO” Ron Kohavi, Randal M Henne, and Dan Sommerfield, Proceedings ...

A bit of the agenda of Practical Data Science with R

May 1, 2014

The goal of Zumel/Mount: Practical Data Science with R is to teach, through guided practice, the skills of a data scientist. We define a data scientist as the person who organizes client input, data, infrastructure, statistics, mathematics and machine learning to deploy useful predictive models into production. Our plan to teach is to: Order the ...

Old tails: a crude power law fit on ebook sales

April 18, 2014

We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Recently Hugh Howey shared some e-book sales data spidered from Amazon.com: The 50k Report. The data is largely a single scrape of statistics about various anonymized books. Howey’s analysis tries to break sales down by declared category and ...

You don’t need to understand pointers to program using R

April 1, 2014

R is a statistical analysis package based on writing short scripts or programs (versus being based on GUIs like spreadsheets or directed workflow editors). I say “writing short scripts” because R’s programming language (itself called S) is a bit of an oddity that you really wouldn’t be using, except that it gives you access to superior ...

Some statistics about the book

March 4, 2014

The release date for Zumel, Mount “Practical Data Science with R” is getting close. I thought I would share a few statistics about what goes into this kind of book. Formal work on “Practical Data Science with R” started in October of 2012. We had always felt the Win-Vector blog represented practice and research for such ...
