Blog Archives

The Win-Vector R data science value pack

March 11, 2015
By
The Win-Vector R data science value pack

Win-Vector LLC is proud to announce the R data science value pack. 50% off our video course Introduction to Data Science (available at Udemy) and 30% off Practical Data Science with R (from Manning). Pick any combination of video, e-book, and/or print-book you want. Instructions below. Please share and Tweet! For 50% off the video … Continue reading...

Read more »

Announcing: Introduction to Data Science video course

February 25, 2015
By
Announcing: Introduction to Data Science video course

Win-Vector LLC’s Nina Zumel and John Mount are proud to announce their new data science video course Introduction to Data Science is now available on Udemy. We designed the course as an introduction to an advanced topic. The course description is: Use the R Programming Language to execute data science projects and become a data … Continue reading...

Read more »

Check your return types when modeling in R

January 27, 2015
By
Check your return types when modeling in R

Just a warning: double check your return types in R, especially when using different modeling packages. We consider ourselves pretty familiar with R. We have years of experience, many other programming languages to compare R to, and we have taken Hadley Wickham’s Master R Developer Workshop (highly recommended). We already knew R’s predict function is … Continue reading...

Read more »

R bracket is a bit irregular

January 17, 2015
By
R bracket is a bit irregular

While skimming Professor Hadley Wickham’s Advanced R I got to thinking about nature of the square-bracket or extract operator in R. It turns out “” is a bit more irregular than I remembered. The subsetting section of Advanced R has a very good discussion on the subsetting and selection operators found in R. In particular … Continue reading...

Read more »

Is there a Kindle edition of Practical Data Science with R?

December 21, 2014
By
Is there a Kindle edition of Practical Data Science with R?

We have often been asked “why is there no Kindle edition of Practical Data Science with R on Amazon.com?” The short answer is: there is an edition you can read on your Kindle: but it is from the publisher Manning (not Amazon.com). The long answer is: when Amazon.com supplies a Kindle edition readers have to … Continue reading...

Read more »

A comment on preparing data for classifiers

December 4, 2014
By
A comment on preparing data for classifiers

I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; 15(Oct):3133−3181, 2014 (which we will call “the DWN paper” in this note). This paper applies 179 Related posts:

Read more »

Excel spreadsheets are hard to get right

November 18, 2014
By
Excel spreadsheets are hard to get right

Any practicing data scientist is going to eventually have to work with a data stored in a Microsoft Excel spreadsheet. A lot of analysts use this format, so if you work with others you are going to run into it. We have already written how we Related posts: Please stop using...

Read more »

Factors are not first-class citizens in R

September 23, 2014
By
Factors are not first-class citizens in R

The primary user-facing data types in the R statistical computing environment behave as vectors. That is: one dimensional arrays of scalar values that have a nice operational algebra. There are additional types (lists, data frames, matrices, environments, and so-on) but the most common data types are vectors. In fact vectors are so common in R Related posts:

Read more »

Reading the Gauss-Markov theorem

August 26, 2014
By
Reading the Gauss-Markov theorem

What is the Gauss-Markov theorem? From “The Cambridge Dictionary of Statistics” B. S. Everitt, 2nd Edition: A theorem that proves that if the error terms in a multiple regression have the same variance and are uncorrelated, then the estimators of the parameters in the model produced by least squares estimation are better (in the sense Related posts:

Read more »

Automatic bias correction doesn’t fix omitted variable bias

July 4, 2014
By
Automatic bias correction doesn’t fix omitted variable bias

Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what might be considered Related posts:

Read more »