Blog Archives

Wald’s sequential analysis technique

December 10, 2015

Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure, and I hope you check it out. (Figure: Figure 14, Section 6.4.2, page 111, of Abraham Wald, Sequential Analysis, Dover 2004, reprinting a 1947 edition.)


Free gradient boosting lecture

November 21, 2015

We have always regretted that we didn’t get to cover gradient boosting in Practical Data Science with R (Manning 2014). To try to make up for that, we are sharing (for free) our GBM lecture from our (paid) video course Introduction to Data Science. (link, all support material here). Please help us get the word out … Continue reading...


Fast food, fast publication

November 8, 2015

(This article was first published on Win-Vector Blog » R, and kindly contributed to R-bloggers) The following article is getting quite a lot of press right now: David Just and Brian Wansink (2015). Fast Food, Soft Drink, and Candy Intake is Unrelated to Body Mass Index for 95% of American Adults. Obesity Science & Practice, forthcoming (upcoming in a...


Don’t use stats::aggregate()

October 31, 2015

When working with an analysis system (such as R) there are usually good reasons to prefer using functions from the “base” system over using functions from extension packages. However, base functions are sometimes locked into unfortunate design compromises that can now be avoided. In R’s case I would say: do not use stats::aggregate(). Read on … Continue reading...


Some key Win-Vector serial data science articles

October 7, 2015

As readers have surely noticed, the Win-Vector LLC blog isn’t a stream of short notes but a collection of long technical articles. It is the only way we can properly treat topics of consequence. What not everybody may have noticed is that a number of these articles are organized into series for deeper comprehension. The … Continue reading...


Using differential privacy to reuse training data

October 5, 2015

Win-Vector LLC’s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward step-wise logistic regression. This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and reuses test data, allowing the series … Continue reading...


How do you know if your model is going to work? Part 4: Cross-validation techniques

September 21, 2015

Authors: John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that … Continue reading...


How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015

Authors: John Mount and Nina Zumel. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our four-part mini-series “How do … Continue reading...


How do you know if your model is going to work? Part 2: In-training set measures

September 7, 2015

Authors: John Mount and Nina Zumel. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 2 of our four-part mini-series “How do … Continue reading...


vtreat up on CRAN!

September 6, 2015

Nina Zumel and I are proud to announce our R vtreat variable treatment library has just been accepted by CRAN! It will take some time for the vtreat package to progress to various CRAN mirrors, but as of now you can install vtreat with the command: install.packages('vtreat', repos='http://cran.r-project.org/') Instead of needing to use devtools to … Continue reading...

