Blog Archives

Some key Win-Vector serial data science articles

October 7, 2015
By
Some key Win-Vector serial data science articles

As readers have surely noticed the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence. What not everybody may have noticed is a number of these articles are serialized into series for deeper comprehension. The … Continue reading...

Read more »

Using differential privacy to reuse training data

October 5, 2015
By
Using differential privacy to reuse training data

Win-Vector LLC‘s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward step-wise logistic regression. This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique essentially protects and reuses test data, allowing the series … Continue reading...

Read more »

How do you know if your model is going to work? Part 4: Cross-validation techniques

September 21, 2015
By
How do you know if your model is going to work? Part 4: Cross-validation techniques

Authors: John Mount (more articles) and Nina Zumel (more articles). In this article we conclude our four part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that … Continue reading...

Read more »

How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015
By
How do you know if your model is going to work? Part 3: Out of sample procedures

Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our four part mini-series “How do … Continue reading...

Read more »

How do you know if your model is going to work? Part 2: In-training set measures

September 7, 2015
By
How do you know if your model is going to work? Part 2: In-training set measures

Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 2 of our four part mini-series “How do … Continue reading...

Read more »

vtreat up on CRAN!

September 6, 2015
By
vtreat up on CRAN!

Nina Zumel and I are proud to announce our R vtreat variable treatment library has just been accepted by CRAN! It will take some time for the vtreat package to progress to various CRAN mirrors, but as of now you can install vtreat with the command: install.packages('vtreat', repos='http://cran.r-project.org/') Instead of needing to use devtools to … Continue reading...

Read more »

How do you know if your model is going to work? Part 1: The problem

September 2, 2015
By
How do you know if your model is going to work? Part 1: The problem

Authors: John Mount (more articles) and Nina Zumel (more articles). “Essentially, all models are wrong, but some are useful.” George Box Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, … Continue reading...

Read more »

Efficient accumulation in R

July 27, 2015
By
Efficient accumulation in R

R has a number of very good packages for manipulating and aggregating data (plyr, sqldf, ScaleR, data.table, and more), but when it comes to accumulating results the beginning R user is often at sea. The R execution model is a bit exotic so many R users are very uncertain which methods of accumulating results are … Continue reading...

Read more »

A dynamic programming solution to A/B test design

July 6, 2015
By

Our last article on A/B testing described the scope of the realistic circumstances of A/B testing in practice and gave links to different standard solutions. In this article we will be take an idealized specific situation allowing us to show a particularly beautiful solution to one very special type of A/B test. For this article … Continue reading...

Read more »

What is a good Sharpe ratio?

June 27, 2015
By

We have previously written that we like the investment performance summary called the Sharpe ratio (though it does have some limits). What the Sharpe ratio does is: give you a dimensionless score to compare similar investments that may vary both in riskiness and returns without needing to know the investor’s risk tolerance. It does this … Continue reading...

Read more »

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training



http://www.eoda.de







ODSC

ODSC

CRC R books series











Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)