How do you know if your model is going to work? Part 1: The Problem

[This article was first published on Revolutions, and kindly contributed to R-bloggers].
by John Mount and Nina Zumel of Win-Vector LLC

“Essentially, all models are wrong, but some are useful.” George Box

Here's a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, perhaps several, to available data and evaluate them to find the best. Then you cross your fingers that your chosen model doesn't crash and burn in the real world. We've discussed detecting if your data has a signal. Now: how do you know that your model is good? And how sure are you that it's better than the models that you rejected?

Geocentric illustration, Bartolomeu Velho, 1568 (Bibliothèque Nationale, Paris)

Notice the Sun in the 4th revolution about the earth. A very pretty, but not entirely reliable model.

In this latest “Statistics as it should be” series, we will systematically look at what to worry about and what to check. This is standard material, but presented in a “data science” oriented manner: we are going to consider scoring system utility in terms of service to a negotiable business goal (one of the many ways data science differs from pure machine learning). To organize the ideas into digestible chunks, we are presenting this article as a four-part series. This part (part 1) sets up the specific problem.

Win-Vector blog: How do you know if your model is going to work? Part 1: The Problem
