# 415 search results for "evaluation"

## How do you know if your model is going to work? Part 4: Cross-validation techniques

September 22, 2015

By John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this...

## How do you know if your model is going to work? Part 4: Cross-validation techniques

September 21, 2015

Authors: John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that …

## How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015

Authors: John Mount and Nina Zumel. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our four-part mini-series “How do …

## C5.0 Class Probability Shrinkage

September 14, 2015

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home.) I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated...

## From SPSS to R: eoda offers assessment for SPSS users

September 10, 2015

For a long time, SPSS has been presumed to be the standard tool for statistical data analysis in companies and public institutions. Now, more users are considering changing their programming language to R – the promising solution in regard to data mining and predictive analytics. R warrants the availability of current data analysis methods because

## Hypothesis-Driven Development Part II

September 8, 2015

This post will evaluate signals based on the rank regression hypotheses covered in the last post. The last time around, …

## Logistic Regression in R – Part Two

September 2, 2015

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to

## How do you know if your model is going to work? Part 1: The problem

September 2, 2015

Authors: John Mount and Nina Zumel. “Essentially, all models are wrong, but some are useful.” (George Box) Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, …

## Predicting Titanic deaths on Kaggle IV: random forest revisited

August 23, 2015

On July 19th I used randomForest to predict the deaths on the Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...