# 433 search results for "evaluation"

## How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015
By

Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our four part mini-series “How do … Continue reading...

## C5.0 Class Probability Shrinkage

September 14, 2015
By

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home). I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated...

## From SPSS to R: eoda offers assessment for SPSS users

September 10, 2015
By

For a long time, SPSS has been presumed to be the standard tool for statistical data analysis in companies and public institutions. Now, more users are considering switching to R – a promising solution for data mining and predictive analytics. R ensures the availability of current data analysis methods because

## Hypothesis-Driven Development Part II

September 8, 2015
By

This post will evaluate signals based on the rank regression hypotheses covered in the last post. The last time around, … Continue reading →

## Logistic Regression in R – Part Two

September 2, 2015
By

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to

## How do you know if your model is going to work? Part 1: The problem

September 2, 2015
By

Authors: John Mount (more articles) and Nina Zumel (more articles). “Essentially, all models are wrong, but some are useful.” George Box Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, … Continue reading...

## Predicting Titanic deaths on Kaggle IV: random forest revisited

August 23, 2015
By

On July 19th I used randomForest to predict the deaths on Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...

## Is Bayesian A/B Testing Immune to Peeking? Not Exactly

August 20, 2015
By

Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. Unfortunately, this leads to a common pitfall in performing A/B...

## Evaluating Logistic Regression Models

August 17, 2015
By

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables. log(odds) = β₀ + β₁x₁ + ... + βₙxₙ The log(odds), or log-odds ratio, is defined
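
The model form above can be sketched in R with `glm()`. This is a minimal illustration, not code from the linked post; the simulated data frame and column names (`x1`, `x2`, `y`) are hypothetical, and the true coefficients are chosen arbitrarily.

```r
# Simulate a binary outcome whose log-odds are linear in two predictors
set.seed(42)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 + 1.2 * d$x1 - 0.8 * d$x2))

# Fit the logistic regression; family = binomial uses the logit link
fit <- glm(y ~ x1 + x2, data = d, family = binomial)

# coef(fit) gives the estimated betas (beta0, beta1, beta2);
# predict(type = "link") returns the linear predictor, i.e. the log-odds
coef(fit)
head(predict(fit, type = "link"))
```

Here `predict(fit, type = "link")` returns log(odds) = β₀ + β₁x₁ + β₂x₂ for each row, while `type = "response"` would return the fitted probabilities instead.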