# 490 search results for "evaluation"

## Hypothesis-Driven Development Part II

September 8, 2015

This post will evaluate signals based on the rank regression hypotheses covered in the last post. The last time around, … Continue reading →

## Logistic Regression in R – Part Two

September 2, 2015

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to
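The three evaluation areas the excerpt names can be illustrated concretely. The posts in this series use R, but as a language-neutral sketch, here is a minimal Python example of two common checks on a model's predicted probabilities: residual deviance (a goodness-of-fit measure) and simple held-out accuracy (validation of predicted values). The toy labels, probabilities, and 0.5 threshold are assumptions for illustration only.

```python
import math

def deviance(y, p):
    """Residual deviance: -2 times the log-likelihood of the observed
    labels under the predicted probabilities (lower is better)."""
    return -2 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                    for yi, pi in zip(y, p))

def accuracy(y, p, threshold=0.5):
    """Fraction of held-out labels matched when predicted probabilities
    are thresholded into class predictions."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)

# Toy held-out set: observed outcomes and model-predicted probabilities.
y_test = [1, 0, 1, 1, 0]
p_test = [0.9, 0.2, 0.7, 0.6, 0.4]

print(round(deviance(y_test, p_test), 3))  # goodness of fit
print(accuracy(y_test, p_test))            # validation accuracy
```

Tests of individual predictors (the third area) would additionally require the fitted coefficients and their standard errors, which R's `summary(glm(...))` reports directly.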

## How do you know if your model is going to work? Part 1: The problem

September 2, 2015

Authors: John Mount (more articles) and Nina Zumel (more articles). “Essentially, all models are wrong, but some are useful.” George Box Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, … Continue reading...

## Predicting Titanic deaths on Kaggle IV: random forest revisited

August 23, 2015

On July 19th I used randomForest to predict the deaths on Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...

## Is Bayesian A/B Testing Immune to Peeking? Not Exactly

August 20, 2015

Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. Unfortunately, this leads to a common pitfall in performing A/B...

## Evaluating Logistic Regression Models

August 17, 2015

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables: log(odds) = β₀ + β₁x₁ + … + βₙxₙ. The log(odds), or log-odds ratio, is defined
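For reference, the log-odds form quoted above is equivalent to the familiar sigmoid expression for the predicted probability p, obtained by solving for p:

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```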

## How Do You Know if Your Data Has Signal?

August 10, 2015

Image by Liz Sullivan, Creative Commons. Source: Wikimedia An all too common approach to modeling in data science is to throw all possible variables at a modeling procedure and “let the algorithm sort it out.” This is tempting when you are not sure what are the true causes or predictors of the phenomenon you are … Continue reading...

## Predicting Titanic deaths on Kaggle III: Bagging

August 9, 2015

This is the third post on predicting the Titanic deaths. The first one used randomForest, the second boosting (gbm). The aim of the third post was to use bagging. In contrast to the former posts I abandoned dplyr in this post. It gave some now you see now you ...

## Sensemaking in R: A Plenitude of Models Makes for Good Storytelling

August 3, 2015

"Sensemaking is a motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively." - Gary Klein, Brian Moon & Robert Hoffman, Making Sense of Sensema...

## Hockey Elbow and Other Response Time Injuries

July 29, 2015

You've heard of tennis elbow. Well, there's a non-sports, performance injury that I like to call hockey elbow. An example of such an "injury" is shown in Figure 1, which appeared in a recent computer performance analysis presentation. It's a reminder of how easy it is to become complacent when doing performance analysis and possibly end up reaching...