459 search results for "evaluation"

How do you know if your model is going to work? Part 3: Out of sample procedures

September 14, 2015

Authors: John Mount and Nina Zumel. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our four-part mini-series “How do …

C5.0 Class Probability Shrinkage

September 14, 2015

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home.) I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated...

From SPSS to R: eoda offers assessment for SPSS users

September 10, 2015

For a long time, SPSS has been regarded as the standard tool for statistical data analysis in companies and public institutions. Now, more users are considering switching to R, a promising option for data mining and predictive analytics. R ensures the availability of current data analysis methods because

Hypothesis-Driven Development Part II

September 8, 2015

This post will evaluate signals based on the rank regression hypotheses covered in the last post. The last time around, …

Logistic Regression in R – Part Two

September 2, 2015

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to

How do you know if your model is going to work? Part 1: The problem

September 2, 2015

Authors: John Mount and Nina Zumel. “Essentially, all models are wrong, but some are useful.” (George Box) Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, …

Predicting Titanic deaths on Kaggle IV: random forest revisited

August 23, 2015

On July 19th I used randomForest to predict deaths on the Titanic in the Kaggle competition. Subsequently I found that both bagging and boosting gave better predictions than randomForest. This I found somewhat unsatisfactory, hence I am now revisi...

Is Bayesian A/B Testing Immune to Peeking? Not Exactly

August 20, 2015

Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. Unfortunately, this leads to a common pitfall in performing A/B...

Evaluating Logistic Regression Models

August 17, 2015

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables: log(odds) = β0 + β1·x1 + ... + βn·xn. The log(odds), or log-odds ratio, is defined
