In survey research, it makes a difference how the question is asked. "How would you rate the service you received at that restaurant?" is not the same as "Did you have to wait to be seated, to order your meal, to be served your food, or to pay yo...

A short post to get back – for my nonlife insurance course – on the interpretation of the output of a regression when there is a categorical covariate. Consider the following dataset > db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";") > tail(db) Y X1 X2 X3 995 1 4.801836 20.82947 A 996 1 9.867854 24.39920 C 997 1 5.390730 21.25119 D 998 1...

A short tutorial on doing intersections in R GIS. gIntersection{rgeos} will pick the polygons of the first submitted polygon contained within the second poylgon - this is done without cutting the polygon's edges which cross the clip source polygon. For the function that I use to download the example data, url_shp_to_spdf() please see HERE. library(rgeos)library(dismo)URLs...

I provide an introduction to using logistic regression for prediction (binary classification) using the Titanic data competition from www.Kaggle.com as an example. I use models to predict in missing data, estimate a logistic regression model on a trai...

I always claim that graphs are important in econometrics and statistics ! Of course, it is usually not that simple. Let me come back to a recent experience. A got an email from Sami yesterday, sending me a graph of residuals, and asking me what could be done with a graph of residuals, obtained from a logistic regression ?...

Supposons que l'on dispose d'iris de Paris (en population >100khabts) et qu'on veuille pouvoir les classer selon leurs caractéristiques sociodémos : Population taux de chômage Etudiants CSP etc... Une fois, les iris classés, on se demande si l'on peut transporter cette typologie à une autre grande ville (Lyon) par exemple : Il faudrait alors pouvoir utiliser un modèle d'affectation des iris selon leurs caractéristiques respectives...

Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with proportion of outliers expected in the The post Robust...