Blog Archives

Regression on variables, or on categories?

September 30, 2013
By

I admit it, the title sounds weird. The problem I want to address this evening is related to the use of the stepwise procedure on a regression model, and to discuss the use of categorical variables (and possible misinterpreations). Consider the following dataset > db = read.table("http://freakonometrics.free.fr/db2.txt",header=TRUE,sep=";") First, let us change the reference in our categorical variable  (just to...

Read more »

ROC curves and classification

September 30, 2013
By
ROC curves and classification

To get back to a question asked after the last course (still on non-life insurance), I will spend some time to discuss ROC curve construction, and interpretation. Consider the dataset we’ve been using last week, > db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";") > attach(db) The first step is to get a model. For instance, a logistic regression, where some factors were merged...

Read more »

Nice tutorials to discover R

September 28, 2013
By

A series of tutorials, in R, by Anthony Damico. As claimed on http://twotorials.com/, “how to do stuff in r. two minutes or less, for those of us who prefer to learn by watching and listening“. So far, 000 what is r? the lingua statistica, s’il vous plaît 001 how to download and install r 002 simple shortcuts for the windows r...

Read more »

Logistic regression and categorical covariates

September 26, 2013
By
Logistic regression and categorical covariates

A short post to get back – for my nonlife insurance course – on the interpretation of the output of a regression when there is a categorical covariate. Consider the following dataset > db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";") > tail(db) Y X1 X2 X3 995 1 4.801836 20.82947 A 996 1 9.867854 24.39920 C 997 1 5.390730 21.25119 D 998 1...

Read more »

Monty Hall (oh no, not again)

September 13, 2013
By
Monty Hall (oh no, not again)

Quite frequently, someone on the internet discovers the Monty Hall paradox, and become so enthusiastic that it becomes urgent to publish an article – or a post – about it. The latest example can be http://www.bbc.co.uk/news/magazine-24045598. I won’t blame them, I did the same a few years ago (see http://freakonometrics.hypotheses.org/776, or http://freakonometrics.hypotheses.org/775, in French). My point today is that the...

Read more »

Non-observable vs. observable heterogeneity factor

September 11, 2013
By
Non-observable vs. observable heterogeneity factor

This morning, in the ACT2040 class (on non-life insurance), we’ve discussed the difference between observable and non-observable heterogeneity in ratemaking (from an economic perspective). To illustrate that point (we will spend more time, later on, discussing observable and non-observable risk factors), we looked at the following simple example. Let  denote the height of a person. Consider the following dataset >...

Read more »

Linear regression from a contingency table

September 7, 2013
By
Linear regression from a contingency table

This morning, Benoit sent me an email, about an exercise he found in an econometric textbook, about linear regression. Consider the following dataset, Here, variable X denotes the income, and Y the expenses. The goal was to fit a linear regression (actually, in the email, it was mentioned that we should try to fit an heteroscedastic model, but let...

Read more »

R, Twitter and URLs

August 26, 2013
By
R, Twitter and URLs

Yesterday evening, I wanted to play with Twitter, and see which websites I was using as references in my tweets, to get a Top 4 list. The first problem I got was because installing twitteR on Ubuntu is not that simple ! You have to install properly RCurl… But you before install the package in R, it is necessary...

Read more »

Residuals from a logistic regression

August 23, 2013
By
Residuals from a logistic regression

I always claim that graphs are important in econometrics and statistics ! Of course, it is usually not that simple. Let me come back to a recent experience. A got an email from Sami yesterday, sending me a graph of residuals, and asking me what could be done with a graph of residuals, obtained from a logistic regression ?...

Read more »

Exposure as a possible explanatory variable

August 13, 2013
By
Exposure as a possible explanatory variable

Iin insurance pricing, the exposure is usually used as an offset variable to model claims frequency. As explained many times on this blog (e.g. here), and in my notes, if we have to identical drivers, but one with an exposure of 6 months, and the other one of one year, it should be natural to assume that, on average,...

Read more »