# Blog Archives

## Some heuristics about spline smoothing

October 8, 2013

Let us continue our discussion on smoothing techniques in regression. Assume that $\mathbb{E}(Y\vert X=x)=h(x)$, where $h$ is some unknown function, but assumed to be sufficiently smooth. For instance, assume that $h$ is continuous, that $h'$ exists and is continuous, that $h''$ exists and is also continuous, etc. If $h$ is smooth enough, Taylor’s expansion can be used. Hence, for which can also be written as for...
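As a reminder of what the excerpt alludes to, here is a generic form of Taylor’s expansion for $h$. The expansion point $x_0$ and the order $k$ are assumptions made for illustration; the post itself may use different notation. If $h$ is $k$ times differentiable at $x_0$, then for $x$ close to $x_0$,

$$h(x)=h(x_0)+h'(x_0)(x-x_0)+\frac{h''(x_0)}{2!}(x-x_0)^2+\cdots+\frac{h^{(k)}(x_0)}{k!}(x-x_0)^k+o\big((x-x_0)^k\big)$$

so that, locally, $h$ behaves like a polynomial in $(x-x_0)$.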

## Some heuristics about local regression and kernel smoothing

October 8, 2013

In a standard linear model, we assume that $\mathbb{E}(Y\vert X=x)=\beta_0+\beta_1 x$. Alternatives can be considered when the linear assumption is too strong. Polynomial regression: a natural extension might be to assume some polynomial function. Again, in the standard linear model approach (with a conditional normal distribution, using the GLM terminology), parameters can be obtained using least squares, where a regression of...
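To make the polynomial extension concrete, one possible way to write it is given below. The degree $p$ and the sample notation $(x_i,y_i)$, $i=1,\dots,n$, are assumptions introduced here, not taken from the post. The model would read

$$\mathbb{E}(Y\vert X=x)=\beta_0+\beta_1 x+\beta_2 x^2+\cdots+\beta_p x^p$$

and the least squares estimator would then be

$$\widehat{\boldsymbol{\beta}}=\underset{\boldsymbol{\beta}}{\text{argmin}}\left\{\sum_{i=1}^n\Big(y_i-\sum_{j=0}^p\beta_j x_i^j\Big)^2\right\}$$

With $p=1$, this reduces to the standard linear model above.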

## Regression on variables, or on categories?

September 30, 2013

I admit it, the title sounds weird. The problem I want to address this evening is related to the use of the stepwise procedure on a regression model, and to discuss the use of categorical variables (and possible misinterpretations). Consider the following dataset: `db = read.table("http://freakonometrics.free.fr/db2.txt",header=TRUE,sep=";")`. First, let us change the reference in our categorical variable (just to...
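Changing the reference level of a categorical variable can be sketched with base R's `relevel()`. The factor `x` below is hypothetical and only stands in for the variable of the post, whose name is not visible in this excerpt.

```r
# Hypothetical factor, used only for illustration
# (the post works with a variable read from db2.txt)
x <- factor(c("A", "B", "C", "A", "C"))

# By default, the first level (alphabetical order) is the reference
levels(x)

# relevel() moves the chosen level to the first position,
# so that it becomes the reference in a regression
x2 <- relevel(x, ref = "C")
levels(x2)
```

In a regression, coefficients for the other levels are then interpreted as differences with respect to the new reference.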

## ROC curves and classification

September 30, 2013

To get back to a question asked after the last course (still on non-life insurance), I will spend some time discussing ROC curve construction and interpretation. Consider the dataset we used last week: `db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";")`, followed by `attach(db)`. The first step is to get a model. For instance, a logistic regression, where some factors were merged...
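The construction of a ROC curve can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated here (the post uses its own `db.txt` dataset), the model has a single covariate, and the names `score`, `roc_point` and `thresholds` are introduced for this sketch.

```r
# Simulated data (an assumption: the post uses its own dataset instead)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))

# Step 1: get a model, here a logistic regression
reg <- glm(y ~ x, family = binomial)
score <- predict(reg, type = "response")

# Step 2: for a threshold s, predict 1 when the score exceeds s,
# then compute the false positive and true positive rates
roc_point <- function(s) {
  pred <- as.numeric(score > s)
  c(FPR = sum(pred == 1 & y == 0) / sum(y == 0),
    TPR = sum(pred == 1 & y == 1) / sum(y == 1))
}
thresholds <- seq(0, 1, by = 0.01)
roc <- t(sapply(thresholds, roc_point))

# Step 3: the ROC curve is the set of (FPR, TPR) points
plot(roc[, "FPR"], roc[, "TPR"], type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # diagonal: a classifier no better than chance
```

Each threshold gives one point of the curve; a curve well above the diagonal suggests that the score separates the two classes.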

## Nice tutorials to discover R

September 28, 2013

A series of tutorials, in R, by Anthony Damico. As claimed on http://twotorials.com/, “how to do stuff in r. two minutes or less, for those of us who prefer to learn by watching and listening”. So far,

- 000 what is r? the lingua statistica, s’il vous plaît
- 001 how to download and install r
- 002 simple shortcuts for the windows r...

## Logistic regression and categorical covariates

September 26, 2013

A short post to get back, for my non-life insurance course, to the interpretation of the output of a regression when there is a categorical covariate. Consider the following dataset: `db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";")`. The output of `tail(db)` is

| | Y | X1 | X2 | X3 |
|---|---|---|---|---|
| 995 | 1 | 4.801836 | 20.82947 | A |
| 996 | 1 | 9.867854 | 24.39920 | C |
| 997 | 1 | 5.390730 | 21.25119 | D |
| 998 | 1... | | | |

## Monty Hall (oh no, not again)

September 13, 2013

Quite frequently, someone on the internet discovers the Monty Hall paradox, and becomes so enthusiastic that it becomes urgent to publish an article, or a post, about it. The latest example can be found at http://www.bbc.co.uk/news/magazine-24045598. I won’t blame them, I did the same a few years ago (see http://freakonometrics.hypotheses.org/776, or http://freakonometrics.hypotheses.org/775, in French). My point today is that the...

## Non-observable vs. observable heterogeneity factor

September 11, 2013

This morning, in the ACT2040 class (on non-life insurance), we discussed the difference between observable and non-observable heterogeneity in ratemaking (from an economic perspective). To illustrate that point (we will spend more time, later on, discussing observable and non-observable risk factors), we looked at the following simple example. Let $X$ denote the height of a person. Consider the following dataset >...

## Linear regression from a contingency table

September 7, 2013

This morning, Benoit sent me an email about an exercise on linear regression that he found in an econometrics textbook. Consider the following dataset. Here, variable X denotes the income, and Y the expenses. The goal was to fit a linear regression (actually, in the email, it was mentioned that we should try to fit a heteroscedastic model, but let...

August 26, 2013

Yesterday evening, I wanted to play with Twitter, and see which websites I was using as references in my tweets, to get a Top 4 list. The first problem I got was that installing twitteR on Ubuntu is not that simple! You have to install RCurl properly… But before you install the package in R, it is necessary...