## Collinearity and stepwise VIF selection

February 5, 2013

Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of variation, or parameter estimates may include substantial amounts
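As a rough illustration of what stepwise selection based on variance inflation factors (VIFs) can look like, here is a minimal Python sketch. It is not the code from the post: the function names, the simulated data, and the threshold of 10 (a common rule of thumb) are all assumptions. It relies on the fact that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix, and it repeatedly drops the variable with the largest VIF until every remaining VIF falls below the threshold.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(data):
    """VIF per variable: the diagonal of the inverse correlation matrix."""
    names = list(data)
    inv = invert(correlation_matrix([data[k] for k in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}

def stepwise_vif(data, threshold=10.0):
    """Drop the highest-VIF variable until all VIFs are below the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)

# Simulated example (assumed data): x3 is almost exactly x1 + x2.
random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [random.gauss(0, 1) for _ in range(200)]
x3 = [a + b + random.gauss(0, 0.05) for a, b in zip(x1, x2)]
selected = stepwise_vif({"x1": x1, "x2": x2, "x3": x3})
print(selected)
```

In this simulated case the nearly redundant `x3` carries the largest VIF and is removed first, after which the two remaining variables are close to uncorrelated.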

## Natura non facit saltus

February 5, 2013
$\mathbb{E}_{\mathbb{P}}\left(\sum_{i=1}^N Y_i\right)=\mathbb{E}_{\mathbb{P}}(N) \cdot \mathbb{E}_{\mathbb{P}}(Y_i)$

(see John Wilkins’ article on the interesting history of that phrase, http://scienceblogs.com/evolvingthoughts/…). This week in class, we will see several smoothing techniques for insurance ratemaking. As a starting point, assume that we do not want to use segmentation techniques: everyone will pay exactly the same price. And that price should be related to...
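The identity displayed under the date, which says that the expected total claim cost is the expected number of claims times the expected cost of one claim, holds when the claim costs are identically distributed and independent of the claim count. The Python sketch below checks it by simulation. The Poisson frequency, the exponential claim costs, and the parameter values are assumptions chosen only for illustration; they do not come from the post.

```python
import random

def simulate_total(rng, lam, mean_cost):
    """One policy-year: Poisson claim count N, exponential claim costs Y_i."""
    # Count Poisson(lam) arrivals in [0, 1) by accumulating
    # exponential inter-arrival times.
    n, t = 0, rng.expovariate(lam)
    while t < 1.0:
        n += 1
        t += rng.expovariate(lam)
    total = sum(rng.expovariate(1.0 / mean_cost) for _ in range(n))
    return n, total

rng = random.Random(42)
lam, mean_cost = 0.1, 1000.0  # assumed claim frequency and average cost
runs = [simulate_total(rng, lam, mean_cost) for _ in range(100_000)]

mean_n = sum(n for n, _ in runs) / len(runs)
mean_total = sum(s for _, s in runs) / len(runs)
pure_premium = lam * mean_cost  # E(N) * E(Y)

print(f"simulated E(N):       {mean_n:.4f}")
print(f"simulated E(sum Y_i): {mean_total:.2f}")
print(f"E(N) * E(Y):          {pure_premium:.2f}")
```

Under these assumptions, the simulated average total cost should land close to `pure_premium`, the single price everyone would pay when there is no segmentation.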

## Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update

## "I don’t wanna grow up": Age / value relationships for football players

February 1, 2013
Let's get back to the age-value relationship from my last post. I did some more plotting to see for which position this inverted U-shaped relationship is strongest. Please note that I use a dataframe called eu.players throughout this post, which holds ...

## F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification

January 30, 2013
Following the roundabout tour of F1Stats – A Prequel to Getting Started With Rank Correlations, here’s a walkthrough of my attempt to replicate the first part of A Tale of Two

## Modeling Residential Electricity Usage with R

January 30, 2013
Wow, I can’t believe it has been 11 months since my last blog posting! The next series of postings will be related to the retail energy field. Residential power usage is satisfying to model, as it can be forecast fairly accurately with the right inputs. Partly as a consequence of deregulation, there is now more data available than...

## Regression on categorical variables

January 30, 2013
$N_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[\alpha_x+\beta_x \kappa_t + \gamma_x \delta_{t-x}])$

This morning, Stéphane asked me a tricky question about extracting coefficients from a regression with categorical explanatory variates. More precisely, he asked me if it was possible to store the coefficients in a nice table, with information on the variable and the modality (those two pieces of information being in two different columns). Here is some code I did to produce the...
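The excerpt stops before the author's code, so the following is only a hedged Python sketch of the idea, not the original solution. It assumes that each coefficient name is the variable name followed directly by the modality (the way R labels the coefficients of a factor, e.g. `regionNorth`) and that the list of modalities for each categorical variable is known. The function name, the variable names, and the estimates are made up for illustration.

```python
def coefficient_table(coefs, levels):
    """Split names such as 'regionNorth' into (variable, modality, estimate) rows.

    coefs  : dict mapping coefficient name -> estimate
    levels : dict mapping categorical variable name -> list of its modalities
    """
    rows = []
    for name, estimate in coefs.items():
        match = None
        # Try longer variable names first so 'region2' wins over 'region'.
        for var in sorted(levels, key=len, reverse=True):
            if name.startswith(var) and name[len(var):] in levels[var]:
                match = (var, name[len(var):])
                break
        # Names matching no factor (intercept, numeric covariates)
        # keep their full name and get an empty modality.
        variable, modality = match if match else (name, "")
        rows.append((variable, modality, estimate))
    return rows

# Made-up estimates, for illustration only.
coefs = {"(Intercept)": 1.20, "age": 0.03, "regionNorth": -0.40,
         "regionSouth": 0.25, "fuelDiesel": 0.10}
levels = {"region": ["East", "North", "South"], "fuel": ["Petrol", "Diesel"]}

table = coefficient_table(coefs, levels)
for variable, modality, estimate in table:
    print(f"{variable:<12}{modality:<8}{estimate:>6.2f}")
```

Matching against the known modalities, rather than guessing where the variable name ends, avoids mis-splitting a numeric covariate whose name happens to start with the name of a factor.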

## The "golden age" of a football player

January 28, 2013
It's been some time since my last post on football. And we're talking about European soccer here. So I finally managed to write some functions which allow me to extract player stats from www.transfermarkt.de. The site tracks lots of stats in the world o...

## Evolution of a logistic regression

January 28, 2013
In my last post I showed how one can easily summarize the outcome of a logistic regression. Here I want to show how much this depends on the data points that are used to estimate the model. Taking a cue from the evolution of a correlation, I have plotted the estimated Odds Ratios (ORs) depending on