Blog Archives

“A 99% TVaR is generally a 99.6% VaR”

August 29, 2015
“A 99% TVaR is generally a 99.6% VaR”

Almost 6 years ago, I posted a brief comment on a sentence I had found surprising at the time, discovered in a report claiming that the expected shortfall at the 99% level corresponds quite closely to the value-at-risk at a 99.6% level. It was inspired by a remark in the Swiss Experience report: the expected shortfall on a 99% confidence level […] corresponds to approximately 99.6% to...
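
For intuition, here is a quick numerical check of the claim under a Gaussian assumption (the standard-normal loss model below is my illustrative assumption, not the report's):

# Expected shortfall of a standard normal at the 99% level,
# and the VaR level it corresponds to (Gaussian assumption for illustration)
alpha <- 0.99
es <- dnorm(qnorm(alpha)) / (1 - alpha)   # ES / TVaR at 99%
es                                        # roughly 2.665
pnorm(es)                                 # roughly 0.996, i.e. a ~99.6% VaR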

Read more »

Pricing Game

August 22, 2015

In November, with Romuald Elie and Jérémie Jakubowicz, we will organize a session during the 100% Actuaires day, in Paris, based on a “pricing game“. We provide two datasets (motor insurance, third-party claims), with 2 years of experience and 100,000 policies. Each ‘team’ has to submit a premium proposal for 36,000 potential insureds for the third year (third party, material + bodily injury). We will work as a ‘price...

Read more »

Computing AIC on a Validation Sample

July 29, 2015
Computing AIC on a Validation Sample

This afternoon, we saw in the training on data science that it is possible to use the AIC criterion for model selection.

> library(splines)
> AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="log")))
438.6314
> AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="identity")))
436.3997
> AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="log")))
425.6434
> AIC(glm(dist ~ bs(speed), data=train_cars, family=poisson(link="identity")))
428.7195

And I’ve been asked...
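
To sketch what computing an AIC-type criterion on a validation sample could look like, here is a minimal example: split the data, fit on the training part, and evaluate a penalized Poisson log-likelihood on the held-out part. The split of the built-in cars dataset and the penalty below are illustrative assumptions, not necessarily the post's exact procedure.

library(splines)
set.seed(1)
idx <- sample(nrow(cars), 30)
train_cars <- cars[idx, ]
valid_cars <- cars[-idx, ]

fit <- glm(dist ~ bs(speed), data = train_cars, family = poisson(link = "log"))

# Poisson log-likelihood evaluated on the validation sample
lambda <- predict(fit, newdata = valid_cars, type = "response")
logL <- sum(dpois(valid_cars$dist, lambda, log = TRUE))

# AIC-style criterion computed on the validation sample
-2 * logL + 2 * length(coef(fit))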

Read more »

Modelling Occurrence of Events, with some Exposure

July 28, 2015
Modelling Occurrence of Events, with some Exposure

This afternoon, an interesting point was raised, and I wanted to get back to it (since I published a post on that same topic a long time ago): how can we adapt a logistic regression when the observations do not all have the same exposure? Here the model is the following: the occurrence of an event on the period ...
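
One standard way to adapt the model (a sketch of the general idea, not necessarily the exact approach of the post): assume a constant hazard over the exposure period, so that the probability of no event over exposure E is exp(-lambda*E), which leads to a binomial GLM with a complementary log-log link and log(E) as an offset. The data below are simulated for illustration.

set.seed(1)
n <- 1000
E <- runif(n)                        # individual exposure in (0,1]
x <- rnorm(n)
lambda <- exp(-1 + 0.5 * x)          # individual hazard
y <- rbinom(n, size = 1, prob = 1 - exp(-lambda * E))   # occurrence over E

# cloglog link + log-exposure offset recovers the hazard model
fit <- glm(y ~ x + offset(log(E)), family = binomial(link = "cloglog"))
summary(fit)$coefficients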

Read more »

Visualising Claims Frequency

July 28, 2015
Visualising Claims Frequency

A few years ago, I published a post visualizing the empirical claims frequency in a portfolio. I wanted to update the code. Here is the code to get a dataset,

sinistre <- read.table("http://freakonometrics.free.fr/sinistreACT2040.txt",header=TRUE,sep=";")
sinistres=sinistre
contrat <- read.table("http://freakonometrics.free.fr/contractACT2040.txt",header=TRUE,sep=";")
T=table(sinistres$nocontrat)
T1=as.numeric(names(T))
T2=as.numeric(T)
nombre1 = data.frame(nocontrat=T1,nbre=T2)
I = contrat$nocontrat%in%T1
T1 = contrat$nocontrat[!I]   # contracts with no reported claim
nombre2 = data.frame(nocontrat=T1,nbre=0)
nombre=rbind(nombre1,nombre2)
basenb = merge(contrat,nombre)
head(basenb)
basesin=merge(sinistres,contrat)...
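
And a short sketch of the visualization itself, assuming (as in the ACT2040 course datasets) that the merged portfolio basenb has an exposure column exposition and a driver-age column ageconducteur; those column names are assumptions here.

# empirical claims frequency by driver age: total claims / total exposure
age  <- basenb$ageconducteur
freq <- tapply(basenb$nbre, age, sum) / tapply(basenb$exposition, age, sum)
plot(as.numeric(names(freq)), freq, type = "b",
     xlab = "driver age", ylab = "empirical claims frequency")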

Read more »

Choosing a Classifier

July 21, 2015
Choosing a Classifier

In order to illustrate the problem of choosing a classification model, consider some simulated data,

> n = 500
> set.seed(1)
> X = rnorm(n)
> ma = 10-(X+1.5)^2*2
> mb = -10+(X-1.5)^2*2
> M = cbind(ma,mb)
> set.seed(1)
> Z = sample(1:2,size=n,replace=TRUE)
> Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5
> df = data.frame(Z=as.factor(Z),X,Y)

A first strategy is to split the dataset...
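
As a sketch of that first strategy, one can split the simulated data into a training and a validation part and compare misclassification rates; the 50/50 split and the two candidate models below (logistic regression and a classification tree) are illustrative choices, not the post's full comparison.

library(rpart)
set.seed(2)
idx   <- sample(n, n/2)
train <- df[idx, ]
test  <- df[-idx, ]

glm_fit  <- glm(Z ~ X + Y, data = train, family = binomial)
tree_fit <- rpart(Z ~ X + Y, data = train, method = "class")

# misclassification rates on the held-out part
p_glm  <- ifelse(predict(glm_fit, newdata = test, type = "response") > .5, "2", "1")
p_tree <- predict(tree_fit, newdata = test, type = "class")
mean(p_glm  != test$Z)   # logistic regression
mean(p_tree != test$Z)   # classification tree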

Read more »

An Update on Boosting with Splines

July 2, 2015
An Update on Boosting with Splines

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically, piecewise-linear, continuous regression functions). I was using

> library(splines)
> fit=lm(y~bs(x,degree=1,df=3),data=df)

The problem with that spline function is that the knots seem to be fixed. The iterative boosting algorithm is: start with...
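
For reference, here is a minimal sketch of L2-boosting with that spline base learner; the simulated data, the shrinkage parameter and the number of iterations are illustrative assumptions, not the post's exact settings.

library(splines)
set.seed(1)
x  <- runif(200)
y  <- sin(2 * pi * x) + rnorm(200) / 4
df <- data.frame(x = x, y = y)

nu   <- 0.1                  # shrinkage
yhat <- rep(mean(y), 200)    # start with a constant fit
for (k in 1:100) {
  eps  <- df$y - yhat                                      # current residuals
  fit  <- lm(eps ~ bs(x, degree = 1, df = 3), data = df)   # weak spline learner
  yhat <- yhat + nu * predict(fit)                         # boosting update
}
plot(df$x, df$y)
lines(sort(df$x), yhat[order(df$x)], col = "red", lwd = 2)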

Read more »

Variable Selection using Cross-Validation (and Other Techniques)

July 1, 2015
Variable Selection using Cross-Validation (and Other Techniques)

A natural technique to select variables in the context of generalized linear models is to use a stepwise procedure. It is natural, but controversial, as discussed by Frank Harrell in a great post, clearly worth reading. Frank mentioned about 10 points against a stepwise procedure. It yields R-squared values that are badly biased to be high. The F and chi-squared tests quoted...
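
As a sketch of the cross-validation alternative, one can score each candidate set of predictors by its k-fold cross-validated deviance instead of a stepwise criterion; the simulated data, k = 10 and the two candidate formulas below are assumptions for illustration.

set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1 - d$x2))

cv_deviance <- function(formula, data, k = 10) {
  fold <- sample(rep(1:k, length.out = nrow(data)))
  dev <- 0
  for (j in 1:k) {
    fit <- glm(formula, data = data[fold != j, ], family = binomial)
    p   <- predict(fit, newdata = data[fold == j, ], type = "response")
    yj  <- data$y[fold == j]
    dev <- dev - 2 * sum(yj * log(p) + (1 - yj) * log(1 - p))
  }
  dev
}

cv_deviance(y ~ x1 + x2, d)        # the relevant variables
cv_deviance(y ~ x1 + x2 + x3, d)   # adding a pure-noise variable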

Read more »

An Attempt to Understand Boosting Algorithm(s)

June 26, 2015
An Attempt to Understand Boosting Algorithm(s)

Tuesday, at the annual meeting of the French Economic Association, I was having lunch with Alfred, and while we were chatting about modeling issues (econometric models versus machine learning predictions), he asked me what boosting was. Since I could not be very specific, we looked at the Wikipedia page. Boosting is a machine learning ensemble meta-algorithm for reducing bias primarily and also...

Read more »

‘Variable Importance Plot’ and Variable Selection

June 17, 2015
‘Variable Importance Plot’ and Variable Selection

Classification trees are nice. They provide an interesting alternative to a logistic regression.  I started to include them in my courses maybe 7 or 8 years ago. The question is nice (how to get an optimal partition), the algorithmic procedure is nice (the trick of splitting according to one variable, and only one, at each node, and then to move forward, never backward),...
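
As a pointer to what the title refers to, here is a minimal variable importance plot from a random forest; the randomForest package and the built-in iris data are illustrative choices, not the data used in the post.

library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(rf)   # mean decrease in accuracy / Gini, per variable
varImpPlot(rf)   # the 'variable importance plot'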

Read more »
