Blog Archives

Visualising Claims Frequency

July 28, 2015

A few years ago, I published a post to visualize empirical claims frequency in a portfolio. I wanted to update the code. Here is some code to get a dataset:

```r
sinistre <- read.table("http://freakonometrics.free.fr/sinistreACT2040.txt",header=TRUE,sep=";")
sinistres=sinistre
contrat <- read.table("http://freakonometrics.free.fr/contractACT2040.txt",header=TRUE,sep=";")
# number of claims per policy, for policies with at least one claim
T=table(sinistres$nocontrat)
T1=as.numeric(names(T))
T2=as.numeric(T)
nombre1 = data.frame(nocontrat=T1,nbre=T2)
# policies that never appear in the claims file get nbre = 0
I = contrat$nocontrat%in%T1
T1= contrat$nocontrat[!I]
nombre2 = data.frame(nocontrat=T1,nbre=0)
nombre=rbind(nombre1,nombre2)
basenb = merge(contrat,nombre)
head(basenb)
basesin=merge(sinistres,contrat)
```
...
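
With `basenb` holding one row per policy and its number of claims `nbre`, the empirical claims frequency of a group of policies is the total number of claims divided by the total exposure. The following is a minimal sketch on a simulated toy portfolio so that it runs offline; the column names `exposition` (exposure, in years) and `ageconducteur` (driver age) are assumptions about the contract file, so check `names(contrat)` before adapting it.

```r
# Hypothetical toy portfolio standing in for basenb (simulated, not the real data)
set.seed(123)
n <- 1000
basenb_toy <- data.frame(
  nocontrat     = 1:n,
  exposition    = runif(n, 0.1, 1),                 # exposure in years (assumed column name)
  ageconducteur = sample(18:80, n, replace = TRUE)  # driver age (assumed column name)
)
# Claim counts drawn from a Poisson distribution whose rate decreases with age
lambda <- 0.15 * exp(-0.01 * (basenb_toy$ageconducteur - 18))
basenb_toy$nbre <- rpois(n, lambda * basenb_toy$exposition)

# Empirical claims frequency per age class: sum of claims / sum of exposure
classes <- cut(basenb_toy$ageconducteur, breaks = c(17, 25, 35, 50, 65, 80))
freq <- tapply(basenb_toy$nbre, classes, sum) /
  tapply(basenb_toy$exposition, classes, sum)
print(round(freq, 3))

# Simple visualisation of the frequency per class
barplot(freq, xlab = "driver age class", ylab = "claims frequency (per year)")
```

Dividing sums (rather than averaging `nbre / exposition` policy by policy) weights each policy by its exposure, so short contracts do not inflate the estimate.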

Choosing a Classifier

July 21, 2015

In order to illustrate the problem of choosing a classification model, consider some simulated data:

```r
n = 500
set.seed(1)
X = rnorm(n)
ma = 10-(X+1.5)^2*2
mb = -10+(X-1.5)^2*2
M = cbind(ma,mb)
set.seed(1)
Z = sample(1:2,size=n,replace=TRUE)
Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5
df = data.frame(Z=as.factor(Z),X,Y)
```

A first strategy is to split the dataset...
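
The hold-out strategy mentioned above can be sketched as follows, reusing the simulated data. The 70/30 split, the seed and the two candidate logistic models are illustrative choices, not necessarily those of the post; the point is only that candidate classifiers are compared on observations that were not used to fit them.

```r
# Simulated data, as above
n <- 500
set.seed(1)
X <- rnorm(n)
ma <- 10 - (X + 1.5)^2 * 2
mb <- -10 + (X - 1.5)^2 * 2
set.seed(1)
Z <- sample(1:2, size = n, replace = TRUE)
Y <- ma * (Z == 1) + mb * (Z == 2) + rnorm(n) * 5
df <- data.frame(Z = as.factor(Z), X, Y)

# Hold-out split: 70% training, 30% validation (proportion chosen for illustration)
set.seed(2)
idx <- sample(seq_len(n), size = round(0.7 * n))
train <- df[idx, ]
valid <- df[-idx, ]

# Validation misclassification rate of a logistic model
# (glm models the probability of the second level of Z, i.e. Z == "2")
error_rate <- function(formula) {
  fit <- glm(formula, data = train, family = binomial)
  p <- predict(fit, newdata = valid, type = "response")
  pred <- ifelse(p > 0.5, "2", "1")
  mean(pred != valid$Z)
}
err_linear <- error_rate(Z ~ X + Y)
err_poly   <- error_rate(Z ~ poly(X, 2) + Y)
print(c(linear = err_linear, quadratic = err_poly))
```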

An Update on Boosting with Splines

July 2, 2015

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the convergence of boosting when I was using spline functions (more specifically, continuous piecewise-linear regression functions). I was using

```r
library(splines)
fit=lm(y~bs(x,degree=1,df=3),data=df)
```

The problem with that spline function is that the knots seem to be fixed. The iterative boosting algorithm starts with...
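
A minimal L2-boosting sketch with this weak learner, on hypothetical simulated data: at each step the spline regression is fitted to the current residuals, and a shrunken version of its fit (shrinkage `nu`, chosen for illustration) is added to the model.

```r
library(splines)

# Hypothetical simulated data (not the data of the post)
set.seed(1)
n <- 300
x <- sort(runif(n, 0, 2 * pi))
y <- sin(x) + rnorm(n, sd = 0.4)
df <- data.frame(x, y)

nu <- 0.1                     # shrinkage (learning rate), illustrative
n_iter <- 100
pred <- rep(mean(df$y), n)    # start from the constant model
for (k in seq_len(n_iter)) {
  df$r <- df$y - pred                                    # current residuals
  weak <- lm(r ~ bs(x, degree = 1, df = 3), data = df)   # continuous piecewise-linear learner
  pred <- pred + nu * fitted(weak)                       # small step towards the residual fit
}
mse <- mean((df$y - pred)^2)
print(mse)
```

Because `bs(x, degree=1, df=3)` places its knots at fixed quantiles of `x`, every iteration projects the residuals onto the same space, so the boosted fit converges to the single `lm` fit on that spline basis and cannot do better in-sample. This is one way to see why fixed knots make the convergence look odd.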

Variable Selection using Cross-Validation (and Other Techniques)

July 1, 2015

A natural technique to select variables in the context of generalized linear models is to use a stepwise procedure. It is natural, but controversial, as discussed by Frank Harrell in a great post, clearly worth reading. Frank mentioned about 10 points against a stepwise procedure. It yields R-squared values that are badly biased to be high. The F and chi-squared tests quoted...
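
As an alternative to a p-value-driven stepwise procedure, variables can be added greedily according to a cross-validated error. The sketch below is plain forward selection scored by 10-fold cross-validated mean squared error on hypothetical simulated data; it is one of several possible techniques, not necessarily the one used in the post.

```r
# Hypothetical simulated data: only x1 and x2 matter
set.seed(1)
n <- 200
p <- 6
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
dat <- data.frame(y = 1 + 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n), X)

# K-fold cross-validated mean squared error of a linear model on 'vars'
cv_mse <- function(vars, data, K = 10) {
  folds <- rep_len(1:K, nrow(data))   # deterministic fold assignment
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  f <- as.formula(paste("y ~", rhs))
  err <- 0
  for (k in 1:K) {
    fit <- lm(f, data = data[folds != k, ])
    pred <- predict(fit, newdata = data[folds == k, ])
    err <- err + sum((data$y[folds == k] - pred)^2)
  }
  err / nrow(data)
}

# Forward selection: add the variable that lowers the CV error most,
# and stop when no remaining candidate improves it
selected <- character(0)
candidates <- colnames(X)
best <- cv_mse(selected, dat)
repeat {
  if (length(candidates) == 0) break
  scores <- sapply(candidates, function(v) cv_mse(c(selected, v), dat))
  if (min(scores) >= best) break
  best <- min(scores)
  v <- names(which.min(scores))
  selected <- c(selected, v)
  candidates <- setdiff(candidates, v)
}
print(selected)
```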

An Attempt to Understand Boosting Algorithm(s)

June 26, 2015

Tuesday, at the annual meeting of the French Economic Association, I was having lunch with Alfred, and while we were chatting about modeling issues (econometric models versus machine learning prediction), he asked me what boosting was. Since I could not be very specific, we looked at the Wikipedia page. Boosting is a machine learning ensemble meta-algorithm for reducing bias primarily and also...
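
For regression with the squared loss, the boosting idea can be written as a sequence of small corrections to the current fit. The formulation below is a standard L2-boosting sketch, where the class of weak learners $\mathcal{H}$ and the shrinkage parameter $\nu\in(0,1]$ are notation introduced here for illustration.

```latex
\begin{aligned}
\widehat{m}^{(0)}(x) &= \bar{y},\\
\widehat{h}^{(k)} &= \underset{h\in\mathcal{H}}{\operatorname{argmin}} \sum_{i=1}^{n} \big(y_i - \widehat{m}^{(k-1)}(x_i) - h(x_i)\big)^2,\\
\widehat{m}^{(k)}(x) &= \widehat{m}^{(k-1)}(x) + \nu\,\widehat{h}^{(k)}(x), \qquad k = 1,2,\ldots
\end{aligned}
```

Each weak learner $\widehat{h}^{(k)}$ is fitted to the residuals $y_i - \widehat{m}^{(k-1)}(x_i)$ of the current model, which is how the procedure progressively reduces bias.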

‘Variable Importance Plot’ and Variable Selection

June 17, 2015

Classification trees are nice. They provide an interesting alternative to a logistic regression. I started to include them in my courses maybe 7 or 8 years ago. The question is nice (how to get an optimal partition), the algorithmic procedure is nice (the trick of splitting according to one variable, and only one, at each node, and then to move forward, never backward),...
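
The splitting step described above (one variable, and one threshold, at each node) can be sketched in base R. This version uses the Gini impurity as the criterion, which is an assumption (entropy or deviance are also common), and the data and labels are hypothetical.

```r
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Best threshold for one variable: try every midpoint between sorted
# distinct values and keep the one minimising the weighted Gini impurity
best_split <- function(x, y) {
  u <- sort(unique(x))
  cuts <- (head(u, -1) + tail(u, -1)) / 2
  score <- sapply(cuts, function(s) {
    left <- y[x <= s]
    right <- y[x > s]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = cuts[which.min(score)], impurity = min(score))
}

# At a node, scan each variable and split on the best one only
set.seed(1)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- factor(ifelse(d$x1 > 0.6, "A", "B"))  # hypothetical labels driven by x1 only
splits <- lapply(d[, c("x1", "x2")], best_split, y = d$y)
best_var <- names(which.min(sapply(splits, `[[`, "impurity")))
print(best_var)
print(splits[[best_var]]$threshold)
```

Growing a tree then amounts to applying the same search recursively to the two child nodes, without ever revisiting an earlier split.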

p-hacking, or cheating on a p-value

June 11, 2015

Yesterday evening, I discovered some interesting slides on False-Positives, p-Hacking, Statistical Power, and Evidential Value, via @UCBITSS’s post on Twitter. More precisely, there was this slide on how to cheat (because that’s basically what it is) to get a ‘good’ model (by targeting the p-value). As mentioned by @david_colquhoun, one should be careful when reading the slides: some statisticians might have a heart attack...
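
A small simulation shows why targeting the p-value is cheating: when the outcome is pure noise and one tries 20 unrelated predictors, reporting only the smallest p-value produces a ‘significant’ result far more often than 5% of the time. The sample sizes and number of predictors below are illustrative assumptions.

```r
# Under the null: the outcome is pure noise, unrelated to every candidate predictor
set.seed(1)
n_sim <- 1000   # number of simulated studies
n <- 50         # observations per study
m <- 20         # number of predictors tried per study
hacked <- logical(n_sim)
for (s in seq_len(n_sim)) {
  y <- rnorm(n)
  X <- matrix(rnorm(n * m), n, m)
  pvals <- apply(X, 2, function(x) cor.test(x, y)$p.value)
  hacked[s] <- min(pvals) < 0.05   # report only the 'best' predictor
}
rate <- mean(hacked)
print(rate)          # empirical false-positive rate
print(1 - 0.95^m)    # theoretical value for 20 independent tests, about 0.64
```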

Who interacts on Twitter during a conference (#JDSLille)

June 7, 2015

Disclaimer: This is a joint post with Avner Bar-Hen, a.k.a. @a_bh, Benjamin Guedj, a.k.a. @bguedj and Nathalie Villa, a.k.a. @Natty_V2. Organised annually since 1970 by the French Society of Statistics (SFdS), the Journées de Statistique (JdS) are the most important scientific event of the French statistical community. More than 400 researchers, teachers and practitioners meet at each edition. In 2015,...

Data Science: from Small to Big Data

May 29, 2015

This Tuesday, I will be in Leuven (in Belgium) at the ACP meeting to give a talk on Data Science: from Small to Big Data. The talk will take place in the Faculty Club from 6 till 8 pm. Slides can be found online (with animated pictures). As usual, comments are welcome.

Copulas and Financial Time Series

May 12, 2015

I was recently asked to write a survey on copulas for financial time series. The paper is, so far, unfortunately, in French, and is available on https://hal.archives-ouvertes.fr/. There is a description of various models, including some graphs and statistical outputs, obtained from real data. To illustrate, I’ve been using weekly log-returns of (crude) oil prices, Brent, Dubaï and Maya....
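
As a minimal sketch of the usual first steps (on simulated prices, not the Brent, Dubaï and Maya series of the paper): compute weekly log-returns, turn them into pseudo-observations by rescaling ranks to (0,1), and measure dependence with Kendall’s tau, which for continuous margins depends only on the copula.

```r
# Hypothetical weekly prices for two correlated series (simulated, not real oil prices)
set.seed(1)
n <- 500
z1 <- rnorm(n, sd = 0.03)
z2 <- 0.8 * z1 + sqrt(1 - 0.8^2) * rnorm(n, sd = 0.03)
p1 <- 50 * exp(cumsum(z1))
p2 <- 48 * exp(cumsum(z2))

# Weekly log-returns
r1 <- diff(log(p1))
r2 <- diff(log(p2))

# Pseudo-observations: ranks rescaled to (0,1), the usual input for copula fitting
u <- rank(r1) / (length(r1) + 1)
v <- rank(r2) / (length(r2) + 1)

# Kendall's tau, a rank-based measure of dependence
tau <- cor(r1, r2, method = "kendall")
print(tau)

# Scatterplot of the pseudo-observations, an empirical view of the copula
plot(u, v, pch = 20, xlab = "U (series 1)", ylab = "V (series 2)")
```

Working on ranks removes the effect of the marginal distributions, so the scatterplot and tau reflect only the dependence structure that a copula model would then try to capture.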
