On leverage

October 3, 2019 | arthur charpentier

Last week, in our STT5100 (applied linear models) class, I introduced the hat matrix and the notion of leverage. In a classical regression model, $y = X\beta + \varepsilon$ (in matrix form), the ordinary least squares estimator of the parameter is $\widehat{\beta} = (X^\top X)^{-1} X^\top y$. The prediction can then be written $\widehat{y} = X\widehat{\beta} = Hy$, where $H = X (X^\top X)^{-1} X^\top$ is called the hat matrix. The matrix ...
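
As a small illustration of leverage (a sketch, not code from the post, and written in Python rather than the R used in the course): in the special case of a simple linear regression with an intercept and a single covariate, the diagonal entries of the hat matrix have the closed form $H_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The function name `leverages` and the toy covariate values below are hypothetical.

```python
def leverages(x):
    """Diagonal of the hat matrix for the model y = b0 + b1 * x.

    Uses the closed form H_ii = 1/n + (x_i - mean)^2 / Sxx, which is
    valid only for one covariate plus an intercept.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Toy covariate values (made up for illustration): the last point lies
# far from the others, so it should get the largest leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
h = leverages(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}   leverage = {hi:.3f}")

# The leverages sum to the number of estimated parameters (here 2),
# because the trace of the hat matrix equals the rank of X.
print("sum of leverages:", round(sum(h), 10))
```

The point far from the mean of the covariate dominates the fit, which is what a high leverage value flags; whether it is also influential depends on its response value, which this sketch does not look at.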
Insurance data science : Text

August 14, 2019 | arthur charpentier

At the Summer School of the Swiss Association of Actuaries, in Lausanne, I will start talking about text-based data and NLP this Thursday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on tweets. I can upload a few additiona...
Insurance data science : Pictures

August 13, 2019 | arthur charpentier

At the Summer School of the Swiss Association of Actuaries, in Lausanne, following the part of Jean-Philippe Boucher (UQAM) on telematic data, I will start talking about pictures this Wednesday. Slides are available online. Ewen Gallic (AMSE) will present a tutorial on satellite pictures, and a simple classification problem, related ...
Optimal transport on large networks

July 4, 2019 | arthur charpentier

With Alfred Galichon and Lucas Vernet, we recently uploaded a paper entitled “Optimal transport on large networks” on arXiv. This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our setting, the market is ...
On my way to Manizales (Colombia)

June 16, 2019 | arthur charpentier

Next week, I will be in Manizales, Colombia, for the Third International Congress on Actuarial Science and Quantitative Finance. I will be giving a lecture on Wednesday with Jed Frees and Emiliano Valdez. I will give my course on Algorithms for Predictive Modeling on Thursday morning (after Jed and Emil’...
Pareto Models for Top Incomes

June 3, 2019 | arthur charpentier

With Emmanuel Flachaire, we uploaded on HAL a paper on Pareto Models for Top Incomes. Top incomes are often related to the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of income and wealth distributions. It is a parametric distribution, with an ...

Estimates on training vs. validation samples

May 23, 2019 | arthur charpentier

Before moving to cross-validation, it was natural to say “I will burn 50% (say) of my data to train a model, and then use the remaining to fit the model”. For instance, we can use training data for variable selection (e.g. using some stepwise procedure in a logistic regression), and ...
The “probability to win” is hard to estimate…

November 6, 2018 | arthur charpentier

Real-time computation (or estimation) of the “probability to win” is difficult. We’ve seen that in soccer games, in elections… but actually, as a professor, I see that frequently when I grade my students. Consider a classical multiple choice exam. After each question, imagine that you try to compute the ...
Solving the Chinese postman problem

October 19, 2018 | arthur charpentier

A pre-Halloween post today. It actually started while I was in Barcelona: the kids wanted to go back to a store we had seen on the first day, in the Gothic part, and I could not remember where it was. And I said to myself that it would take quite long to do ...
Monte Carlo techniques to create counterfactuals

October 11, 2018 | arthur charpentier

In the previous STT5100 course, last week, we’ve seen how to use Monte Carlo simulations. The idea is that we do observe in statistics a sample $\{y_1, \dots, y_n\}$, and more generally, in econometrics $\{(y_1, x_1), \dots, (y_n, x_n)\}$. But let’s get back to statistics (without covariates) to illustrate. We assume that observations are realizations of ...

October, grant proposal season

October 9, 2018 | arthur charpentier

In 2012, Danielle Herbert, Adrian Barnett, Philip Clarke and Nicholas Graves published an article entitled “On the time spent preparing grant proposals: an observational study of Australian researchers”, whose conclusions were reported in Nature under a more explicit title, “Australia’s grant system wastes time”! In this study, they included 3700 ...
Combining automatically factor levels in R

October 6, 2018 | arthur charpentier

Each time we face real applications in an applied econometrics course, we have to deal with categorical variables. And the same question arises from students: how can we automatically combine factor levels? Is there a simple R function? I did upload a few blog posts over the past years. But ...
Convex Regression Model

July 5, 2018 | arthur charpentier

This morning during the lecture on nonlinear regression, I mentioned (very) briefly the case of convex regression. Since I forgot to mention the codes in R, I will publish them here. Assume that $y_i = m(x_i) + \varepsilon_i$, where $m$ is some convex function. Then $m$ is convex if and only if, $\forall x_1, x_2$, $\forall t \in [0,1]$, $m(t x_1 + (1-t) x_2) \leq t\, m(x_1) + (1-t)\, m(x_2)$. Hildreth (1954) proved that if […] then […] is ...
