Articles by arthur charpentier

On my way to Manizales (Colombia)

June 16, 2019 | arthur charpentier

Next week, I will be in Manizales, Colombia, for the Third International Congress on Actuarial Science and Quantitative Finance. I will be giving a lecture on Wednesday with Jed Fress and Emilianos Valdez. I will give my course on Algorithms for Predictive Modeling on Thursday morning (after Jed and Emil’...

Pareto Models for Top Incomes

June 3, 2019 | arthur charpentier

With Emmanuel Flachaire, we uploaded on hal a paper on Pareto Models for Top Incomes, Top incomes are often related to Pareto distribution. To date, economists have mostly used Pareto Type I distribution to model the upper tail of income and wealth distribution. It is a parametric distribution, with an ... [Read more...]

Estimates on training vs. validation samples

May 23, 2019 | arthur charpentier

Before moving to cross-validation, it was natural to say “I will burn 50% (say) of my data to train a model, and then use the remaining to fit the model”. For instance, we can use training data for variable selection (e.g. using some stepwise procedure in a logistic regression), and ...

What it the interpretation of the diagonal for a ROC curve

March 25, 2019 | arthur charpentier

Last Friday, we discussed the use of ROC curves to describe the goodness of a classifier. I did say that I will post a brief paragraph on the interpretation of the diagonal. If you look around some say that it describes the “strategy of randomly guessing a class“, that it ...

On the poor performance of classifiers in insurance models

March 13, 2019 | arthur charpentier

Each time we have a case study in my actuarial courses (with real data), students are surprised to have hard time getting a “good” model, and they are always surprised to have a low AUC, when trying to model the probability to claim a loss, to die, to fraud, etc. ...

Random thoughts on econometric models with (pure) random features

February 16, 2019 | arthur charpentier

For my lectures on applied linear models, I wanted to illustrate the fact that the is never a good measure of the goodness of the model, since it’s quite easy to improve it. Consider the following dataset n=100 df=data.frame(matrix(rnorm(n*n),n,n)) names(df)=...

NSERC – Discovery Grants Program, over the past 5 years

February 7, 2019 | arthur charpentier

In a previous post, I discussed how it was possible to scrap the NSERC website to get stats about discovery grants. Since we just got the new 2018 figures, I thought it would be a good opportunity to update my graphs, library(XML) library(stringr) url="http://www.nserc-crsng.gc.ca/...

The “probability to win” is hard to estimate…

November 6, 2018 | arthur charpentier

Real-time computation (or estimation) of the “probability to win” is difficult. We’ve seem that in soccer games, in elections… but actually, as a professor, I see that frequently when I grade my students. Consider a classical multiple choice exam. After each question, imagine that you try to compute the ...

Solving the chinese postman problem

October 19, 2018 | arthur charpentier

Some pre-Halloween post today. It started actually while I was in Barcelona : kids wanted to go back to some store we’ve seen the first day, in the gothic part, and I could not remember where it was. And I said to myself that would be quite long to do ...

Monte Carlo techniques to create counterfactuals

October 11, 2018 | arthur charpentier

In the previous STT5100 course, last week, we’ve seen how to use monte carlo simulations. The idea is that we do observe in statistics a sample , and more generally, in econometrics . But let’s get back to statistics (without covariates) to illustrate. We assume that observations are realizations of ... [Read more...]

October, grant proposal season

October 9, 2018 | arthur charpentier

In 2012, Danielle Herbert, Adrian Barnett, Philip Clarke and Nicholas Graves published an article entitled “on the time spent preparing grant proposals: an observational study of Australian researchers“, whose conclusions had been included in Nature under a more explicit title, “Australia’s grant system wastes time” ! In this study, they included 3700 ...

Combining automatically factor levels in R

October 6, 2018 | arthur charpentier

Each time we face real applications in an applied econometrics course, we have to deal with categorial variables. And the same question arise, from students : how can we combine automatically factor levels ? Is there a simple R function ? I did upload a few blog posts, over the pas years. But ...

Convex Regression Model

July 5, 2018 | arthur charpentier

This morning during the lecture on nonlinear regression, I mentioned (very) briefly the case of convex regression. Since I forgot to mention the codes in R, I will publish them here. Assume that where is some convex function. Then is convex if and only if , , Hidreth (1954) proved that ifthen is ...

Game of Friendship Paradox

June 27, 2018 | arthur charpentier

In the introduction of my course next week, I will (briefly) mention networks, and I wanted to provide some illustration of the Friendship Paradox. On network of thrones (discussed in Beveridge and Shan (2016)), there is a dataset with the network of characters in Game of Thrones. The word “friend” might ...

Parallelizing Linear Regression or Using Multiple Sources

June 21, 2018 | arthur charpentier

My previous post was explaining how mathematically it was possible to parallelize computation to estimate the parameters of a linear regression. More speficially, we have a matrix which is matrix and a -dimensional vector, and we want to compute by spliting the job. Instead of using the observations, we’ve ...

Linear Regression, with Map-Reduce

June 18, 2018 | arthur charpentier

Sometimes, with big data, matrices are too big to handle, and it is possible to use tricks to numerically still do the map. Map-Reduce is one of those. With several cores, it is possible to split the problem, to map on each machine, and then to agregate it back at ... [Read more...]

Quantile Regression (home made)

June 14, 2018 | arthur charpentier

After my series of post on classification algorithms, it’s time to get back to R codes, this time for quantile regression. Yes, I still want to get a better understanding of optimization routines, in R. Before looking at the quantile regression, let us compute the median, or the quantile, ...

Discrete or continuous modeling ?

June 13, 2018 | arthur charpentier

Tuesday, we got our conference “Insurance, Actuarial Science, Data & Models” and Dylan Possamaï gave a very interesting concluding talk. In the introduction, he came back briefly on a nice discussion we usually have in economics on the kind of model we should consider. It was about optimal control. In many ...

Classification from scratch, boosting 11/8

June 8, 2018 | arthur charpentier

Eleventh post of our series on classification from scratch. Today, that should be the last one… unless I forgot something important. So today, we discuss boosting. An econometrician perspective I might start with a non-conventional introduction. But that’s actually how I understood what boosting was about. And I am ...

Classification from scratch, bagging and forests 10/8

June 8, 2018 | arthur charpentier

Tenth post of our series on classification from scratch. Today, we’ll see the heuristics of the algorithm inside bagging techniques. Often, bagging is associated with trees, to generate forests. But actually, it is possible using bagging for any kind of model. Recall that bagging means “boostrap aggregation”. So, consider ...

« 1 2 3 4 5 … 19 »

Copyright © 2025 | MH Corporate basic by MH Themes