Blog Archives

Visualising a Classification in High Dimension, part 2

April 9, 2015

A few weeks ago, I published a post on Visualising a Classification in High Dimension, based on a principal component analysis, to get a projection on the first two components. Following that post, I was wondering what could be done in the context of a classification on categorical covariates. A natural idea would be to consider a...
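The projection idea mentioned here can be sketched in a few lines. This is not the post's actual code (its dataset is not shown in this excerpt), so the built-in iris data stands in:

```r
# Sketch of the projection idea: run PCA on the numeric covariates,
# keep the scores on the first two components, and plot the classes
# in that plane. iris is a stand-in dataset, not the one from the post.
pca <- prcomp(iris[, 1:4], scale. = TRUE)  # centred, scaled PCA
Z <- pca$x[, 1:2]                          # scores on the first two components
plot(Z, col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2")
```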

Read more »

Classification with Categorical Variables (the fuzzy side)

April 9, 2015

The Gaussian and the (log) Poisson regressions share a very interesting property, i.e. the average predicted value is the empirical mean of our sample.

> mean(predict(lm(dist~speed,data=cars)))
42.98
> mean(cars$dist)
42.98

One can prove that it is also the prediction for the average individual in our sample

> predict(lm(dist~speed,data=cars),
+   newdata=data.frame(speed=mean(cars$speed)))
42.98

The geometric interpretation is that the...
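The property quoted here is easy to check directly, on the same cars data the excerpt uses (the value 42.98 comes from the post itself):

```r
# For a Gaussian linear model, the mean of the fitted values equals the
# empirical mean of the response, and also equals the prediction for the
# "average individual" (covariates set to their sample means).
fit <- lm(dist ~ speed, data = cars)
m1 <- mean(predict(fit))                  # mean of fitted values
m2 <- mean(cars$dist)                     # empirical mean, 42.98
m3 <- predict(fit, newdata = data.frame(speed = mean(cars$speed)))
c(m1, m2, m3)                             # all three coincide
```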

Read more »

Another Interactive Map for the Cholera Dataset

March 31, 2015

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. And the heatmap was easy to include here. This time, we do not use OpenStreetMap. The first part is still the same, to get the data,

> require(rleafmap)
> library(sp)
> library(rgdal)
> library(maptools)
>...

Read more »

Interactive Maps for John Snow’s Cholera Data

March 28, 2015

This week, in Istanbul, for the second training on data science, we've been discussing classification and regression models, but also visualisation, including maps. And we did have a brief introduction to the leaflet package,

devtools::install_github("rstudio/leaflet")
require(leaflet)

To see what can be done with that package, we will use, one more time, John Snow's cholera dataset, discussed in previous...

Read more »

Splitting a Node in a Tree

March 23, 2015

If we grow a tree with standard functions in R, on the same dataset used to introduce classification trees in a previous post,

> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ head=TRUE,sep=";")
> library(rpart)
> cart<-rpart(PRONO~.,data=MYOCARDE)

we get

> library(rpart.plot)
> library(rattle)
> prp(cart,type=2,extra=1)

The first step is to split the first node (based on the whole dataset). To split it, we...
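What "splitting the first node" amounts to can be sketched as follows. The MYOCARDE data sits behind a URL, so iris with a binary target stands in; the impurity criterion used here is the Gini index, which is rpart's default:

```r
# Scan every cutpoint of one covariate and keep the one minimising the
# weighted Gini impurity of the two children -- the CART splitting rule.
y <- iris$Species == "setosa"         # binary outcome (stand-in data)
x <- iris$Petal.Length                # one candidate covariate
gini <- function(p) 2 * p * (1 - p)   # impurity of a proportion p
cuts <- sort(unique(x))
cuts <- cuts[-length(cuts)]           # the largest value cannot be a cutpoint
score <- sapply(cuts, function(cp) {
  left <- y[x <= cp]; right <- y[x > cp]
  (length(left) * gini(mean(left)) +
     length(right) * gini(mean(right))) / length(y)
})
best <- cuts[which.min(score)]
best                                  # best cutpoint for this covariate
```

On these stand-in data the split is clean: the cutpoint isolates the setosa class with zero impurity on both sides.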

Read more »

Regression Models, It’s Not Only About Interpretation

March 22, 2015

Yesterday, I uploaded a post where I tried to show that "standard" regression models were not performing badly. At least if you include (multivariate) splines to take into account joint effects and nonlinearities. So far, I have not discussed the possibly high number of features (but with bootstrap procedures, it is possible to assess something related to...
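The "standard regression + splines" idea can be sketched in two lines. cars is a stand-in dataset, and df = 4 is an arbitrary choice, not a value from the post:

```r
# Regress on a B-spline basis of the covariate to pick up nonlinearities.
library(splines)
fit_lin <- lm(dist ~ speed, data = cars)
fit_spl <- lm(dist ~ bs(speed, df = 4), data = cars)
# the linear fit is nested in the spline fit, so in-sample R^2 cannot drop
c(linear = summary(fit_lin)$r.squared,
  spline = summary(fit_spl)$r.squared)
```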

Read more »

Forecast, Automatic Routines vs. Experience

March 18, 2015

This morning, in our Time Series course, we've been playing with some data I got from google.ca/trends/. Actually, we've been playing with an old version, downloaded 18 months ago (discussed in a previous post, in French).

> urls = "http://freakonometrics.free.fr/report-headphones-2015.csv"
> report=read.table(
+ urls,skip=4,header=TRUE,sep=",",nrows=585)
> tail(report)
                    Semaine headphones
580 2015-02-08 - 2015-02-14         53
581 2015-02-15 - 2015-02-21         52
582...

Read more »

Growing some Trees

March 18, 2015

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),

> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ header=TRUE,sep=";")

The default classification tree is

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)

We can change the options here, such as the minimum number of observations per node,

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+ control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)

or...
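The same two calls can be run on a built-in dataset (iris stands in here, since the MYOCARDE file sits behind a URL):

```r
# minsplit is the minimum number of observations a node must contain
# before a split is attempted; lowering it allows deeper trees.
library(rpart)
arbre  <- rpart(Species ~ ., data = iris)
arbre2 <- rpart(Species ~ ., data = iris,
                control = rpart.control(minsplit = 10))
nrow(arbre$frame)   # number of nodes in the default tree
```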

Read more »

Some More Results on the Theory of Statistical Learning

March 8, 2015

Yesterday, I did mention a popular graph discussed when studying the theoretical foundations of statistical learning. But there is usually another one, which is the following. Let us get back to the underlying formulas. On the training sample, we have some empirical risk, defined as $\widehat{R}_n(h)=\frac{1}{n}\sum_{i=1}^n \ell(y_i,h(x_i))$ for some loss function $\ell$. Why is it complicated? From the law of large...
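The law-of-large-numbers point can be illustrated with a small simulation (my own toy setup, not the post's): for a fixed predictor, the empirical risk converges to the true risk as the sample grows.

```r
# Fixed predictor h(x) = x, squared loss, true model y = x + noise with
# unit variance, so the true risk is exactly 1. The empirical risk
# (average loss on the sample) approaches it as n grows.
set.seed(1)
emp_risk <- function(n) {
  x <- runif(n); y <- x + rnorm(n)
  mean((y - x)^2)                    # empirical risk of h(x) = x
}
risks <- sapply(c(100, 10000, 1000000), emp_risk)
risks                                # approaches the true risk, 1
```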

Read more »

Some Intuition About the Theory of Statistical Learning

March 7, 2015

While I was working on the Theory of Statistical Learning, and the concept of consistency, I found the following popular graph (e.g. from those slides, here in French). The curve below is the error on the training sample, as a function of the size of the training sample. Above, it is the error on a validation sample. Our learning...
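The two curves described here can be reproduced with a toy simulation (my own setup, not the one from the slides): training error of a fitted model, and error on a held-out validation sample, as functions of the training-set size.

```r
# Simulate y = x + noise; for each training size, fit a linear model and
# record the mean squared error on the training set and on a fixed
# validation sample. Training error tends to rise towards the noise level
# as n grows, while validation error falls towards it.
set.seed(1)
mse <- function(fit, d) mean((d$y - predict(fit, newdata = d))^2)
valid <- data.frame(x = runif(5000)); valid$y <- valid$x + rnorm(5000)
sizes <- c(5, 20, 100, 1000)
curves <- t(sapply(sizes, function(n) {
  train <- data.frame(x = runif(n)); train$y <- train$x + rnorm(n)
  fit <- lm(y ~ x, data = train)
  c(train_err = mse(fit, train), valid_err = mse(fit, valid))
}))
cbind(n = sizes, curves)
```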

Read more »