There are two kinds of people in the world: people who think there are two kinds of people in the world and people who don’t (borrowed from Menand (2018)). Because things are always simpler when we face only a binary choice, aren’t they? But consider here the case where multiple ...

The partial dependence plot is a nice tool to analyse the impact of some explanatory variables when using nonlinear models, such as a random forest or gradient boosting. The idea (in dimension 2): we are given a model $m(x_1,x_2)$ for $\mathbb{E}[Y|X_1=x_1,X_2=x_2]$. The partial dependence plot for variable $x_1$ in model $m$ is the function $p_1$ defined as $x_1\mapsto\mathbb{E}[m(x_1,X_2)]$. This ...
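As a rough illustration of the idea, the partial dependence on one variable can be approximated by averaging the model's predictions over the observed values of the other variable. The sketch below is not taken from the post: the simulated data, the polynomial `lm` standing in for a random forest or boosting model, and the names `df`, `model` and `partial_dependence_x1` are all assumptions made for the example.

```r
# Hypothetical simulated data: y depends nonlinearly on x1 and x2
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.1)
df <- data.frame(y = y, x1 = x1, x2 = x2)

# Stand-in model (assumption): any fitted model with a predict() method would do
model <- lm(y ~ poly(x1, 5) + poly(x2, 2), data = df)

# Empirical partial dependence on x1:
# for each value v, average the predictions m(v, x2_i) over the observed x2_i
partial_dependence_x1 <- function(u) {
  sapply(u, function(v) {
    mean(predict(model, newdata = data.frame(x1 = v, x2 = df$x2)))
  })
}

grid <- seq(0, 1, by = 0.05)
pdp  <- partial_dependence_x1(grid)
plot(grid, pdp, type = "l", xlab = "x1", ylab = "partial dependence")
```

Averaging over the empirical distribution of `x2` is only one possible estimator; when the covariates are strongly dependent, the resulting curve may average over combinations that never occur in the data.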

Registration and the call for abstracts for the 3rd Insurance Data Science Conference, held online 16–18 June 2021 (PM in Europe, AM in America), are now open. See https://insurancedatascience.org/ for more details…

In statistics, the Kolmogorov–Smirnov test is a popular procedure to test whether a sample is drawn from a given distribution, usually a member of some parametric family of distributions. For instance, we can test (where ) using that test. More specifically, I wanted to discuss today $p$-values. Given let us draw samples of size , ...
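One way to look at the $p$-values of this test is by simulation. The sketch below is an illustration only, not the post's own experiment: it assumes a fully specified standard normal null, and the sample size `n` and number of replications `ns` are arbitrary choices. Under those assumptions, the $p$-values should be roughly uniform on $[0,1]$.

```r
set.seed(1)
n  <- 50     # sample size (arbitrary choice)
ns <- 1000   # number of simulated samples (arbitrary choice)

# Draw ns samples from N(0,1) and test each one against the N(0,1) null
pvalues <- replicate(ns, ks.test(rnorm(n), "pnorm", 0, 1)$p.value)

# Under the null, the histogram should look flat
hist(pvalues, breaks = 20, probability = TRUE)

# Proportion of rejections at the 5% level, expected to be close to 0.05
mean(pvalues < 0.05)
```

If the parameters of the null distribution were estimated from each sample instead of being fixed in advance, the $p$-values returned by `ks.test` would no longer be uniform, which is a classical pitfall with this test.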

Would you like to put your data science skills to the test? Imperial College London, Université du Québec à Montréal (UQAM), actuarial institutes in Singapore, the UK (including the IFoA) and Australia, ASTIN, and the Casualty Actuarial Society are co-organising a global data science competition. Would you like to accurately predict ...

For my ACT6100 weekly quiz, I usually generate some datasets, and then ask students to compare various predictive algorithms. Last week, it was about classification trees and random forests. And students were surprised to see such differences (they had to estimate the probability of a specific label, for the ...

My kids have a very popular blog (at least among their grandmothers) where they frequently post pictures from everyday life (since they live 5000km from them), as well as pictures taken on holidays. This afternoon, I tried to use the popupImage function from the leaflet package to post pictures, ...

In September, we are usually happy to see our favorite TV series back on air… Or not? Because admit it, if we are happy to see those characters back, most of the time, we are disappointed. So why not look at the data, to confirm this feeling? Nazareno Andrade shared ...

For almost a month, on a daily basis, we have been working with colleagues (Romuald, Chi and Mathieu) on modeling the dynamics of the recent pandemic. I learn a lot of things discussing with them, but we keep struggling with the tests. Paul, in Montréal, helped me a little bit, ...

Let us get back to the Titanic dataset,

loc_fichier = "http://freakonometrics.free.fr/titanic.RData"
download.file(loc_fichier, "titanic.RData")
load("titanic.RData")
base = base[!is.na(base$Age),]

We consider two variables, the age (the continuous one) and the survivor indicator (the qualitative one)

X = base$Age ...

In Statistical Inference in a Stochastic Epidemic SEIR Model with Control Intervention, a more complex model than the one we’ve seen yesterday was considered (and is called the SEIR model). Consider a population of size $N$, and assume that $S$ is the number of susceptible, $E$ the number of exposed, the number ...

When introducing the SIR model, in our initial post, we got an ordinary differential equation, but we did not really discuss stability and periodicity. It has to do with the Jacobian matrix of the system. But first of all, we had three equations for three functions, but actually so it means ...

The most popular model for epidemics is the so-called SIR model – or Kermack-McKendrick. Consider a population of size $N$, and assume that $S$ is the number of susceptible, $I$ the number of infectious, and $R$ the number of recovered (or immune) individuals, so that $S+I+R=N$, which implies that $\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=0$. In order to be more realistic, ...
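To see the conservation of the total population at work, here is a minimal sketch using a simple Euler scheme. The teaser does not give the equations, so the standard Kermack-McKendrick dynamics used below, the rates `beta` and `gamma`, the population size and the step size are all assumptions made for illustration, not the post's own code.

```r
# Hypothetical parameters, chosen only for illustration
N     <- 1000   # population size
beta  <- 0.5    # transmission rate
gamma <- 0.1    # recovery rate
dt    <- 0.1    # Euler time step
steps <- 1000

S <- I <- R <- numeric(steps + 1)
S[1] <- N - 1   # one initial infectious individual
I[1] <- 1
R[1] <- 0

for (t in 1:steps) {
  new_infections <- beta * S[t] * I[t] / N * dt   # flow from S to I
  new_recoveries <- gamma * I[t] * dt             # flow from I to R
  S[t + 1] <- S[t] - new_infections
  I[t + 1] <- I[t] + new_infections - new_recoveries
  R[t + 1] <- R[t] + new_recoveries
}

# The three increments cancel out at each step, so S + I + R stays equal to N
range(S + I + R)

matplot((0:steps) * dt, cbind(S, I, R), type = "l", lty = 1,
        xlab = "time", ylab = "individuals")
```

The Euler scheme is chosen for transparency rather than accuracy; a dedicated ODE solver would be preferable for anything beyond a quick look.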

In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates $\boldsymbol{x}$ is given, so that $\mathbb{E}[Y|\boldsymbol{X}=\boldsymbol{x}]$ is either simply $\boldsymbol{x}^\top\boldsymbol{\beta}$ (for standard linear models) or a functional of $\boldsymbol{x}^\top\boldsymbol{\beta}$ (in GLMs). But more generally, we can consider transformations of the ...

A few days ago, I came back to a sentence I found (in a French newspaper), where someone was claiming that “… an old variable explains 85% of the change in a new variable. So we can talk about causality” and I tried to explain that it was just stupid: if we ...

To compute Lasso regression, define the soft-thresholding function $S(x,a)=\text{sign}(x)\cdot(|x|-a)_+$. The R function would be soft_thresholding = function(x,a){ sign(x) * pmax(abs(x)-a,0) } To solve our optimization problem, set so that the optimization problem can be written, equivalently hence and one gets or, if we develop Again, if there are ...
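As a quick check of what the soft-thresholding function does, the sketch below applies it to a few values with a threshold of 1. The input vector and the threshold are arbitrary choices made for the example.

```r
# Soft-thresholding function, as defined in the text
soft_thresholding <- function(x, a) {
  sign(x) * pmax(abs(x) - a, 0)
}

# Values within [-a, a] are set to zero, the others are shrunk towards zero by a
x <- c(-3, -0.5, 0, 0.5, 3)
soft_thresholding(x, 1)
```

With these inputs, the two values of magnitude 3 are shrunk to magnitude 2 and the three values inside the threshold band are set to zero, which is how the Lasso produces sparse coefficients.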

A few months ago, I posted a note with some homemade code for quantile regression… there was something odd in the output, but it was because there was a (small) mathematical problem in my equation. So since I should teach those tomorrow, let me fix them. Median Consider a ...

Cochran’s theorem – from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance published in 1934 – is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove ...

In the MAT7381 course (graduate course on regression models), we will talk about optimization, and a classical tool is the so-called conjugate. Given a function $f$ its conjugate is the function $f^\star$ such that $f^\star(y)=\max_x\{x^\top y-f(x)\}$ so, long story short, $f^\star(y)$ is the maximum gap between the linear function $x^\top y$ and $f(x)$. Just to visualize, consider a simple ...
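The "maximum gap" reading can be checked numerically in dimension one. The sketch below is only an illustration: it assumes the usual convex conjugate (maximum over $x$ of $xy-f(x)$), picks $f(x)=x^2/2$ because its conjugate is known in closed form ($y^2/2$), and searches over an arbitrary interval. The names `f` and `f_star` are hypothetical.

```r
# Example function (assumption): f(x) = x^2 / 2, whose conjugate is y^2 / 2
f <- function(x) x^2 / 2

# Numerical conjugate: largest gap between the line x * y and f(x)
# The search interval is an arbitrary choice, wide enough for the values of y used here
f_star <- function(y, lower = -10, upper = 10) {
  optimize(function(x) x * y - f(x),
           interval = c(lower, upper), maximum = TRUE)$objective
}

y <- c(-2, 0, 1, 3)
cbind(y = y, numerical = sapply(y, f_star), closed_form = y^2 / 2)
```

The numerical and closed-form columns should agree up to the tolerance of `optimize`; for a function whose maximising point may fall outside the interval, the bounds would need to be widened.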

Last year, in a post, I discussed how to merge levels of factor variables, using combinatorial techniques (it was for my STT5100 course, and trees are not in the syllabus), with an extension on trees at the end of the post. Consider the following (simulated) dataset n=200 set.seed(1) x1=...
