Articles by arthur charpentier

Where People Live

March 3, 2016 | arthur charpentier

There was an interesting map on reddit this morning, with a visualisation of the latitude and longitude of where people live on Earth. So I tried to reproduce it. To compute the density, I used a kernel-based approach __ library(maps) __ data("world.cities") __ X=world.cities[,c("lat","pop")] __ liss=...
[Read more...]

Mortality by Weekday and Age

February 27, 2016 | arthur charpentier

A few days ago, I mentioned on Twitter a nice graph of mortality by weekday and age (https://t.co/LyzQ7nJABZ), with a very interesting difference between young and old (@freakonometrics, February 27, 2016). My colleague Jean-Philippe was extremely sceptical, so I tried to ...
[Read more...]

Reverse Engineering with Correlated Features

February 11, 2016 | arthur charpentier

In econometric modeling, I usually have a problem with correlated features. A few weeks ago, I was discussing feature selection when features are correlated. This week, I was wondering about reverse engineering when features might be correlated (not to say very correlated). The way I see reverse engineering is the ...
[Read more...]

Clustering French Cities (based on Temperatures)

February 11, 2016 | arthur charpentier

In order to illustrate hierarchical clustering techniques and k-means, I borrowed François Husson’s dataset, with monthly average temperatures in several French cities. __ temp=read.table( + "http://freakonometrics.free.fr/FR_temp.txt", + header=TRUE,dec=",") We have 15 cities, with monthly observations __ X=temp[,1:12] __ boxplot(X) Since the ...
[Read more...]

Clusters of Texts

February 10, 2016 | arthur charpentier

Another popular application of classification techniques is text mining (see e.g. an old post on French presidents' speeches). Consider the following example, inspired by Norbert Ryciak’s post, with 12 Wikipedia pages on various topics, __ library(tm) __ library(stringi) __ library(proxy) __ titles = c("Boosting_(machine_learning)", + "Random_forest", + "K-nearest_neighbors_...
[Read more...]

Clusters of (French) Regions

February 9, 2016 | arthur charpentier

For tomorrow's data science course, I just wanted to post some functions to illustrate cluster analysis. Consider the dataset of the French 2012 elections __ elections2012=read.table( "http://freakonometrics.free.fr/elections_2012_T1.csv",sep=";",dec=",",header=TRUE) __ voix=which(substr(names( + elections2012),1,11)=="X..Voix.Exp") __ elections2012=elections2012[1:96,] __ X=...
[Read more...]

Simple Distributions for Mixtures?

February 3, 2016 | arthur charpentier

The idea of GLMs is that, given some covariates $X$, $Y \mid X$ has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc). But that does not mean that $Y$ has a similar distribution… so there is no reason to test for a Gamma model for $Y$ before running a Gamma regression, for instance. But ... [Read more...]

Confidence Regions for Parameters in the Simplex

January 18, 2016 | arthur charpentier

Consider here the case where, in some parametric inference problem, the parameter is a point in the simplex. For instance, consider some regression on compositional data, __ library(compositions) __ data(DiagnosticProb) __ Y=DiagnosticProb[,"type"]-1 __ X=DiagnosticProb[,c("A","B","C")] __ model = glm(Y~ilr(X),family=binomial) __ b = ilrInv(coef(model)[... [Read more...]

Inter-relationships in a matrix

December 1, 2015 | arthur charpentier

Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://www.data.gouv.fr/fr/, ...
[Read more...]

Profile Likelihood

November 16, 2015 | arthur charpentier

Consider some simulated data __ set.seed(1) __ x=exp(rnorm(100)) Assume that those data are observations of i.i.d. random variables with some parametric distribution. The natural idea is to consider the maximum likelihood estimator. For instance, consider some maximum likelihood estimator, __ library(MASS) __ (F=fitdistr(x,"gamma")) shape rate 1.4214497 0.8619969 (0.1822570) (0.1320717) __ F$estimate[1]+c(... [Read more...]

Variable Importance with Correlated Features

November 6, 2015 | arthur charpentier

Variable importance graphs are a great tool to see, in a model, which variables are interesting. Since we usually use them with random forests, it looks like they work well with (very) large datasets. The problem with large datasets is that a lot of features are ‘correlated’, and in that ... [Read more...]

Applications of Chi-Square Tests

November 3, 2015 | arthur charpentier

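This morning, in our mathematical statistics class, we saw how to use the chi-square test. The first application was a goodness-of-fit test for a multinomial distribution. Assume that $N=(N_1,\dots,N_k)\sim\mathcal{M}(n,p)$. In order to test $H_0: p=p_0$ against $H_1: p\neq p_0$, use the statistic $Q=\sum_{i=1}^k (N_i-np_{0,i})^2/(np_{0,i})$. Under $H_0$, $Q$ is approximately $\chi^2(k-1)$ distributed. For instance, we have the number of weddings, in ... [Read more...]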

Tests, Power and Significance

October 14, 2015 | arthur charpentier

In the mathematical statistics course today, we started talking about tests and decision rules. To illustrate all the concepts introduced today, we considered a sample drawn from some distribution, and a test of a null hypothesis against an alternative. In the course, we’ve seen that we could use a test based ... [Read more...]

Visualising a Circular Density

October 7, 2015 | arthur charpentier

This afternoon, Jean-Luc asked me for some help with an old post I published, minuit, l’heure du crime, and with some graphs published a few days later in another post, where I used a different visualisation. The idea is that the hour can be seen as circular, in the sense ... [Read more...]