Articles by arthur charpentier

I Fought the (distribution) Law (and the Law did not win)

April 27, 2015 | arthur charpentier

A few days ago, I was asked whether we should spend a lot of time choosing the distribution we use in GLMs for (actuarial) ratemaking. On that topic, I usually claim that the family is not the most important parameter in the regression model. Consider the following dataset, db ... [Read more...]
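Here is a minimal sketch of the kind of comparison this suggests. It uses the built-in cars data as a stand-in for db, which is not shown here. The same log-link regression is fitted under three families, and the fitted values are set side by side.

```r
# cars is a stand-in for the (elided) db dataset; this is not the post's data
fit_pois  <- glm(dist ~ speed, family = poisson(link = "log"), data = cars)
# Reuse the Poisson coefficients as starting values for the other two families
fit_gauss <- glm(dist ~ speed, family = gaussian(link = "log"), data = cars,
                 start = coef(fit_pois))
fit_gamma <- glm(dist ~ speed, family = Gamma(link = "log"), data = cars,
                 start = coef(fit_pois))

fits <- list(poisson = fit_pois, gaussian = fit_gauss, gamma = fit_gamma)

# Coefficients on the log scale, one column per family
sapply(fits, coef)

# Predicted stopping distances at a few speeds, one column per family
new <- data.frame(speed = c(5, 10, 15, 20, 25))
sapply(fits, predict, newdata = new, type = "response")
```

Only the variance assumption changes between the three columns. The link and the covariates stay the same.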

Visualising a Classification in High Dimension, part 2

April 9, 2015 | arthur charpentier

A few weeks ago, I published a post on Visualising a Classification in High Dimension, based on principal component analysis, to get a projection onto the first two components. Following that post, I was wondering what could be done in the context of a classification on ... [Read more...]
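A rough sketch of the projection step follows. It assumes the built-in iris data in place of the dataset used in the post. prcomp() computes the principal components, and the observations are plotted on the first two components, coloured by class.

```r
# iris stands in for the original (higher-dimensional) classification dataset
X <- iris[, 1:4]
pca <- prcomp(X, scale. = TRUE)   # PCA on standardised variables

# Share of the total variance carried by each component
round(pca$sdev^2 / sum(pca$sdev^2), 3)

# Coordinates of each observation on the first two components
scores <- pca$x[, 1:2]
plot(scores, pch = 19, col = c("red", "blue", "darkgreen")[iris$Species],
     xlab = "PC1", ylab = "PC2")
```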

Classification with Categorical Variables (the fuzzy side)

April 9, 2015 | arthur charpentier

The Gaussian and the (log) Poisson regressions share a very interesting property: the average predicted value is the empirical mean of our sample. For instance, mean(predict(lm(dist~speed,data=cars))) returns [1] 42.98, the same value as mean(cars$dist). One can prove that it is also the prediction for the average individual in our ... [Read more...]
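The property can be checked numerically on the same cars data. The sketch below fits both models with an intercept. For the Poisson model it uses the canonical log link, so the score equation for the intercept forces the fitted values to sum to the observed ones. The last line only concerns the linear model: there, the prediction at the average speed equals the empirical mean.

```r
# Gaussian linear model on the built-in cars dataset
fit_lm <- lm(dist ~ speed, data = cars)
# Poisson regression with the (canonical) log link; dist takes integer values here
fit_pois <- glm(dist ~ speed, family = poisson(link = "log"), data = cars)

mean(predict(fit_lm))                        # average fitted value, Gaussian model
mean(predict(fit_pois, type = "response"))   # average fitted value, response scale
mean(cars$dist)                              # empirical mean

# Linear model only: prediction for the "average individual" (mean speed)
predict(fit_lm, newdata = data.frame(speed = mean(cars$speed)))
```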

Another Interactive Map for the Cholera Dataset

March 31, 2015 | arthur charpentier

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. Here, the heatmap was easy to include. This time, we do not use OpenStreetMap. The first part is still the same, to get the ... [Read more...]

Interactive Maps for John Snow’s Cholera Data

March 28, 2015 | arthur charpentier

This week, in Istanbul, for the second training on data science, we discussed classification and regression models, but also visualisation, including maps. We also had a brief introduction to the leaflet package, installed with devtools::install_github("rstudio/leaflet") and loaded with require(leaflet). To see what can be done with that ... [Read more...]

Splitting a Node in a Tree

March 23, 2015 | arthur charpentier

If we grow a tree with standard functions in R, on the same dataset used to introduce classification trees in a previous post,
MYOCARDE=read.table("http://freakonometrics.free.fr/saporta.csv", head=TRUE,sep=";")
library(rpart)
cart ...
library(rpart.plot)
library(rattle)
prp(cart,type=2,extra=1)
The first step ... [Read more...]
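The first step, choosing the root split, can be sketched by hand. The example assumes the kyphosis data shipped with rpart in place of MYOCARDE, which needs a download. It scans every midpoint threshold of each numeric variable and keeps the split with the lowest weighted Gini impurity of the two children. To mirror rpart's default minbucket, it requires at least 7 observations per child. best_split() is a hypothetical helper, not part of rpart.

```r
library(rpart)  # rpart ships the kyphosis data frame
data(kyphosis)

# Gini impurity of a vector of class labels
gini <- function(y) { p <- table(y) / length(y); sum(p * (1 - p)) }

# Hypothetical helper: best threshold on one numeric variable,
# judged by the weighted Gini impurity of the two children
best_split <- function(x, y, minbucket = 7) {
  v <- sort(unique(x))
  cuts <- (head(v, -1) + tail(v, -1)) / 2    # midpoints between observed values
  best <- list(cut = NA, imp = Inf)
  for (s in cuts) {
    left <- x < s
    nl <- sum(left); nr <- sum(!left)
    if (nl < minbucket || nr < minbucket) next
    imp <- (nl * gini(y[left]) + nr * gini(y[!left])) / length(y)
    if (imp < best$imp) best <- list(cut = s, imp = imp)
  }
  best
}

vars <- c("Age", "Number", "Start")
res <- lapply(vars, function(v) best_split(kyphosis[[v]], kyphosis$Kyphosis))
names(res) <- vars
imps <- sapply(res, `[[`, "imp")
root_var <- names(which.min(imps))
root_cut <- res[[root_var]]$cut
c(variable = root_var, cut = root_cut)

# Compare with the root split chosen by rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
fit$frame$var[1]
fit$splits[1, "index"]
```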

Regression Models, It’s Not Only About Interpretation

March 22, 2015 | arthur charpentier

Yesterday, I uploaded a post where I tried to show that “standard” regression models were not performing badly, at least if you include (multivariate) splines to take into account joint effects and nonlinearities. So far, I do not discuss the possible high number of features (but with bootstrap ... [Read more...]

Forecast, Automatic Routines vs. Experience

March 18, 2015 | arthur charpentier

This morning, in our Time Series course, we played with some data I got from google.ca/trends/. Actually, we used an old version, downloaded 18 months ago (discussed in a previous post, in French).
urls = "http://freakonometrics.free.fr/report-headphones-2015.csv"
report=read.table(urls,... [Read more...]

Growing some Trees

March 18, 2015 | arthur charpentier

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),
MYOCARDE=read.table("http://freakonometrics.free.fr/saporta.csv", header=TRUE,sep=";")
The default classification tree is
arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
rpart.plot(arbre,type=4,extra=6)
We can change the options ... [Read more...]

Some More Results on the Theory of Statistical Learning

March 8, 2015 | arthur charpentier

Yesterday, I mentioned a popular graph discussed when studying the theoretical foundations of statistical learning. But there is usually another one, which is the following. Let us get back to the underlying formulas. On the training sample, we have some empirical risk, defined as $\widehat{\mathcal{R}}_n(h)=\frac{1}{n}\sum_{i=1}^n \ell(y_i,h(\mathbf{x}_i))$ for some loss function $\ell$. Why is ... [Read more...]
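The empirical risk is just an average loss, so it takes one line to compute. The sketch below assumes squared loss and the built-in cars data. It uses an arbitrary, hypothetical 35/15 split into training and validation samples, so that the risk on the training sample can be compared with the risk on new observations.

```r
# Empirical risk: average loss over a sample (squared loss by default)
empirical_risk <- function(y, yhat, loss = function(y, yhat) (y - yhat)^2) {
  mean(loss(y, yhat))
}

set.seed(1)
idx <- sample(nrow(cars), 35)            # hypothetical 35/15 train/validation split
train <- cars[idx, ]
valid <- cars[-idx, ]

fit <- lm(dist ~ speed, data = train)
r_train <- empirical_risk(train$dist, predict(fit, train))
r_valid <- empirical_risk(valid$dist, predict(fit, valid))
c(train = r_train, validation = r_valid)
```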

Some Intuition About the Theory of Statistical Learning

March 7, 2015 | arthur charpentier

While I was working on the Theory of Statistical Learning, and the concept of consistency, I found the following popular graph (e.g. from those slides, here in French). The curve below is the error on the training sample, as a function of the size of the training sample. Above, ... [Read more...]

Visualising a Classification in High Dimension

March 6, 2015 | arthur charpentier

So far, when discussing classification, we’ve been playing with my toy dataset (actually, I should not claim it’s mine: it is inspired by the one used in the introduction of Boosting, by Robert Schapire and Yoav Freund). But in real life, there are more observations, and more explanatory variables.... [Read more...]

Supervised Classification, beyond the logistic

March 5, 2015 | arthur charpentier

In our data-science class, after discussing limitations of the logistic regression, e.g. the fact that the decision boundary was a straight line, we mentioned possible natural extensions. Let us consider our (now) standard dataset, clr1 ... [Read more...]

Supervised Classification, discriminant analysis

March 3, 2015 | arthur charpentier

Another popular technique for classification (or at least, one which used to be popular) is (linear) discriminant analysis, introduced by Ronald Fisher in 1936. Consider the same dataset as in our previous post (objects clr1, x, y, z and df), plotted with plot(x,y,pch=19,cex=2,col=clr1[z+1]). The main interest of that ... [Read more...]
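A short sketch with MASS::lda() follows. It assumes the built-in iris data, the data Fisher used in 1936, rather than the post's toy dataset, which is not reproduced here. The second part restricts the data to two classes and checks that the discriminant direction from lda() is proportional to Fisher's $S_W^{-1}(\mu_2-\mu_1)$.

```r
library(MASS)  # lda() lives in MASS, a recommended package shipped with R

# iris stands in for the post's toy dataset
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)$class
table(observed = iris$Species, predicted = pred)

# Two-class case: Fisher's direction is proportional to Sw^{-1} (m2 - m1)
two <- droplevels(subset(iris, Species != "setosa"))
X <- as.matrix(two[, 1:4])
g <- two$Species
m1 <- colMeans(X[g == "versicolor", ])
m2 <- colMeans(X[g == "virginica", ])
# Both groups have 50 observations, so this is the pooled within-class
# covariance up to a constant factor
Sw <- cov(X[g == "versicolor", ]) + cov(X[g == "virginica", ])
w <- solve(Sw, m2 - m1)

fit2 <- lda(Species ~ ., data = two)
# Ratios should be (numerically) constant: same direction up to scale and sign
w / fit2$scaling[, 1]
```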

Supervised Classification, Logistic and Multinomial

March 2, 2015 | arthur charpentier

In our Data Science course, we will start discussing classification techniques (in the context of supervised models). Consider the following case, with 10 points and two classes (red and blue), stored in objects clr1, clr2, x, y, z and df, and plotted with plot(x,y,pch=19,cex=2,col=clr1[z+1]). To get a prediction, i.e. ... [Read more...]
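The sketch below shows the logistic and multinomial steps. Since the ten-point dataset is not reproduced here, it assumes the built-in iris data instead. A binomial glm() gives estimated probabilities, turned into classes with a 0.5 threshold. nnet::multinom() extends this to three classes.

```r
# iris stands in for the post's ten-point dataset
two <- droplevels(subset(iris, Species != "setosa"))
two$z <- as.integer(two$Species == "virginica")   # 1 = virginica, 0 = versicolor

# Logistic regression: estimated P(z = 1 | x)
fit <- glm(z ~ Petal.Length + Petal.Width, family = binomial, data = two)
p <- predict(fit, type = "response")
pred <- as.integer(p > 0.5)                       # classify with a 0.5 threshold
table(observed = two$z, predicted = pred)

# Multinomial logistic regression on the three species
library(nnet)  # recommended package shipped with R
fit3 <- multinom(Species ~ Petal.Length + Petal.Width, data = iris, trace = FALSE)
head(predict(fit3, type = "probs"))               # one probability per class
mean(predict(fit3) == iris$Species)               # in-sample accuracy
```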

John Snow, and Google Maps

February 27, 2015 | arthur charpentier

In my previous post, I discussed how to use OpenStreetMap (and standard plotting functions of R) to visualize John Snow’s dataset. But it is also possible to use Google Maps (and ggplot2-type graphs), with library(ggmap), then get_london ... [Read more...]

John Snow, and OpenStreetMap

February 27, 2015 | arthur charpentier

While I was preparing a training session on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in HistData, a great package of famous historical datasets: library(HistData), data(Snow.deaths), data(Snow.streets). One can easily visualize the ... [Read more...]

Visualizing Clusters

February 24, 2015 | arthur charpentier

Consider the following dataset, with (only) ten points
x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
plot(x,y,pch=19,cex=2)
We want to get, say, two clusters. Or more specifically, two sets of observations, each of them sharing some similarities. Since the number of observations is rather small, it is actually possible to ... [Read more...]
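With only ten points, every split into two groups can be enumerated: there are $2^9-1=511$ of them. The sketch below assumes that the within-cluster sum of squares is the criterion to minimise. This is an assumption; the full post may use another one. Point 1 is always kept in group 0 so that each partition is counted once.

```r
x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
n <- length(x)

# Within-cluster sum of squares for a vector of group labels
wss <- function(g) {
  s <- 0
  for (k in unique(g)) {
    idx <- g == k
    s <- s + sum((x[idx] - mean(x[idx]))^2 + (y[idx] - mean(y[idx]))^2)
  }
  s
}

# Bits of m assign points 2..n to group 0 or 1; point 1 stays in group 0
best <- NULL
best_val <- Inf
for (m in 1:(2^(n - 1) - 1)) {
  g <- c(0, as.integer(intToBits(m))[1:(n - 1)])
  v <- wss(g)
  if (v < best_val) { best_val <- v; best <- g }
}
best
best_val
plot(x, y, pch = 19, cex = 2, col = c("red", "blue")[best + 1])
```

The exhaustive optimum can be compared with kmeans(), which only searches locally.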

k-means clustering and Voronoi sets

February 22, 2015 | arthur charpentier

In the context of $k$-means, we want to partition the space of our observations into $k$ classes: each observation belongs to the cluster with the nearest mean. Here “nearest” is in the sense of some norm, usually the $\ell_2$ (Euclidean) norm. Consider the case where we have 2 classes. The means being respectively ... [Read more...]
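Here is a minimal sketch of the link with Voronoi sets. It assumes the ten points from the clustering post and R's default kmeans(). At convergence each observation is closer to its own cluster mean than to the other one. Colouring a grid by its nearest mean therefore draws the Voronoi cells of the means. nearest_mean() is a hypothetical helper written for this illustration.

```r
x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
pts <- cbind(x, y)

set.seed(1)
km <- kmeans(pts, centers = 2, nstart = 20)
ctr <- km$centers

# Hypothetical helper: index of the nearest mean (squared Euclidean distance)
nearest_mean <- function(p) {
  d2 <- sapply(seq_len(nrow(ctr)), function(j)
    (p[, 1] - ctr[j, 1])^2 + (p[, 2] - ctr[j, 2])^2)
  max.col(-d2, ties.method = "first")
}

# Each observation lies in the Voronoi cell of its own cluster mean
all(nearest_mean(pts) == km$cluster)

# Colour a grid by nearest mean: the two regions are the Voronoi cells
grid <- as.matrix(expand.grid(x = seq(0, 1, length.out = 101),
                              y = seq(0, 1, length.out = 101)))
cell <- nearest_mean(grid)
plot(grid, col = c("mistyrose", "lightblue")[cell], pch = 15, cex = 0.5)
points(pts, pch = 19, cex = 2, col = c("red", "blue")[km$cluster])
points(ctr, pch = 4, cex = 3, lwd = 3)
```

With two means, the boundary between the cells is the perpendicular bisector of the segment joining them.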

Inequalities and Quantile Regression

February 6, 2015 | arthur charpentier

In the course on inequality measures, we’ve seen how to compute various (standard) inequality indices, based on some sample of incomes (possibly binned into various categories). On Thursday, we discussed the fact that incomes can be related to different variables (e.g. experience), and that comparing income inequalities ... [Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
