# Articles by Arthur Charpentier

June 27, 2018 |

In the introduction of my course next week, I will (briefly) mention networks, and I wanted to provide some illustration of the Friendship Paradox. In Network of Thrones (discussed in Beveridge and Shan (2016)), there is a dataset with the network of characters in Game of Thrones. The word “friend” might ...
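
The Friendship Paradox states that, on average, your friends have more friends than you do. The posts on this blog use R; as a quick illustration of the idea, here is a Python sketch on a hypothetical star-shaped toy graph (not the Game of Thrones network from the post):

```python
# Tiny numeric check of the Friendship Paradox on a star-shaped graph:
# "A" is friends with everyone, the others only with "A".
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}

deg = {v: len(nbrs) for v, nbrs in graph.items()}

# Average number of friends per person
mean_degree = sum(deg.values()) / len(deg)

# Average, over persons, of the mean number of friends of their friends
mean_friends_degree = sum(
    sum(deg[u] for u in nbrs) / len(nbrs) for nbrs in graph.values()
) / len(graph)

# mean_friends_degree >= mean_degree: your friends have more friends than you
```

On this toy graph the average degree is 1.5, while the average degree of a random friend is 2.5, illustrating the paradox.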

### Parallelizing Linear Regression or Using Multiple Sources

June 21, 2018 |

My previous post explained how it was mathematically possible to parallelize computation to estimate the parameters of a linear regression. More specifically, we have an $n\times k$ matrix $\mathbf{X}$ and an $n$-dimensional vector $\mathbf{y}$, and we want to compute $\widehat{\boldsymbol{\beta}}=(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$ by splitting the job. Instead of using all $n$ observations, we’ve ...
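
The key observation is that the sufficient statistics $\mathbf{X}^\top\mathbf{X}$ and $\mathbf{X}^\top\mathbf{y}$ are sums over row blocks, so each source can contribute its own piece. The post works in R; here is a hedged Python sketch of the same idea on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

# Full-sample OLS via the normal equations
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from two row blocks (two "sources"):
# X'X and X'y are simply sums of the per-block statistics
blocks = [(X[:60], y[:60]), (X[60:], y[60:])]
XtX = sum(Xb.T @ Xb for Xb, yb in blocks)
Xty = sum(Xb.T @ yb for Xb, yb in blocks)
beta_split = np.linalg.solve(XtX, Xty)

# Both routes give the same estimator
assert np.allclose(beta_full, beta_split)
```

Each block only needs to ship a $k\times k$ matrix and a $k$-vector, never the raw observations.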

### Linear Regression, with Map-Reduce

June 18, 2018 |

Sometimes, with big data, matrices are too big to handle, and it is possible to use tricks to numerically still do the math. Map-Reduce is one of those. With several cores, it is possible to split the problem, to map on each machine, and then to aggregate it back at ...
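
The post is written in R; as an illustrative Python sketch (toy data, names mine), the map step computes each chunk's sufficient statistics and the reduce step aggregates them by summation:

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 2))
y = rng.normal(size=90)
chunks = np.array_split(np.arange(90), 3)   # three "machines"

# map: each worker returns its block's sufficient statistics (X'X, X'y)
stats = map(lambda idx: (X[idx].T @ X[idx], X[idx].T @ y[idx]), chunks)

# reduce: aggregate the statistics by summation
XtX, Xty = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), stats)

beta = np.linalg.solve(XtX, Xty)
```

The aggregated estimate matches the one computed on the full matrix, which is the point of the trick.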

June 14, 2018 |

After my series of posts on classification algorithms, it’s time to get back to R codes, this time for quantile regression. Yes, I still want to get a better understanding of optimization routines, in R. Before looking at quantile regression, let us compute the median, or more generally a quantile, ...
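
The optimization view is that the median minimizes the sum of absolute deviations, and a quantile of level $\tau$ minimizes the "pinball" (check) loss. The post does this in R; a hedged Python sketch of the same computation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

# The median minimizes the sum of absolute deviations
res_med = minimize_scalar(lambda m: np.sum(np.abs(y - m)))

# The tau-quantile minimizes the pinball loss
def pinball(m, tau):
    u = y - m
    return np.sum(np.where(u >= 0, tau * u, (tau - 1) * u))

res_q25 = minimize_scalar(lambda m: pinball(m, 0.25))
```

The minimizers land on the sample median (4.0) and on the 0.25 order statistic (2.0), as the subgradient argument predicts.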

### Discrete or continuous modeling?

June 13, 2018 |

Tuesday, we had our conference “Insurance, Actuarial Science, Data & Models”, and Dylan Possamaï gave a very interesting concluding talk. In the introduction, he came back briefly to a nice discussion we usually have in economics on the kind of model we should consider. It was about optimal control. In many ...

### Classification from scratch, boosting 11/8

June 8, 2018 |

Eleventh post of our series on classification from scratch. Today, that should be the last one… unless I forgot something important. So today, we discuss boosting. An econometrician’s perspective: I might start with a non-conventional introduction. But that’s actually how I understood what boosting was about. And I am ...
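
The core mechanics of boosting: repeatedly fit a weak learner to the current residuals and add a shrunken copy of it to the prediction. The series is in R; this is a minimal Python sketch (an L2-boosting variant with one-split "stumps", on toy data):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)

def fit_stump(x, r):
    """Fit a one-split regression stump to the residuals r."""
    best = None
    for s in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = r[x <= s].mean(), r[x > s].mean()
        sse = ((r - np.where(x <= s, left, right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left, right)
    _, s, left, right = best
    return lambda z: np.where(z <= s, left, right)

pred, nu = np.zeros_like(y), 0.1       # nu = learning rate (shrinkage)
for _ in range(200):
    stump = fit_stump(x, y - pred)     # fit the current residuals
    pred += nu * stump(x)              # add a shrunken copy
```

After a couple hundred rounds, the sum of shrunken stumps tracks the sine curve closely.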

### Classification from scratch, bagging and forests 10/8

June 8, 2018 |

Tenth post of our series on classification from scratch. Today, we’ll see the heuristics of the algorithm behind bagging techniques. Often, bagging is associated with trees, to generate forests. But actually, it is possible to use bagging with any kind of model. Recall that bagging means “bootstrap aggregation”. So, consider ...
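
Bagging in a nutshell: refit the same model on bootstrap resamples of the data and average the predictions. The post works in R; here is a hedged Python sketch with a linear model standing in for "any kind of model" (toy data, names mine):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=80)
Xnew = np.array([[0.5, 0.5]])          # point to predict at

preds = []
for _ in range(100):
    idx = rng.integers(0, len(y), len(y))   # bootstrap: sample rows with replacement
    beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    preds.append(Xnew @ beta)

bagged_prediction = np.mean(preds)     # aggregate by averaging
```

For a linear model bagging adds little, which is why it is usually paired with unstable learners such as trees.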

### Classification from scratch, linear discrimination 8/8

June 6, 2018 |

Eighth post of our series on classification from scratch. The latest one was on the SVM, and today I want to get back to very old stuff, here also with a linear separation of the space, using Fisher’s linear discriminant analysis. Bayes (naive) classifier: consider the following naive classification ...
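
A naive Bayes classifier fits, per class, an independent distribution to each feature and classifies by the largest posterior. The post uses R; a minimal Python sketch with Gaussian features and equal priors (hypothetical toy data):

```python
import numpy as np

X0 = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]])   # class-0 observations
X1 = np.array([[3.0, 0.5], [3.2, 0.8], [2.8, 0.4]])   # class-1 observations

def log_gauss(x, mu, var):
    """Log-density of N(mu, var), coordinate-wise."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def predict(x):
    scores = []
    for Xc in (X0, X1):
        mu, var = Xc.mean(axis=0), Xc.var(axis=0, ddof=1)
        # "naive": independence, so log-densities add up; equal priors assumed
        scores.append(log_gauss(x, mu, var).sum())
    return int(np.argmax(scores))
```

A point near the class-0 cloud, such as `[1.1, 2.0]`, is assigned to class 0, and symmetrically for class 1.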

### Classification from scratch, SVM 7/8

June 6, 2018 |

Seventh post of our series on classification from scratch. The latest one was on neural nets, and today we will discuss SVM, support vector machines. A formal introduction: here $y$ takes values in $\{-1,+1\}$. Our model will be $m(\mathbf{x})=\text{sign}(\beta_0+\mathbf{x}^\top\boldsymbol{\beta})$. Thus, the space is divided by a (linear) border $\{\mathbf{x}:\beta_0+\mathbf{x}^\top\boldsymbol{\beta}=0\}$. The distance from point ...
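
The distance from a point to that linear border, the quantity the SVM margin is built on, is $|\beta_0+\mathbf{x}^\top\boldsymbol{\beta}|/\|\boldsymbol{\beta}\|$. A tiny Python check (the post itself works in R; the numbers here are mine):

```python
import numpy as np

# Border {x : beta0 + x'beta = 0}, with hypothetical coefficients
beta0, beta = -1.0, np.array([3.0, 4.0])
x = np.array([2.0, 1.0])

# Distance from x to the border: |beta0 + x'beta| / ||beta||
dist = abs(beta0 + x @ beta) / np.linalg.norm(beta)
# here: |(-1) + 10| / 5 = 1.8
```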

### Classification from scratch, neural nets 6/8

June 5, 2018 |

Sixth post of our series on classification from scratch. The latest one was on the lasso regression, which was still based on a logistic regression model, assuming that the variable of interest has a Bernoulli distribution. From now on, we will discuss techniques that did not originate from those probabilistic ...

### Classification from scratch, penalized Lasso logistic 5/8

June 4, 2018 |

Fifth post of our series on classification from scratch. Following the previous post on penalization using the $\ell_2$ norm (so-called Ridge regression), this time we will discuss penalization based on the $\ell_1$ norm (the so-called Lasso regression). First of all, one should admit that if the name stands for least absolute shrinkage ...
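
The workhorse of $\ell_1$ penalization is the soft-thresholding operator $S(z,\lambda)=\mathrm{sign}(z)\,\max(|z|-\lambda,0)$, which both shrinks coefficients and can set them exactly to zero. A minimal Python sketch (the series itself is in R):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding: shrink by lam, clip to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

out = soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0)
# -3 shrinks to -2, -0.5 and 0.2 are zeroed out, 2 shrinks to 1
```

This zeroing-out is why the Lasso performs variable selection while Ridge only shrinks.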

### Classification from scratch, penalized Ridge logistic 4/8

June 2, 2018 |

Fourth post of our series on classification from scratch, following the previous post which was some sort of detour on kernels. But today, we’ll get back on the logistic model. Formal approach of the problem We’ve seen before that the classical estimation technique used to estimate the parameters ...
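
For the Ridge-penalized logistic model, the classical route is to maximize the log-likelihood minus $\frac{\lambda}{2}\|\boldsymbol{\beta}\|^2$. The post does this in R; here is a hedged Python sketch using plain gradient descent on simulated data (all names and tuning values mine):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
p_true = 1 / (1 + np.exp(-(X @ np.array([1.5, -2.0]))))
y = (rng.uniform(size=200) < p_true).astype(float)

lam, beta = 1.0, np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ beta)))
    grad = X.T @ (p - y) + lam * beta     # gradient of the penalized loss
    beta -= 0.01 * grad / len(y)          # small gradient-descent step
```

The penalty pulls the estimates toward zero but preserves their signs, so the fitted coefficients stay on the same side as the true ones.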

### Classification from scratch, logistic with kernels 3/8

May 31, 2018 |

Third post of our series on classification from scratch, following the previous post introducing smoothing techniques with (b)-splines. Consider here kernel-based techniques. Note that here we do not use the “logistic” model… it is purely non-parametric. Kernel-based estimation, from scratch: I like kernels because they are somehow ...
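
A purely non-parametric kernel estimate of $\mathbb{P}(Y=1\mid X=x)$ is a locally weighted average of the 0/1 responses (Nadaraya-Watson style). The post uses R; an illustrative Python sketch with a Gaussian kernel and toy data:

```python
import numpy as np

def nw_prob(x0, x, y, h):
    """Kernel-weighted average of the 0/1 responses around x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel, bandwidth h
    return np.sum(w * y) / np.sum(w)

x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0,   0,   0,   1,   1,   1  ])

p_low  = nw_prob(0.1, x, y, h=0.2)   # near the 0-labelled points
p_high = nw_prob(0.9, x, y, h=0.2)   # near the 1-labelled points
```

The estimate is small near the 0-labelled points and large near the 1-labelled ones, with the bandwidth `h` controlling the smoothness.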

### Classification from scratch, trees 9/8

May 30, 2018 |

Ninth post of our series on classification from scratch. Today, we’ll see the heuristics of the algorithm inside classification trees. And yes, I promised eight posts in that series, but clearly that was not sufficient… sorry for the poor prediction. Decision trees are easy to read. So ...

### Classification from scratch, logistic with splines 2/8

May 30, 2018 |

Today, second post of our series on classification from scratch, following the brief introduction to logistic regression. Piecewise linear splines: to illustrate what’s going on, let us start with a “simple” regression (with only one explanatory variable). The underlying idea is natura non facit saltus, for “nature does ...
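
Piecewise linear splines amount to regressing on $x$ plus hinge terms $(x-k)_+$, so the fitted slope can change at each knot $k$. The post is in R; a hedged Python sketch with one knot on simulated data (knot location and data are mine):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 100)
# true function: slope 1 before 0.5, flat after, plus small noise
y = np.where(x < 0.5, x, 0.5) + rng.normal(0, 0.05, 100)

pos = lambda u: np.maximum(u, 0.0)       # the hinge (x - k)_+
B = np.column_stack([np.ones_like(x), x, pos(x - 0.5)])   # spline basis
coef = np.linalg.lstsq(B, y, rcond=None)[0]
# coef[1]: slope before the knot; coef[1] + coef[2]: slope after
```

The fitted slopes recover the kink: roughly 1 before the knot and roughly 0 after it.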

### Classification from scratch, logistic regression 1/8

May 30, 2018 |

Let us start today our series on classification from scratch… Logistic regression is based on the assumption that, given covariates $\mathbf{x}$, $Y$ has a Bernoulli distribution, $Y\mid\mathbf{X}=\mathbf{x}\sim\mathcal{B}(p(\mathbf{x}))$, with $p(\mathbf{x})=\dfrac{e^{\mathbf{x}^\top\boldsymbol{\beta}}}{1+e^{\mathbf{x}^\top\boldsymbol{\beta}}}$. The goal is to estimate the parameter $\boldsymbol{\beta}$. Recall that the heuristics for the use of that function for the probability is that ... Maximum of the (log)...
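
Estimation is by maximum likelihood: maximize $\sum_i \big(y_i\,\mathbf{x}_i^\top\boldsymbol{\beta}-\log(1+e^{\mathbf{x}_i^\top\boldsymbol{\beta}})\big)$. The post derives this in R; a hedged Python sketch on simulated data (true coefficients mine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(300), rng.normal(size=300)])  # intercept + covariate
true_beta = np.array([-0.5, 2.0])
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

def neg_loglik(beta):
    eta = X @ beta
    # Bernoulli log-likelihood: sum y*eta - log(1 + exp(eta))
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

beta_hat = minimize(neg_loglik, np.zeros(2)).x
```

The numerical maximizer lands close to the coefficients used to simulate the data, as expected for the MLE at this sample size.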

### Classification from scratch, overview 0/8

May 29, 2018 |

Before my course on “big data and economics” at the University of Barcelona in July, I wanted to upload a series of posts on classification techniques, to get an insight into machine learning tools. According to some common idea, machine learning algorithms are black boxes. I wanted to get back ...

### Some sort of Otto Neurath (isotype picture) map

May 14, 2018 |

Yesterday evening, I was walking in Budapest, and I saw a nice map in some sort of Otto Neurath style. It was hand-made, but I thought it should be possible to do it in R, automatically. A few years ago, Baptiste Coulmont published a nice blog post on the ...

March 25, 2018 |

This Tuesday, I will be giving the second part of the (crash) graduate course on advanced tools for econometrics. It will take place in Rennes, IMAPP room, and I have been told that there will be a video link with Nantes and Angers. Slides for the morning a...

### When “learning Python” becomes “practicing R” (spoiler)

March 8, 2018 |

15 years ago, a student of mine told me that I should start learning Python, that it was really a great language. Students started to learn it, but I kept postponing. A few years ago, I also started Python for Kids, which is actually really nice, with my son. That was ...