Blog Archives

John Snow, and Google Maps

February 27, 2015

In my previous post, I discussed how to use OpenStreetMap (and the standard plotting functions of R) to visualize John Snow’s dataset. But it is also possible to use Google Maps (and ggplot2-style graphs).

    library(ggmap)
    get_london <- get_map(c(-.137,51.513), zoom=17)
    london <- ggmap(get_london)

Again, the tricky part comes from the fact that the coordinate representation system, here, is not...

Read more »

John Snow, and OpenStreetMap

February 27, 2015

While I was preparing a training session on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in a great package of famous historical datasets.

    library(HistData)
    data(Snow.deaths)
    data(Snow.streets)

One can easily visualize the deaths, on a simplified map, with the streets (here simple grey segments, see Vincent Arel-Bundock’s...

Read more »

Visualizing Clusters

February 24, 2015

Consider the following dataset, with (only) ten points

    x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
    y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
    plot(x,y,pch=19,cex=2)

We want to get, say, two clusters. Or more specifically, two sets of observations, each of them sharing some similarities. Since the number of observations is rather small, it is actually possible to get an exhaustive list of all partitions, and to minimize some criterion, such...

Read more »

k-means clustering and Voronoi sets

February 22, 2015

In the context of k-means, we want to partition the space of our observations into k classes. Each observation belongs to the cluster with the nearest mean. Here “nearest” is in the sense of some norm, usually the L2 (Euclidean) norm. Consider the case where we have 2 classes, the means being the 2 black dots. If we partition based...

Read more »

Inequalities and Quantile Regression

February 6, 2015

In the course on inequality measures, we've seen how to compute various (standard) inequality indices, based on some sample of incomes (that can be binned, in various categories). On Thursday, we discussed the fact that incomes can be related to different variables (e.g. experience), and that comparing income inequalities between countries can be biased, if they have very different...

Read more »

Modeling Incomes and Inequalities

January 17, 2015

Last week, in our Inequality course, we looked at data. We started with some simulated data, only a few observations

    > library("ineq")
    > load(url("http://freakonometrics.free.fr/income_5.RData"))
    > (income=sort(income))
    19233 23707 53297 61667 218662

How could we say that there is inequality in this sample? If we look at the wealth owned by the poorest, the poorest person (1...

Read more »

Automatic Detection of the Language of a Tweet

January 5, 2015

Two days ago, in my post on automatically extracting my own tweets and generating an HTML list, I mentioned that it would be great if there were a function that could be used to distinguish tweets in English from tweets in French (usually, I tweet in one of those two languages). And one more time, @3wen came to...

Read more »

An automatic code to extract tweets (and to produce the “Somewhere else” review)

January 3, 2015

A few weeks ago, I asked in a post the (simple) question "dear reader, who are you?" just to know more about the readers of my blog. I found the answers extremely interesting (even if, to be honest, I was expecting more of them, to start a more serious sociological study of the readers of my blog). And an...

Read more »

Names in the U.S., from James Smith to Jose Rodriguez

December 7, 2014

Two weeks ago, @mona published an interesting post on her blog, about a difficult question, What’s The Most Common Name In America? There were stats about first names in the U.S., and last names, too. That information is, somehow, easy to get. But usually, it is more complicated to get the first and the last name together....

Read more »

Subjective Ways of Cutting a Continuous Variable

December 2, 2014

You have probably seen @coulmont's maps. If you haven't, you should probably go and spend some time on his blog (but please, come back afterwards, I still have my story to tell you). Consider for instance the maps we obtained for a post published in Monkey Cage, a few months ago. The code was discussed in a blog post (I...

Read more »