Happy St Patrick’s Day

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].

I love Saint Patrick’s Day for at least two reasons. The first is that, on March 17th, you can play The Pogues out loud; the second is that it is the only day of the year when I really enjoy having a Guinness in a pub. And Guinness is important in statistical science (I did mention a couple of hours ago, on this blog, that beers were important for social reasons in the academic world, but that was for other reasons…)

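> theta=seq(0,pi/2,length=101)
> leaf=sin(2*theta)+.25*sin(6*theta)
> # open an empty square canvas first: polygon() only draws on an existing plot
> plot(0,0,type="n",xlim=c(-1,1),ylim=c(-1,1),asp=1,axes=FALSE,xlab="",ylab="")
> # one leaf per quadrant: rotate the polar curve by k*pi/2
> for(k in 0:3)
+ polygon(leaf*cos(theta+k*pi/2),leaf*sin(theta+k*pi/2),col="green")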

As mentioned in all my statistics and econometrics courses, the history of statistics (I mean here mathematical statistics) is closely related to Guinness.

A long time ago, there was a Guinness Brewing Company of Dublin, which, as its name suggests, was an Irish brewing company. And the boss, who was to inherit the family business, decided to recruit young graduates trained in chemistry at Cambridge or Oxford.

In 1899, William Sealy Gosset, who had obtained a double degree in math and chemistry, left Oxford for Dublin. And to be quite honest, being a graduate in maths meant that he had studied differential equations and astronomy. Basically, mathematics was of no use to Guinness, and he was hired for his expertise in chemistry. In fact, William also turned out to be a very good administrator, but this has nothing to do with our story.

William had good memories of his studies in math, and he wondered if he could find a problem to look at. He started studying the brewing process, noting that conditions varied so much (temperature, hops, malt, manufacturing conditions…) that there were only a few comparable observations. The “law of errors” (the central limit theorem) cannot be applied under these conditions.

In short, Bill (now that we know each other a little, we’ll call him Bill) took many measurements, and noticed that the Poisson distribution could be an interesting model to work with. To cut a long story short, Bill managed to use statistical techniques to control the variance of the production, meaning that he was able to lower losses in the production of beer.

A nice application like this one deserved publication in a scientific journal… Well, of course the Poisson distribution had long been known (it was 1904 and, a few years before, von Bortkiewicz had found elegant applications of this law, as discussed in a post a few weeks ago). But there was a disclosure issue: Bill’s contract prohibited him from disclosing secrets to competitors.

Meanwhile, Bill had met Karl Pearson, who was then editor of Biometrika and who encouraged him to publish his results. In 1906, Bill, who had helped Guinness make a lot of money (doing applied mathematics can be useful), managed to take a sabbatical to work with Pearson at the Galton biometrics laboratory. Bill and Karl decided to publish the work under the pseudonym “Student”. Legend has it that they hesitated over “Pupil”.

And for almost 30 years, “Mr Gosset”, honorable Guinness employee, led a dissolute life, publishing in statistical journals (after work at the brewery), always under the pseudonym “Student”. Of course, it might not have been that simple. I mean, Bill had a family life, too. And his wife was the captain of the national hockey team. So I can hardly imagine Bill playing the smart ass and doing mathematical computations when it was time to wash the dishes or iron his shirt…

In 1908, he wrote a remarkable paper, “The Probable Error of a Mean”, which was noticed by, at least, Ronald Fisher. In fact, Bill found that there was an interesting law but that, like the normal, it was difficult to manipulate to obtain confidence intervals. Without a computer, he had the idea of using Monte Carlo methods to tabulate quantiles and construct his tables. And he was probably the first one to look carefully at the problem of small samples, unlike Karl Pearson, who always focused on the asymptotic case.

In fact, looking at small samples, he saw in the denominator quantities very close to those that Karl had worked with, in particular the square root of a chi-square variable. Well, of course, the normality assumption remained, but at least we had some results for finite samples!
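To get a feeling for what Bill was after, here is a minimal sketch of the same idea with a computer. It is not his actual procedure: the sample size, the number of simulations, the seed and the names `student_stat` and `quantile_table` are all assumptions of mine, chosen for illustration. We draw many small Gaussian samples, compute the standardized mean for each, and compare the simulated quantiles with Student and Gaussian quantiles.

```r
# Monte Carlo sketch of the small-sample problem (illustrative settings, not Gosset's data)
set.seed(17)      # arbitrary seed, only for reproducibility
n    <- 4         # assumed (small) sample size
nsim <- 20000     # assumed number of simulated samples

# Standardized mean of a sample, with true mean 0:
# sqrt(n) * mean / (empirical standard deviation)
student_stat <- function(x) sqrt(length(x)) * mean(x) / sd(x)

# Simulate the statistic on many Gaussian samples of size n
sims <- replicate(nsim, student_stat(rnorm(n)))

# Tabulate a few upper quantiles: simulated, Student (n - 1 degrees of freedom), Gaussian
probs <- c(0.90, 0.95, 0.975)
quantile_table <- rbind(
  simulated = quantile(sims, probs, names = FALSE),
  student   = qt(probs, df = n - 1),
  gaussian  = qnorm(probs)
)
colnames(quantile_table) <- probs
print(round(quantile_table, 2))
```

With samples this small, the simulated quantiles should sit close to the Student row and well above the Gaussian one, which is exactly why the asymptotic approximation is misleading here.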

For the record, William Gosset suggested using the letter z for his statistic, the ratio between the mean and the (empirical) standard deviation. But a few years later, statisticians became accustomed to using this letter for the Gaussian distribution (i.e. when the variance is known), and it became standard to use the letter t. Hence, finally, the present name of “Student-t distribution”, and in regression outputs we have the “t-test”.
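As a small illustration of that ratio, here is a sketch on made-up numbers: the vector `x` and the reference value `mu0` are hypothetical, not data from the brewery. The statistic is computed by hand, then compared with what R's `t.test` returns, and with the p-value one would get by (wrongly) treating the variance as known.

```r
# Hypothetical measurements, made up for illustration
x   <- c(5.1, 4.9, 5.6, 5.3, 4.7, 5.4)
mu0 <- 5                  # assumed reference value under the null hypothesis
n   <- length(x)

# Ratio between the centred mean and its estimated standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value with Student's law, n - 1 degrees of freedom (variance estimated)
p_student <- 2 * pt(-abs(t_stat), df = n - 1)

# Same computation with the Gaussian law, as if the variance were known
p_gaussian <- 2 * pnorm(-abs(t_stat))

# Built-in one-sample test, for comparison
builtin <- t.test(x, mu = mu0)
print(c(by_hand = t_stat, builtin = unname(builtin$statistic)))
print(c(student = p_student, gaussian = p_gaussian))
```

The Gaussian p-value is smaller than the Student one, since the Student law has heavier tails: on six observations, pretending the variance is known overstates the evidence.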

A legend (told by Harold Hotelling in his memoirs) claims that the Guinness family discovered this double life on the day William Gosset died, in 1937, when mathematicians requested financial assistance to print a volume of the works of their employee. But another legend claims that Mr Guinness himself suggested the pseudonym when Bill expressed his intention to publish his research… So I guess we’ll never know. But at least, I’ll think about Bill when I get my first Guinness tonight (though I will probably not be able to tell this story anymore by the time I reach the fourth…)

Arthur Charpentier

Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1.  Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.
