Gosset part 2: small sample statistics
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
A NICE ONELINER HERE?
This post is an explainer about the small sample experiment and determining
the ideal sample size for inference.
Economic perspectives and business logic
Brewing beer at scale
One of the problems William S. Gosset worked on was determining the quality
of Malt. To brew beer you need 3
ingredients, yeast, hops and a cereal grain. You start with extracting the starch
from the grain into water. You then flavoring the resulting sweet liquid with
hops and fermenting that mixture with yeast.
Gosset worked on all three ingredients, but in this post I will look into
the cereal grain.
Beer, malts and beer technique
Now, beer brewing has been done since the 6th century BCE, and all the steps
have their own specialised names^{1} , which are different in all languages. So I will
be talking about malt, but remember that it is just a source of starch, a
source of sugars (I’m sorry chemists/ biologist, I have to simplify this a bit).
Grain, Barley, transforming into malt, as seen here.
The source of starch in beer is barley,
this grain is first dried, cleaned and then wetted. This starts the biological
process of germination in the grain (it gets ready to start a new barley plant).
In that process some enzymes are made that will break down starch to smaller
sugars. Now the grains are ready to start a new plant and, now we sort of kill
it by forced drying. The end result of half germinated barley is called malt or
malted barley.
The amount of sugar in the mixture determines the maximum ammount of alcohol the
yeast can create in next steps. Higher alcohol content keeps the beer for longer
periods, but makes the taste different.
Malt and sugars
Guiness Malt in Gosset’s time, came from Irish and English Barley stock.
Since the sugar content in malt can vary from batch to batch, brewers need to
check the sugar content. However there are only so many brewers and there is
checking every batch is not scalable. The Guinness brewery was one of the
largest in the world and this would not do.
The sugar content of malt extract was measured by degrees saccharine per barrel
of 168 poinds malt weight. In earlier experiments the brewers determined that
an extract around 133 degrees saccharine gave the desired alcohol level.
Way more saccharine would affect the stability and life of the beer and increase
the alcohol content of the beer, however that would also increase the tax,
Guinness had to pay to British goverment. A lower level of alcohol would make
the beer go bad earlier and crucially, costumers would go to a competiter.
So you better make sure the malt extract is close to 133 degrees saccharine.
In Gosset’s view, 0.5 degrees was a difference or error in malt extract level
which Guinness and its customers could accept.
“It might be maintained,” he said, that malt extract “should be [estimated]
within .5 of the true result with a probability of 10 to 1”
However how can we be certain that a barrel has a sugar content of 133 degrees?
They could take samples, and average those, but how many samples would give us
enough certainty (that is with odds of 10 to 1 that the sample average is
within 0.5 degree of the real value)?
Simulation
Gosset and his co workers went over to simulation. THIS IS WEIRDLY WRITTEN
They had a lot of samples for one barrel, and Gosset simulated what would
happen if they drew and averaged multiple samples. And generalized this to
what would happen if they sampled from the all of the barrels.
And that is what I will show in R code.
So we are talking about sampling
GIVEN A NORMAL BAG OF BARLEY/MALT EXTRACT WE WANT TO BE WITHIN .5 OF REAL VALUE
10 TO 1.
A/(A + B ) 10/(10+1) = 10/11 0.9090909
Simulating drawing from a barrel
library(tidyverse)
First the goal in clearer language.
We want to know how many samples you have to draw to know what the real degree
saccharine level of the barrel is. With a 10 to 1 odds of being within 0.5 degree.
Let’s first transform that odds to probabilities, because I don’t know what
those mean. A 10 to 1 odds means ten successes and 1 failure, so it is really
10 out of 11, 10/11=0.909.
First we create a dataset.
Say we have a large amount of MALT extract
We took many many many many samples from one barrel, and so we know
what the saccharine level of the barrel is.
set.seed(7334) degrees_sacharine = rnorm(3000,mean = 133, sd = 0.6) # this is really just a guess
Then I create some functions to take a sample, a function to determine if that
value is within the range, and finally I combine them.
take_sample_average < function(bag= degrees_sacharine, sample_size){ mean(sample(bag, sample_size, replace= TRUE)) } within_range < function(value){ ifelse(value > 132.5 & value < 133.5, TRUE, FALSE) } is_sample_within_range < function(sample_size){ take_sample_average(bag = degrees_sacharine, sample_size = sample_size) %>% within_range() } # for example what if we take 3 samples? take_sample_average(bag = degrees_sacharine, 3) ## [1] 132.8
For example:
So now we take 2 samples out of the bag, and get (142.2745, 119.4484)/2 = 142.2745.
Is this within the range of 132.5 and 133.5? no.
But how often, on average am I within the real value of the bag?
We simulate taking 2 to 15 samples from the barrel and averaging per sample.
sampling_experiments < tibble( N = seq_len(2500) ) %>% mutate( # there is probably a more concise way to do this sample_02 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 2)), sample_03 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 3)), sample_04 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 4)), sample_05 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 5)), sample_06 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 6)), sample_07 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 7)), sample_08 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 8)), sample_09 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 9)), sample_10 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 10)), sample_15 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 15)), sample_20 = purrr::map_lgl(N, ~is_sample_within_range(sample_size = 20)), )
So how many times are the samples within the range?
sampling_experiments %>% summarize_all(.funs = ~sum(.)/n()) %>% # this doesn't read that well # So I reshape the data, remove the N column and remove the sample_ part gather(key = "sample_size", value = "prob_in_range", sample_02:sample_20) %>% select(N) %>% mutate(sample_size = str_remove(sample_size,pattern = "sample_")) ## # A tibble: 11 x 2 ## sample_size prob_in_range #### 1 02 0.755 ## 2 03 0.854 ## 3 04 0.901 ## 4 05 0.932 ## 5 06 0.954 ## 6 07 0.966 ## 7 08 0.982 ## 8 09 0.986 ## 9 10 0.991 ## 10 15 0.999 ## 11 20 0.999
Gosset found in his experiments that he needed at least 4 samples for a
estimation with an odds of at least 10 to 1, which is a probability of
approximately 0.909.
In our case for our bag of estimations we would say at least 5 samples to
get these odds or better.
Armed with this knowledge the Guinness brewery knew it could test the malt extract
by taking 4 samples out of every batch to get an approximation of the true sugar
content of a batch that would be correct in 10 out of 11 times.
That meant that the brewery could use this technique to check the bags of malt
extract, in stead of a master brewer sampling, and feeling the bags.
You can scale the number of tests, but not the amount of brewers/ checkers.
The brewers were happy, Gosset was too. But he realized there must be a system
there, a way to determine
References
 Wikipedia page (en) about Gosset
 The Guinness Brewer who revolutionaized Statistics – Dan Kopf
 Student’s Collected Papers – Pearson E. S. 1943s
 Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” t – Stephen T. Ziliak
 Fascinating read about Malt extract
 wikipedia article about brewing beer
Image credits
 Hand full of sprouted malt, wikipedia commons, photographer Peter Schill
 detail of malt, author Pierrealain dorange
State of the machine
At the moment of creation (when I knitted this document ) this was the state of my machine: click here to expand
sessioninfo::session_info() ## ─ Session info ────────────────────────────────────────────────────────── ## setting value ## version R version 3.6.1 (20190705) ## os macOS Mojave 10.14.6 ## system x86_64, darwin15.6.0 ## ui X11 ## language (EN) ## collate en_US.UTF8 ## ctype en_US.UTF8 ## tz Europe/Amsterdam ## date 20191011 ## ## ─ Packages ────────────────────────────────────────────────────────────── ## package * version date lib source ## assertthat 0.2.1 20190321 [1] CRAN (R 3.6.0) ## backports 1.1.5 20191002 [1] CRAN (R 3.6.0) ## blogdown 0.16 20191001 [1] CRAN (R 3.6.0) ## bookdown 0.14 20191001 [1] CRAN (R 3.6.0) ## broom 0.5.2 20190407 [1] CRAN (R 3.6.0) ## cellranger 1.1.0 20160727 [1] CRAN (R 3.6.0) ## cli 1.1.0 20190319 [1] CRAN (R 3.6.0) ## colorspace 1.41 20190318 [1] CRAN (R 3.6.0) ## crayon 1.3.4 20170916 [1] CRAN (R 3.6.0) ## digest 0.6.21 20190920 [1] CRAN (R 3.6.0) ## dplyr * 0.8.3 20190704 [1] CRAN (R 3.6.0) ## ellipsis 0.3.0 20190920 [1] CRAN (R 3.6.0) ## evaluate 0.14 20190528 [1] CRAN (R 3.6.0) ## fansi 0.4.0 20181005 [1] CRAN (R 3.6.0) ## forcats * 0.4.0 20190217 [1] CRAN (R 3.6.0) ## generics 0.0.2 20181129 [1] CRAN (R 3.6.0) ## ggplot2 * 3.2.1 20190810 [1] CRAN (R 3.6.0) ## glue 1.3.1 20190312 [1] CRAN (R 3.6.0) ## gtable 0.3.0 20190325 [1] CRAN (R 3.6.0) ## haven 2.1.1 20190704 [1] CRAN (R 3.6.0) ## hms 0.5.1 20190823 [1] CRAN (R 3.6.0) ## htmltools 0.4.0 20191004 [1] CRAN (R 3.6.0) ## httr 1.4.1 20190805 [1] CRAN (R 3.6.0) ## jsonlite 1.6 20181207 [1] CRAN (R 3.6.0) ## knitr 1.25 20190918 [1] CRAN (R 3.6.0) ## lattice 0.2038 20181104 [1] CRAN (R 3.6.1) ## lazyeval 0.2.2 20190315 [1] CRAN (R 3.6.0) ## lifecycle 0.1.0 20190801 [1] CRAN (R 3.6.0) ## lubridate 1.7.4 20180411 [1] CRAN (R 3.6.0) ## magrittr 1.5 20141122 [1] CRAN (R 3.6.0) ## modelr 0.1.5 20190808 [1] CRAN (R 3.6.0) ## munsell 0.5.0 20180612 [1] CRAN (R 3.6.0) ## nlme 3.1141 20190801 [1] CRAN (R 3.6.0) ## pillar 1.4.2 20190629 [1] CRAN (R 3.6.0) ## pkgconfig 2.0.3 20190922 [1] CRAN (R 3.6.0) ## purrr * 0.3.2 20190315 [1] CRAN (R 3.6.0) ## R6 2.4.0 20190214 [1] CRAN (R 3.6.0) ## Rcpp 1.0.2 20190725 [1] CRAN (R 3.6.0) ## readr * 1.3.1 20181221 [1] CRAN (R 3.6.0) ## readxl 1.3.1 20190313 [1] CRAN (R 3.6.0) ## rlang 0.4.0 20190625 [1] CRAN (R 3.6.0) ## rmarkdown 1.16 20191001 [1] CRAN (R 3.6.0) ## rstudioapi 0.10 20190319 [1] CRAN (R 3.6.0) ## rvest 0.3.4 20190515 [1] CRAN (R 3.6.0) ## scales 1.0.0 20180809 [1] CRAN (R 3.6.0) ## sessioninfo 1.1.1 20181105 [1] CRAN (R 3.6.0) ## stringi 1.4.3 20190312 [1] CRAN (R 3.6.0) ## stringr * 1.4.0 20190210 [1] CRAN (R 3.6.0) ## tibble * 2.1.3 20190606 [1] CRAN (R 3.6.0) ## tidyr * 1.0.0 20190911 [1] CRAN (R 3.6.0) ## tidyselect 0.2.5 20181011 [1] CRAN (R 3.6.0) ## tidyverse * 1.2.1 20171114 [1] CRAN (R 3.6.0) ## utf8 1.1.4 20180524 [1] CRAN (R 3.6.0) ## vctrs 0.2.0 20190705 [1] CRAN (R 3.6.0) ## withr 2.1.2 20180315 [1] CRAN (R 3.6.0) ## xfun 0.10 20191001 [1] CRAN (R 3.6.0) ## xml2 1.2.2 20190809 [1] CRAN (R 3.6.0) ## yaml 2.2.0 20180725 [1] CRAN (R 3.6.0) ## zeallot 0.1.0 20180128 [1] CRAN (R 3.6.0) ## ## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

from the wikipedia article I found the words: wort=the sugary liquid, mashing = mixing malted barley with hot water, liquor = hot water, grist =crushed malt, sparging = washing of grains, lautering = seperation of wort with grain itself↩
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.