As spring follows winter once more here down in southern Sweden, the two sample t-test follows the one sample t-test. This is a continuation of the Bayesian First Aid alternative to the one sample t-test where I’ll introduce the two sample alternative. It will be a quite short post as the two sample alternative is just more of the one sample alternative, more of using John K. Kruschke’s BEST model, and more of the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest.
Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The delopment of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!
The model is a straight forward extension of the one sample Bayesian Estimation Supersedes the T-test (BEST), instead of estimating the distribution of one group $x_i$ we now estimate the distributions of two, $x_i$ and $y_j$. As before we use the t-distribution as a robust alternative to the normal distribution and we assume non-informative priors. The degree-of-freedoms parameter $\nu$ governs how fat the tails of the t-distribution are and could be estimated separately for each group. Here we instead assume the two groups share a common $\nu$ resulting in a model with five parameters: $\mu_x, \mu_y, \sigma_x, \sigma_y$, and $\nu$. Here is the full model for the two sample BEST, shown using a Kruschke style diagram:
bayes.t.test Function for Paired Samples
bayes.t.test function can be called exactly as the
t.test function in R. If you just ran
t.test(x, y, conf.level=0.9), a two sample t-test yielding a 90% CI, prepending
bayes. runs the Bayesian First Aid alternative:
bayes.t.test(x, y, conf.level=0.9). To showcase
bayes.t.test I’ll use the same data as last time, the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest (doi:10.1038/417708a, pdf). In that article David W. Roubik argues that, while the coffee bush is self pollinating, having African honeybees around still increases the coffee yield. This argument is supported by a data set with by-country coffee yields from the periods 1961 to 1980 and 1981 to 2001. Here is the data set shown using a Slopegraph:
The thing is, in the 60s the African honeybee had not yet settled in the New World, something that happened in the period 1981 to 2001. Roubik then argues that the data shows that…
Here “substantial increase in Latin America” refers to a statistically significant mean increase of 1470 kg/10,000² (paired samples t-test, p = 0.004) while “no such change in the Old World” refers to a non-significant mean increase of 777 kg/10,000² (p = 0.23). Roubik makes the classical mistake of assuming that the difference between “significant” and “not significant” is itself statistically significant (Gelman & Stern, 2006, explains why it’s not). What should be compared is not the p-values but the increase in coffee yield in the Old and New World. Let’s do just that using the Bayesian First Aid alternative to the two sample t-test.
First I’ll load in the data (available here) and calculate the difference in yield between the 1960 to 1981 period and the 1981 to 2000 period for each country:
d <- read.csv("roubik_2002_coffe_yield.csv") yield_diff_new <- with(d[d$world == "new", ], yield_81_to_01 - yield_61_to_80) yield_diff_old <- with(d[d$world == "old", ], yield_81_to_01 - yield_61_to_80)
The boxplot below shows the distribution of the yield differences.
We’ll now compare the increase in yields using the Bayesian First Aid alternative to the two sample t-test.
library(BayesianFirstAid) bayes.t.test(yield_diff_new, yield_diff_old)