Bayesian First Aid: Two Sample t-test

Posted on February 24, 2014 by Rasmus Bååth in R bloggers | 0 Comments

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As spring follows winter once more here down in southern Sweden, the two sample t-test follows the one sample t-test. This is a continuation of the Bayesian First Aid alternative to the one sample t-test where I’ll introduce the two sample alternative. It will be a quite short post as the two sample alternative is just more of the one sample alternative, more of using John K. Kruschke’s BEST model, and more of the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest.

Bayesian First Aid is an attempt at implementing reasonable Bayesian alternatives to the classical hypothesis tests in R. For the rationale behind Bayesian First Aid see the original announcement. The delopment of Bayesian First Aid can be followed on GitHub. Bayesian First Aid is a work in progress and I’m grateful for any suggestion on how to improve it!

The Model

The model is a straight forward extension of the one sample Bayesian Estimation Supersedes the T-test (BEST), instead of estimating the distribution of one group $x_i$ we now estimate the distributions of two, $x_i$ and $y_j$. As before we use the t-distribution as a robust alternative to the normal distribution and we assume non-informative priors. The degree-of-freedoms parameter $\nu$ governs how fat the tails of the t-distribution are and could be estimated separately for each group. Here we instead assume the two groups share a common $\nu$ resulting in a model with five parameters: $\mu_x, \mu_y, \sigma_x, \sigma_y$, and $\nu$. Here is the full model for the two sample BEST, shown using a Kruschke style diagram:

(If you want to know more about Kruschke style diagrams and how you can easily make them check out DIY Kruschke Style Diagrams and How Do You Write Your Model Definitions?)

The `bayes.t.test` Function for Paired Samples

The bayes.t.test function can be called exactly as the t.test function in R. If you just ran t.test(x, y, conf.level=0.9), a two sample t-test yielding a 90% CI, prepending bayes. runs the Bayesian First Aid alternative: bayes.t.test(x, y, conf.level=0.9). To showcase bayes.t.test I’ll use the same data as last time, the coffee yield data from the 2002 Nature article The Value of Bees to the Coffee Harvest (doi:10.1038/417708a, pdf). In that article David W. Roubik argues that, while the coffee bush is self pollinating, having African honeybees around still increases the coffee yield. This argument is supported by a data set with by-country coffee yields from the periods 1961 to 1980 and 1981 to 2001. Here is the data set shown using a Slopegraph:

The thing is, in the 60s the African honeybee had not yet settled in the New World, something that happened in the period 1981 to 2001. Roubik then argues that the data shows that…

Here “substantial increase in Latin America” refers to a statistically significant mean increase of 1470 kg/10,000² (paired samples t-test, p = 0.004) while “no such change in the Old World” refers to a non-significant mean increase of 777 kg/10,000² (p = 0.23). Roubik makes the classical mistake of assuming that the difference between “significant” and “not significant” is itself statistically significant (Gelman & Stern, 2006, explains why it’s not). What should be compared is not the p-values but the increase in coffee yield in the Old and New World. Let’s do just that using the Bayesian First Aid alternative to the two sample t-test.

First I’ll load in the data (available here) and calculate the difference in yield between the 1960 to 1981 period and the 1981 to 2000 period for each country:

d <- read.csv("roubik_2002_coffe_yield.csv")
yield_diff_new <- with(d[d$world == "new", ], yield_81_to_01 - yield_61_to_80)
yield_diff_old <- with(d[d$world == "old", ], yield_81_to_01 - yield_61_to_80)

The boxplot below shows the distribution of the yield differences.

We’ll now compare the increase in yields using the Bayesian First Aid alternative to the two sample t-test.

library(BayesianFirstAid)
bayes.t.test(yield_diff_new, yield_diff_old)

Bayesian Estimation Supersedes the T-test (BEST) Online

In addition to using the two sample BEST as the Bayesian First Aid alternative to the t-test I’ve also implemented a javascript version that allows you to estimate this model directly in your browser using MCMC. Head over to http://sumsar.net/best_online/ and try it out with the same coffee yield data as in the examples above (just copy-n-paste from below), it should (hopefully) result in similar estimates.

### Increase in Coffee yield for the New World countries 5481 1081 -1267 2743 1820 3168 1842 1318 253 -111 1777 151 851 ### Increase in Coffee yield for the Old World countries 9129 1039 6143 -435 3130 -9195 1450 3287 -549 694 -322 -248 -384 -1002 -1077

The javascript implementation uses a difference MCMC algorithm than the R & JAGS version where the burn in period is also used for tuning the sampler. In the case with the coffee data I would recommend increasing the burn in period to 40000 (or decrease it to 1 to see how badly behaving MCMC sampling looks).

To leave a comment for the author, please follow the link and comment on their blog: Publishable Stuff.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Bayesian First Aid: Two Sample t-test

The Model

The `bayes.t.test` Function for Paired Samples

Generic Functions

Bayesian Estimation Supersedes the T-test (BEST) Online

Related

The Model

The bayes.t.test Function for Paired Samples

Generic Functions

Bayesian Estimation Supersedes the T-test (BEST) Online

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

The `bayes.t.test` Function for Paired Samples

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)