Multiple Comparisons with BayesFactor, Part 1

One of the most frequently asked questions about the BayesFactor package is how to do multiple comparisons; that is, given that some effect exists across factor levels or means, how we can test whether two specific effects are unequal. In the next two posts, I’ll explain how this can be done in two cases: in Part 1, I’ll cover tests for equality, and in Part 2 I’ll cover tests for specific order restrictions.

Before we start, I will note that these methods are only meant to be used for pre-planned comparisons. They should not be used for post hoc comparisons.

An Example

Suppose we are interested in the basis for feelings of moral disgust. One prominent theory, from the embodied cognition point of view, holds that feelings of moral disgust are extensions of more basic feelings of disgust: disgust for physical things, such as rotting meat, excrement, etc. (Schnall et al., 2008; but see also Johnson et al., 2014 and Landy & Goodwin, in press). Under this theory, moral disgust is not only metaphorically related to physical disgust, but may share physiological responses with physical disgust.

Suppose we wish to experimentally test this theory, which predicts that feelings of physical disgust can be “transferred” to possible objects of moral disgust. We ask 150 participants to fill out a questionnaire that measures the harshness of their judgments of undocumented migrants. Participants are randomly assigned to one of three conditions, differing by the odor present in the room: a pleasant scent associated with cleanliness (lemon), a disgusting scent (sulfur), and a control condition in which no unusual odor is present. The dependent variable is the score on the questionnaire, which ranges from 0 to 50 with higher scores representing harsher moral judgment.
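Since the data analyzed below are simulated anyway, here is a minimal sketch of how data with this design might be generated. The means and standard deviation are illustrative assumptions, not the values behind the actual example file:

# Simulate 50 participants per condition (parameter values are assumptions)
set.seed(123)
condition = factor(rep(c("control", "lemon", "sulfur"), each = 50))
score = round(rnorm(150, mean = c(24, 22, 25)[as.integer(condition)], sd = 7))
score = pmin(pmax(score, 0), 50)  # clamp to the 0-50 questionnaire range
sim_data = data.frame(condition, score)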

Hypothetical data, simulated for the sake of example, can be read into R using the url() function:
# Read in the data from learnbayes.org
disgust_data = read.table(url('http://www.learnbayes.org/disgust_example.txt'),header=TRUE)

A boxplot and a plot of means with standard errors reveal that the effects appear to be in the predicted direction:

[Plots: boxplot of scores by condition, and condition means with standard error bars; the axes differ between the two plots so that the standard errors can be seen.]
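Plots along these lines can be produced with base R graphics; the following is a sketch, not the original plotting code:

# Boxplot of scores by condition
boxplot(score ~ condition, data = disgust_data, ylab = "score")

# Condition means with +/- 1 standard error bars
means = tapply(disgust_data$score, disgust_data$condition, mean)
ses = tapply(disgust_data$score, disgust_data$condition,
             function(x) sd(x) / sqrt(length(x)))
plot(1:3, means, xaxt = "n", xlab = "condition", ylab = "mean score",
     ylim = range(means - ses, means + ses), pch = 19)
axis(1, at = 1:3, labels = names(means))
arrows(1:3, means - ses, 1:3, means + ses, angle = 90, code = 3, length = 0.1)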

And we can perform a classical ANOVA on these data:
# ANOVA
summary(aov(score ~ condition, data = disgust_data))

##              Df Sum Sq Mean Sq F value Pr(>F)  
## condition     2    263   131.4    2.91  0.058 .
## Residuals   147   6635    45.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The classical test of the null hypothesis that all means are equal just fails to reach significance at α = 0.05.

A Bayes factor analysis

We can easily perform a Bayes factor test of the null hypothesis using the BayesFactor package. Here we assume that the default prior settings are acceptable; because this post is about multiple comparisons, we will not explore prior settings. See ?anovaBF for more information.

anovaBF is a convenience function for performing ANOVA-like Bayes factor analyses. The code for the Bayes factor analysis is almost identical to the code for the classical test:
library(BayesFactor)
bf1 = anovaBF(score ~ condition, data = disgust_data)

bf1

## Bayes factor analysis
## --------------
## [1] condition : 0.7738 ±0.01%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS
The Bayes factor in favor of a condition effect is about 0.774, or 1/0.774 ≈ 1.3 in favor of the null (the “Intercept only” model). This is not strong evidence for either the null or the alternative, which, given the moderate p value, is perhaps not surprising. It should be noted here that even if the p value had just crept in under 0.05, the Bayes factor would not be appreciably different, which shows the inherent arbitrariness of significance testing.
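Because BayesFactor objects support arithmetic, the evidence for the null can also be read off directly by taking the reciprocal of the Bayes factor object:

# Express the same comparison as evidence for the intercept-only null
1 / bf1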

Many possible hypotheses?

This analysis is not the end of the story, however. The hypothesis tested above (that the means may all differ, with no further specificity) was not really the hypothesis of interest. The hypothesis of interest was more specific. We might consider an entire spectrum of hypotheses, listed here from most constrained to least constrained:
  • (most constrained) The null hypothesis (control = lemon = sulfur)
  • (somewhat constrained) Unexpected scents cause the same effect, regardless of type (lemon = sulfur ≠ control; this might occur, for instance, if both “clean” and “disgusting” scents prime the same underlying concepts)
  • (somewhat constrained) Only disgusting scents have an effect (control = lemon ≠ sulfur)
  • (somewhat constrained) Only pleasant scents have an effect (control = sulfur ≠ lemon)
  • (unconstrained) All scents have unique effects (control ≠ sulfur ≠ lemon)
The above are all equality constraints. We can also specify order constraints, such as lemon < control < sulfur. The unconstrained model tested above (control ≠ sulfur ≠ lemon) does not give full credit to this ordering prediction. In the next section, I will show how to test equality constraints. In Part 2 of this post, I will show how to test order constraints.

Testing equality constraints

To test equality constraints, we must first consider what an equality constraint means. Claiming that an equality constraint holds is the same as saying that your predictions for the data would not change if the conditions that are supposed to be equal had exactly the same label. If we want to impose the constraint that lemon = sulfur ≠ control, we merely have to give lemon and sulfur the same label.

In practice, this means making a new column in the data frame with the required change:
# Copy the condition column that we will change
# We use 'as.character' to avoid using the same factor levels
disgust_data$lemon.eq.sulfur = as.character(disgust_data$condition)
# Change all 'lemon' to 'lemon/sulfur'
disgust_data$lemon.eq.sulfur[ disgust_data$condition == "lemon" ] = 'lemon/sulfur'
# Change all 'sulfur' to 'lemon/sulfur'
disgust_data$lemon.eq.sulfur[ disgust_data$condition == "sulfur" ] = 'lemon/sulfur'
# finally, make the column a factor
disgust_data$lemon.eq.sulfur = factor(disgust_data$lemon.eq.sulfur)
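Equivalently, the recoding can be done more compactly by reassigning factor levels. This sketch assumes the levels of condition are in alphabetical order (control, lemon, sulfur):

# Equivalent, more compact recoding via factor levels
disgust_data$lemon.eq.sulfur = disgust_data$condition
levels(disgust_data$lemon.eq.sulfur) = c("control", "lemon/sulfur", "lemon/sulfur")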
We now have a data column, called lemon.eq.sulfur, that labels the data so that lemon and sulfur share the same label. We can use this column in a Bayes factor test:
bf2 = anovaBF(score ~ lemon.eq.sulfur, data = disgust_data)

bf2

## Bayes factor analysis
## --------------
## [1] lemon.eq.sulfur : 0.1921 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS
The null hypothesis is now preferred by a factor of 1/0.192 ≈ 5.2, which is expected given that lemon and sulfur were the least similar pair of the three means. The null hypothesis accounts for the data better than this equality constraint does.

One of the conveniences of using Bayes factors is that if two hypotheses have each been tested against the same third hypothesis, we can test the two hypotheses against one another. The BayesFactor package makes this easy; any two BayesFactor objects compared against the same denominator (in this case, the intercept-only null hypothesis) can be combined together:
bf_both_tests = c(bf1, bf2)
bf_both_tests

## Bayes factor analysis
## --------------
## [1] condition       : 0.7738 ±0.01%
## [2] lemon.eq.sulfur : 0.1921 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

We could, for instance, put all equality-constraint tests into the same object, and then compare them like so:
bf_both_tests[1] / bf_both_tests[2]

## Bayes factor analysis
## --------------
## [1] condition : 4.029 ±0.01%
## 
## Against denominator:
##   score ~ lemon.eq.sulfur 
## ---
## Bayes factor type: BFlinearModel, JZS
The fully unconstrained hypothesis, represented by condition, is preferred to the lemon = sulfur ≠ control hypothesis by a factor of about 4.
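The same machinery extends to all of the equality constraints listed above. As a sketch (the collapse_levels helper and the new column names are illustrative, not part of the BayesFactor package), we could build the remaining constrained factors, test each against the intercept-only null, and compare every model against the best one:

# Hypothetical helper: collapse two factor levels into a single label
collapse_levels = function(f, a, b) {
  x = as.character(f)
  x[x %in% c(a, b)] = paste(a, b, sep = "/")
  factor(x)
}

# The remaining equality constraints
disgust_data$control.eq.lemon = collapse_levels(disgust_data$condition, "control", "lemon")
disgust_data$control.eq.sulfur = collapse_levels(disgust_data$condition, "control", "sulfur")

bf3 = anovaBF(score ~ control.eq.lemon, data = disgust_data)
bf4 = anovaBF(score ~ control.eq.sulfur, data = disgust_data)

# All four tests share the intercept-only denominator, so they can be
# concatenated and compared against the best-performing model
all_tests = c(bf1, bf2, bf3, bf4)
all_tests / max(all_tests)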

In the next post, we will use the posterior() function to draw from the posterior of the unconstrained model, which will allow us to test ordering constraints.
