# Pairwise Bayesian Comparisons – even faster

[This article was first published on Posts on R Lover ! a programmer, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post builds upon two earlier posts:

#### Background

This all started with a nice post from Anindya
Mozumdar

on the R Bloggers feed. The topic material was
fun for me (analyzing the performance of male 100m sprinters and the fastest man
on earth), as well as exploring bayesian methods.

Last
post
in
this series I made use of one of the nice features about a Bayesian approach –
we don’t have to worry nearly as much about the multiple comparisons issue
[Gelman, Hill, Yajima (2012)]. But, quite frankly, the code was very ugly with a
lot of repetition and cutting and pasting. In this post I want to clean that all
up. So let’s load the necessary libraries.

``````library(rvest) # to ha"rvest" the web page
library(tidyverse) # using readr, dplyr, and purrr
library(ggstatsplot)
library(BayesFactor)``````

Next let’s duplicate Anindya’s earlier work and scrape the Track and Field
All-Time Performances webpage
to
get the data. One change I’m making is to remove `n_max = 3263` which is
unnecessary and was preventing grabbing the newer race results from summer
2019.

``````male_100_html <-
read_html("http://www.alltime-athletics.com/m_100ok.htm")
male_100_pres <- male_100_html %>%
html_nodes(xpath = "//pre")
male_100_htext <- male_100_pres %>%
html_text()
male_100_htext <- male_100_htext[[1]]

male_100 <- read_fwf(
male_100_htext,
skip = 1,
#  n_max = 3263, # n_max removed to cpture newer races
col_types = cols(.default = col_character()),
col_positions = fwf_positions(
c(1, 16, 27, 35, 66, 74, 86, 93, 123),
c(15, 26, 34, 65, 73, 85, 92, 122, NA)
)
)

male_100 <- male_100 %>%
select(X2, X4) %>%
transmute(timing = X2, runner = X4) %>%
mutate(
timing = gsub("A", "", timing),
timing = as.numeric(timing)
)

# 3267 as of July 8, 2019
nrow(male_100) # if you're cautious you can check against the webpage``````
``## [1] 3267``

Let’s focus on the top 6 runners who have more than 40 race results
recorded. We’ll make an effort throughout this post to capture the parameters we
use and store them as variables and use the names. If you choose to replicate
this post on your own you should be able to change the parameters below and see
how the results vary based upon your choices (for example the top 5 or 10
runners or more than 30 races).

``````numbraces <- 40
howmanyrunners <- 6``````

Having made our selections let’s use a series a `dplyr` commands piped `%>%`
together to create a character vector called `orderbymean` which contains the
names of the 6 runners who meet our criteria. We can use this
vector to filter our dataframe down to just the 6 we want with
a `filter(runner %in% orderbymean)` statement as well as force the factor levels
of `runner` to be in mean order with `factor(male_100\$runner, levels = orderbymean)`.

``````orderbymean <- male_100 %>%
group_by(runner) %>%
summarise(avgtime = mean(timing), races = n()) %>%
arrange(avgtime) %>%
filter(races >= numbraces) %>%
top_n(-howmanyrunners, avgtime) %>%
pull(runner) %>%
as.character()

orderbymean``````
``## [1] "Usain Bolt"     "Asafa Powell"   "Tyson Gay"      "Justin Gatlin"  "Yohan Blake"    "Maurice Greene"``
``````male_100 <- male_100 %>%
filter(runner %in% orderbymean) %>%
mutate_if(is.character, as.factor) %>%
droplevels()

male_100\$runner <-
factor(
male_100\$runner,
levels = orderbymean
)

glimpse(male_100)``````
``````## Observations: 530
## Variables: 2
## \$ timing  9.58, 9.63, 9.69, 9.69, 9.69, 9.71, 9.72, 9.72, 9.74, 9.74, 9.75, 9.75, 9.75, 9.75, 9.76, 9.…
## \$ runner  Usain Bolt, Usain Bolt, Usain Bolt, Tyson Gay, Yohan Blake, Tyson Gay, Usain Bolt, Asafa Pow…``````
``levels(male_100\$runner)``
``## [1] "Usain Bolt"     "Asafa Powell"   "Tyson Gay"      "Justin Gatlin"  "Yohan Blake"    "Maurice Greene"``

Okay we now have the 530 relevant times for the 6
runners we’re focusing on. The focus of this post is to build
the bayesian equivalent of a frequentist’s pairwise comparisons test across all
the unique runner pairings `pairwise.t.test(x = male_100\$timing, g = male_100\$runner, p.adjust.method = "holm")`. As I mentioned in an earlier
post
this
would be the logical next step after conducting a oneway ANOVA of `timing ~ runner` or it’s bayesian equivalent `BayesFactor::anovaBF(timing ~ runner, male_100)`

``````##
##  Pairwise comparisons using t tests with pooled SD
##
## data:  male_100\$timing and male_100\$runner
##
##                Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake
## Asafa Powell   0.0460     -            -         -             -
## Tyson Gay      0.0250     1.0000       -         -             -
## Justin Gatlin  0.0024     1.0000       1.0000    -             -
## Yohan Blake    0.0052     1.0000       1.0000    1.0000        -
## Maurice Greene 6.9e-05    0.1649       1.0000    1.0000        1.0000
##
## P value adjustment method: holm``````

Our Bayes equivalent to this matrix won’t report “p values” but rather the
Bayes Factor associated with the pairing. Unlike the frequentist’s
“reject/don’t reject” criteria, the BF we report will indicate what the data
provide as odds that our hypothesis is correct. We’ll build it methodically and
with an eye towards code that is easily reused in the future.

So, quick quiz, how many unique pair combinations are there for our 6
runners? The order of the pair doesn’t matter at this juncture
we’re looking for the number of possible head to head races among these 6
runners. If you’re like me you don’t know the answer to that off
of the top of your head and it can be tedious figuring it out, so let’s let the
computer always calculate it for us, and tell us. For this case it’s
15.
Then we can use `combn` to take the 6 names and show us what those pairings are
e.g., “Usain Bolt, Asafa Powell”. The result is a matrix of 15 columns
(all the pairings) and two rows (the names of the runners for the pairings).
Just to make it easier to see I’ve added a `t()` so the display is vertical
and you can see the pairs.

``````numberofpairings <- factorial(howmanyrunners) /
(factorial(2) * factorial(howmanyrunners - 2))
numberofpairings``````
``## [1] 15``
``t(combn(orderbymean, 2))``
``````##       [,1]            [,2]
##  [1,] "Usain Bolt"    "Asafa Powell"
##  [2,] "Usain Bolt"    "Tyson Gay"
##  [3,] "Usain Bolt"    "Justin Gatlin"
##  [4,] "Usain Bolt"    "Yohan Blake"
##  [5,] "Usain Bolt"    "Maurice Greene"
##  [6,] "Asafa Powell"  "Tyson Gay"
##  [7,] "Asafa Powell"  "Justin Gatlin"
##  [8,] "Asafa Powell"  "Yohan Blake"
##  [9,] "Asafa Powell"  "Maurice Greene"
## [10,] "Tyson Gay"     "Justin Gatlin"
## [11,] "Tyson Gay"     "Yohan Blake"
## [12,] "Tyson Gay"     "Maurice Greene"
## [13,] "Justin Gatlin" "Yohan Blake"
## [14,] "Justin Gatlin" "Maurice Greene"
## [15,] "Yohan Blake"   "Maurice Greene"``````

#### Purrring right along

Now we can grab the output from `combn` and create two separate vectors,
`runner1` and `runner2`. We’ll take those vectors and create a series of `purrr`
statements using pipes.

• The first is a `map2` which takes `runner1` and `runner2` and creates a list
of 15 dataframes, one for each pairing. The anonymous `function(a, b)` is
simply an organized way of working our way through the 15 pairings. Filtering
and dropping levels and explicitly converting to a dataframe because
`BayesFactor::ttestBF` will generate a warning if you pass it a tibble.

• Next we `purrr::map` the list we just created (`.x = .`) and call the
`ttestBF` function. For each item in the list of 15 dataframes (one for each
runner pairing) it runs with `formula = timing ~ runner`, the dataframe we
passed `data = .`, and in this case we have deliberately specified a
directional hypothesis `nullInterval = c(-Inf, 0)` (it’s “-Inf” because
the faster runner has a smaller timing
). See the excellent BayesFactor
documentation here

for a more complete explanation of directional hypothesis testing.

• We started with two vectors `runner1` and `runner2` after `map2` we had a list
of 15 dataframes. Now after the first `map` pipe we have a list of 15 Bayes
Factor objects (that’s what `ttestBF` generates). We’ll immediately pipe (`%>%`)
that list (`.x = .`) into another `map` where we’ll invoke the `extractBF`
function. `extractBF` produces a dataframe, in this case with 2 rows, one with
the rowname “Alt., r=0.707 -Inf r=0.707 !(-Infwhich we don’t need.

• Now we have a list of 15 dataframes, this time containing the results of our
`ttestBF`. We want the row in each of them that is named “Alt., r=0.707
-Inf stored. So what we want is a list of 15 real numbers. So this time we’ll use
`map_dbl` to let it know we want a list of 15 numbers, `map_dbl(.x = ., ~ .["Alt., r=0.707 -Inf.`

``` ```
• ``` The final pipe simply does some trivial rounding. ```
``` runner1 <- combn(orderbymean, 2)[1, ] runner1 ## [1] "Usain Bolt" "Usain Bolt" "Usain Bolt" "Usain Bolt" "Usain Bolt" "Asafa Powell" ## [7] "Asafa Powell" "Asafa Powell" "Asafa Powell" "Tyson Gay" "Tyson Gay" "Tyson Gay" ## [13] "Justin Gatlin" "Justin Gatlin" "Yohan Blake" runner2 <- combn(orderbymean, 2)[2, ] runner2 ## [1] "Asafa Powell" "Tyson Gay" "Justin Gatlin" "Yohan Blake" "Maurice Greene" "Tyson Gay" ## [7] "Justin Gatlin" "Yohan Blake" "Maurice Greene" "Justin Gatlin" "Yohan Blake" "Maurice Greene" ## [13] "Yohan Blake" "Maurice Greene" "Maurice Greene" bfresults <- map2( runner1, runner2, function(a, b) male_100 %>% filter(runner %in% c(a, b)) %>% droplevels() %>% as.data.frame() ) %>% map(.x = ., ~ ttestBF( formula = timing ~ runner, data = ., nullInterval = c(-Inf, 0) )) %>% map(.x = ., ~ extractBF(x = .)) %>% map_dbl(.x = ., ~ .["Alt., r=0.707 -Inf% round(., digits = 4) bfresults ## [1] 10.2865 9.4539 59.3688 36.4792 2709.0974 0.3217 0.5782 0.7394 8.5068 0.2271 ## [11] 0.2902 0.8574 0.2033 0.5368 0.4184 To some, the complex set of steps that leads to bfresults may look daunting. I’d be a liar if I tried to say I wrote all those lines in one pass and got everything right. My suggestion is that as you build the pipeline you work step by step producing intermediate objects. Once you get the individual steps correct it’s trivial to join them using %>% and .x = .. Now that we have our 15 bayes factors for each of the 15 pairings of runners we should probably join them together into one neat dataframe resultsdf that lays everything out for us. Based on the data available we would read line #5 as the odds are 2709:1 that Usain is faster than Maurice. resultsdf <- data.frame( Runner1 = runner1, Runner2 = runner2, oddsfaster = bfresults ) resultsdf ## Runner1 Runner2 oddsfaster ## 1 Usain Bolt Asafa Powell 10.2865 ## 2 Usain Bolt Tyson Gay 9.4539 ## 3 Usain Bolt Justin Gatlin 59.3688 ## 4 Usain Bolt Yohan Blake 36.4792 ## 5 Usain Bolt Maurice Greene 2709.0974 ## 6 Asafa Powell Tyson Gay 0.3217 ## 7 Asafa Powell Justin Gatlin 0.5782 ## 8 Asafa Powell Yohan Blake 0.7394 ## 9 Asafa Powell Maurice Greene 8.5068 ## 10 Tyson Gay Justin Gatlin 0.2271 ## 11 Tyson Gay Yohan Blake 0.2902 ## 12 Tyson Gay Maurice Greene 0.8574 ## 13 Justin Gatlin Yohan Blake 0.2033 ## 14 Justin Gatlin Maurice Greene 0.5368 ## 15 Yohan Blake Maurice Greene 0.4184 ```
``` The Matrix reloaded (still waiting for #4) Now that we have our resultsdf we can continue about the business of comparing the frequentist results of paired t-tests with their bayesian counterparts. Imagine that we have just completed a Oneway ANOVA of timing ~ runner (I’ll show the results in a bit). Given significant results of the omnibuds F test our next step is likely to run all the pairwise comparisons with some sort of correction for multiple comparisons like pairwise.t.test. (see this post for a review) The results are almost always given as a matrix often without repeating one of the diagonals. The results tell us that we can reject the null hypothesis that the runners have the same time for Usain Bolt versus all the other competitors. But it doesn’t allow us to make any statements about how different (despite the temptation inherent in the very different p values). It supplies almost no information about the other pairings, just that we can not reject the null. pairwise.t.test( x = male_100\$timing, g = male_100\$runner, p.adjust.method = "holm" ) ## ## Pairwise comparisons using t tests with pooled SD ## ## data: male_100\$timing and male_100\$runner ## ## Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake ## Asafa Powell 0.0460 - - - - ## Tyson Gay 0.0250 1.0000 - - - ## Justin Gatlin 0.0024 1.0000 1.0000 - - ## Yohan Blake 0.0052 1.0000 1.0000 1.0000 - ## Maurice Greene 6.9e-05 0.1649 1.0000 1.0000 1.0000 ## ## P value adjustment method: holm Another typical way of displaying the information is graphically as demonstrated here by ggstatsplot::ggbetweenstats. ggbetweenstats( data = male_100, x = runner, y = timing, type = "p", var.equal = TRUE, pairwise.comparisons = TRUE, pairwise.display = "all", partial = FALSE, effsize.type = "unbiased", sort = "ascending", point.jitter.height = 0, messages = FALSE ) Let’s see if we can’t at least produce a similar matrix to what pairwise.t.test yields. I’d like us to be able to do a sort of side by side comparison of the frequentists versus bayesian results. Step by step the process we’ll follow is: Use diag to create a matrix with ones in the diagonal we’ll set the size to howmanyrunners Grab the runners names from orderbymean and populate the rownames and colnames Use combn again this time populating it with numbers (one & two) rather than the runners names Feed those vectors into a for loop to populate the bfmatrix with the data from resultsdf To be consistent with pairwise.t.test remove the first row bfmatrix[-1, ] and the last column bfmatrix[, -howmanyrunners] Finally populate the upper triangle part of the matrix with NA bfmatrix[upper.tri(bfmatrix)] <- NA bfmatrix <- diag(nrow = howmanyrunners) rownames(bfmatrix) <- orderbymean colnames(bfmatrix) <- orderbymean bfmatrix ## Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake Maurice Greene ## Usain Bolt 1 0 0 0 0 0 ## Asafa Powell 0 1 0 0 0 0 ## Tyson Gay 0 0 1 0 0 0 ## Justin Gatlin 0 0 0 1 0 0 ## Yohan Blake 0 0 0 0 1 0 ## Maurice Greene 0 0 0 0 0 1 combn(howmanyrunners, 2) ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] ## [1,] 1 1 1 1 1 2 2 2 2 3 3 3 4 4 5 ## [2,] 2 3 4 5 6 3 4 5 6 4 5 6 5 6 6 one <- combn(howmanyrunners, 2)[2, ] two <- combn(howmanyrunners, 2)[1, ] for (i in 1:numberofpairings) { bfmatrix[one[i], two[i]] <- resultsdf[i, 3] # row i, column 3 which is the BF value } bfmatrix ## Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake Maurice Greene ## Usain Bolt 1.0000 0.0000 0.0000 0.0000 0.0000 0 ## Asafa Powell 10.2865 1.0000 0.0000 0.0000 0.0000 0 ## Tyson Gay 9.4539 0.3217 1.0000 0.0000 0.0000 0 ## Justin Gatlin 59.3688 0.5782 0.2271 1.0000 0.0000 0 ## Yohan Blake 36.4792 0.7394 0.2902 0.2033 1.0000 0 ## Maurice Greene 2709.0974 8.5068 0.8574 0.5368 0.4184 1 bfmatrix <- bfmatrix[-1, ] bfmatrix <- bfmatrix[, -howmanyrunners] bfmatrix ## Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake ## Asafa Powell 10.2865 1.0000 0.0000 0.0000 0.0000 ## Tyson Gay 9.4539 0.3217 1.0000 0.0000 0.0000 ## Justin Gatlin 59.3688 0.5782 0.2271 1.0000 0.0000 ## Yohan Blake 36.4792 0.7394 0.2902 0.2033 1.0000 ## Maurice Greene 2709.0974 8.5068 0.8574 0.5368 0.4184 bfmatrix[upper.tri(bfmatrix)] <- NA bfmatrix ## Usain Bolt Asafa Powell Tyson Gay Justin Gatlin Yohan Blake ## Asafa Powell 10.2865 NA NA NA NA ## Tyson Gay 9.4539 0.3217 NA NA NA ## Justin Gatlin 59.3688 0.5782 0.2271 NA NA ## Yohan Blake 36.4792 0.7394 0.2902 0.2033 NA ## Maurice Greene 2709.0974 8.5068 0.8574 0.5368 0.4184 Success! bfmatrix now looks a lot like the results of pairwise.t.test. But we can do better. Looking at the object produced by pairwise.t.test using str() we can see it is a list with 4 items. \$method contains a simple text string explaining what it is. \$data.name is another text string that tells us where the data came from. \$p.value contains the actual p values and finally \$p.adjust.method contains which method of p adjustment (holm) we used. Since there is no analog “adjustment method” for a bayes factor we can ignore it. Let’s make our own list called bfpairs that mimics that structure. str(pairwise.t.test( x = male_100\$timing, g = male_100\$runner, p.adjust.method = "holm" )) ## List of 4 ## \$ method : chr "t tests with pooled SD" ## \$ data.name : chr "male_100\$timing and male_100\$runner" ## \$ p.value : num [1:5, 1:5] 4.60e-02 2.50e-02 2.44e-03 5.15e-03 6.89e-05 ... ## ..- attr(*, "dimnames")=List of 2 ## .. ..\$ : chr [1:5] "Asafa Powell" "Tyson Gay" "Justin Gatlin" "Yohan Blake" ... ## .. ..\$ : chr [1:5] "Usain Bolt" "Asafa Powell" "Tyson Gay" "Justin Gatlin" ... ## \$ p.adjust.method: chr "holm" ## - attr(*, "class")= chr "pairwise.htest" bfpairs <- list( method = " r = 0.707 Alt Hyp = -Inf ## List of 3 ## \$ method : chr " r = 0.707 Alt Hyp = -Inf What we’d like to do is mimic the print method for pairwise.t.test to include little things like substituting in an em dash instead of the NAs. To do that we need to peek at the print method with getAnywhere(print.pairwise.htest). getAnywhere() is a life saver if you want to be able to inspect a function. getAnywhere(print.pairwise.htest) ## A single object matching 'print.pairwise.htest' was found ## It was found in the following places ## registered S3 method for print from namespace stats ## namespace:stats ## with value ## ## function (x, digits = max(1L, getOption("digits") - 5L), ...) ## { ## cat("\n\tPairwise comparisons using", x\$method, "\n\n") ## cat("data: ", x\$data.name, "\n\n") ## pp <- format.pval(x\$p.value, digits = digits, na.form = "-") ## attributes(pp) <- attributes(x\$p.value) ## print(pp, quote = FALSE, ...) ## cat("\nP value adjustment method:", x\$p.adjust.method, "\n") ## invisible(x) ## } ## ## Hmmmmm. Okay so we pass it a list object (the bfpairs we just created) and it takes the sub components and puts them into the right places on the screen. Spoiler alert, I won’t go into all the details but suffice it to say that format.pval() is problematic for us. It does a very nice job working with p values but p values have a different set of characteristics than bayes factors. Rather than modify format.pval I simply decided to use the generic format function instead. That way the end user can specify all sorts of parameters like the number of digits, the symbol to replace NA, and the justification etc.. Here’s what I came up with after a little bit of work. Hopefully you’ll agree it does a reasonably good job of replicating the functionality of print.pairwise.htest? print.pairwise.bftest <- function(x, digits = 2, nsmall = 0, width = 9, justify = "right", scientific = FALSE, nareplace = "-") { cat("\nPairwise comparisons of bayes factors with", x\$method, "\n\n") cat("data: ", x\$data.name, "\n\n") pp <- format(x\$p.value, digits = digits, nsmall = nsmall, width = width, justify = justify, scientific = scientific ) pp <- gsub("NA", nareplace, pp) print(pp, quote = FALSE) cat("\n\nAnalyzed using BayesFactor::ttestBF\n") invisible(x) } print.pairwise.bftest(bfpairs, digits = 1) ## ## Pairwise comparisons of bayes factors with r = 0.707 Alt Hyp = -Inf Notice that bayes factors aren’t shockingly dissimilar than the conclusions you would draw from a frequentist’s perspective. I still think they are a better choice because you can talk about odds and probabilities cleanly without falling into the frequentist “traps” surrounding what rejection of the null hypothesis is. With our “new” perspective we are safe in making statements that out data strongly support some of the pairwise differences (odds of 2709 to 1 are pretty convincing) and in other cases we can now quantify that odds are it’s “anyone’s race.” Play it again Sam As I wrote this post I wanted to ensure that I could run the analysis on a different set of runners with minimal effort. What follows is the code minus all of the intermediate printing and explanation. The difference is this time we’ll look at the top 7 fastest sprinters and widen our analysis to anyone with at least 20 races. male_100_html <- read_html("http://www.alltime-athletics.com/m_100ok.htm") male_100_pres <- male_100_html %>% html_nodes(xpath = "//pre") male_100_htext <- male_100_pres %>% html_text() male_100_htext <- male_100_htext[[1]] male_100 <- read_fwf( male_100_htext, skip = 1, col_types = cols(.default = col_character()), col_positions = fwf_positions( c(1, 16, 27, 35, 66, 74, 86, 93, 123), c(15, 26, 34, 65, 73, 85, 92, 122, NA) ) ) male_100 <- male_100 %>% select(X2, X4) %>% transmute(timing = X2, runner = X4) %>% mutate( timing = gsub("A", "", timing), timing = as.numeric(timing) ) numbraces <- 20 howmanyrunners <- 7 orderbymean <- male_100 %>% group_by(runner) %>% summarise(avgtime = mean(timing), races = n()) %>% arrange(avgtime) %>% filter(races >= numbraces) %>% top_n(-howmanyrunners, avgtime) %>% pull(runner) %>% as.character() male_100 <- male_100 %>% filter(runner %in% orderbymean) %>% mutate_if(is.character, as.factor) %>% droplevels() male_100\$runner <- factor( male_100\$runner, levels = orderbymean ) numberofpairings <- factorial(howmanyrunners) / (factorial(2) * factorial(howmanyrunners - 2)) runner1 <- combn(orderbymean, 2)[1, ] runner2 <- combn(orderbymean, 2)[2, ] bfresults <- map2( runner1, runner2, function(a, b) male_100 %>% filter(runner %in% c(a, b)) %>% droplevels() %>% as.data.frame() ) %>% map(.x = ., ~ ttestBF( formula = timing ~ runner, data = ., nullInterval = c(-Inf, 0) )) %>% map(.x = ., ~ extractBF(x = .)) %>% map_dbl(.x = ., ~ .["Alt., r=0.707 -Inf% round(., digits = 4) resultsdf <- data.frame( Runner1 = runner1, Runner2 = runner2, oddsfaster = bfresults ) bfmatrix <- diag(nrow = howmanyrunners) rownames(bfmatrix) <- orderbymean colnames(bfmatrix) <- orderbymean one <- combn(howmanyrunners, 2)[2, ] two <- combn(howmanyrunners, 2)[1, ] for (i in 1:numberofpairings) { bfmatrix[one[i], two[i]] <- resultsdf[i, 3] } bfmatrix <- bfmatrix[-1, ] bfmatrix <- bfmatrix[, -howmanyrunners] bfmatrix[upper.tri(bfmatrix)] <- NA bfpairs <- list( method = " r = 0.707 Alt Hyp = -Inf ## ## Pairwise comparisons using t tests with pooled SD ## ## data: male_100\$timing and male_100\$runner ## ## Usain Bolt Asafa Powell Tyson Gay Christian Coleman Justin Gatlin Yohan Blake ## Asafa Powell 0.0682 - - - - - ## Tyson Gay 0.0358 1.0000 - - - - ## Christian Coleman 0.2500 1.0000 1.0000 - - - ## Justin Gatlin 0.0033 1.0000 1.0000 1.0000 - - ## Yohan Blake 0.0071 1.0000 1.0000 1.0000 1.0000 - ## Maurice Greene 8.7e-05 0.2500 1.0000 1.0000 1.0000 1.0000 ## ## P value adjustment method: holm print.pairwise.bftest(bfpairs, digits = 3, scientific = TRUE, nareplace = ".") ## ## Pairwise comparisons of bayes factors with r = 0.707 Alt Hyp = -Inf And voila! Based on our new criteria of the fastest 7 runners with at least 20 races Christian Coleman has been added to the matrix. His mean timings place him square in the middle of the pack between Tyson Gay and Justin Gatlin. But notice the BF comparing him to Usain Bolt is only about 3 which is smaller than Tyson Gay 9.5 and Justin Gatlin 59.4 or any of the other runners. This is likely because the BF always adjusts based upon the amount of evidence available and we only have 23 races of data available for Christian. Remember that one of the nice features of bayesian methodology is that we can quantify support for both the hypothesis we have as well as it’s converse (what a frequentist would call the null hypothesis). So our hypothesis is that Justin Gatlin is faster than Yohan Blake but the bayes factor 2.03e-01 (.203) says that the evidence from the data is that the odds are 1 / 2.03e-01 or about 5:1 that Justin is NOT faster than Yohan. That’s a statement that can not be made when using frequentist methods. Done I’ve really enjoyed this series of posts. I am always open to comments, corrections and suggestions. Feel free to leave a comment in disqus or send me an email. Chuck This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script')); Related ShareTweet To leave a comment for the author, please follow the link and comment on their blog: Posts on R Lover ! a programmer. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook... ```
``` ```
``` Comments are closed. ```
``` Search R-bloggers Most visited articles of the week Free Springer Books during COVID19 5 Ways to Subset a Data Frame in R How to write the first for loop in R R – Sorting a data frame by the contents of a column Date Formats in R How to schedule R scripts Installing R packages modelStudio and The Grammar of Interactive Explanatory Model Analysis In-depth introduction to machine learning in 15 hours of expert videos Sponsors // https://support.cloudflare.com/hc/en-us/articles/200169436-How-can-I-have-Rocket-Loader-ignore-my-script-s-in-Automatic-Mode- // this must be placed higher. Otherwise it doesn't work. // data-cfasync="false" is for making sure cloudflares' rocketcache doesn't interfeare with this // in this case it only works because it was used at the original script in the text widget function createCookie(name,value,days) { var expires = ""; if (days) { var date = new Date(); date.setTime(date.getTime() + (days*24*60*60*1000)); expires = "; expires=" + date.toUTCString(); } document.cookie = name + "=" + value + expires + "; path=/"; } function readCookie(name) { var nameEQ = name + "="; var ca = document.cookie.split(';'); for(var i=0;i < ca.length;i++) { var c = ca[i]; while (c.charAt(0)==' ') c = c.substring(1,c.length); if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length,c.length); } return null; } function eraseCookie(name) { createCookie(name,"",-1); } async function readTextFile(file) { // Helps people browse between pages without the need to keep downloading the same // ads txt page everytime. This way, it allows them to use their browser's cache. var random_number = readCookie("ad_random_number_cookie"); if(random_number == null) { var random_number = Math.floor(Math.random()*100*(new Date().getTime()/10000000000)); createCookie("ad_random_number_cookie",random_number,1) } file += '?t='+random_number; var rawFile = new XMLHttpRequest(); rawFile.onreadystatechange = function () { if(rawFile.readyState === 4) { if(rawFile.status === 200 || rawFile.status == 0) { // var allText = rawFile.responseText; // document.write(allText); document.write(rawFile.responseText); } } } rawFile.open("GET", file, false); rawFile.send(null); } // readTextFile('https://raw.githubusercontent.com/Raynos/file-store/master/temp.txt'); readTextFile("https://www.r-bloggers.com/wp-content/uploads/text-widget_anti-cache.txt"); Jobs for R usersData Analytics Auditor, Future of Audit LeadData Analytics Auditor, Future of Audit Lead @ London or NewcastleSenior Scientist, Translational Informatics @ Vancouver, BC, CanadaSenior Principal Data Scientist @ Mountain View, California, United StatesTechnical Research Analyst – New York, U.S.Movement Building AnalystInnovation Fellow python-bloggers.com (python/data-science news)Automatically create perfect .gitignore file for your projectHow to Write a Git Commit Message, in 7 StepsPredictive Power Score: Finding predictive patterns in your datasetDocumentation+Pypi for the `teller`, a model-agnostic tool for Machine Learning explainabilityPyBoy: A Python GameBoy EmulatorFree Springer Books during COVID19Encoding your categorical variables based on the response variable and correlations Full list of contributing R-bloggers ```
``` R-bloggers was founded by Tal Galili, with gratitude to the R community. Is powered by WordPress using a bavotasan.com design. Copyright © 2020 R-bloggers. All Rights Reserved. Terms and Conditions for this website var snp_f = []; var snp_hostname = new RegExp(location.host); var snp_http = new RegExp("^(http|https)://", "i"); var snp_cookie_prefix = ''; var snp_separate_cookies = false; var snp_ajax_url = 'https://www.r-bloggers.com/wp-admin/admin-ajax.php'; var snp_ajax_nonce = '27367e587b'; var snp_ignore_cookies = false; var snp_enable_analytics_events = false; var snp_enable_mobile = false; var snp_use_in_all = false; var snp_excluded_urls = []; snp_excluded_urls.push(''); Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.) Click here to close (This popup will not appear again) .snp-pop-109583 .snp-theme6 { max-width: 700px;} .snp-pop-109583 .snp-theme6 h1 {font-size: 17px;} .snp-pop-109583 .snp-theme6 { color: #a0a4a9;} .snp-pop-109583 .snp-theme6 .snp-field ::-webkit-input-placeholder { color: #a0a4a9;} .snp-pop-109583 .snp-theme6 .snp-field :-moz-placeholder { color: #a0a4a9;} .snp-pop-109583 .snp-theme6 .snp-field :-ms-input-placeholder { color: #a0a4a9;} .snp-pop-109583 .snp-theme6 .snp-field input { border: 1px solid #a0a4a9;} .snp-pop-109583 .snp-theme6 .snp-field { color: #000000;} .snp-pop-109583 .snp-theme6 { background: #f2f2f2;} jQuery(document).ready(function() { }); var CaptchaCallback = function() { jQuery('.g-recaptcha').each(function(index, el) { grecaptcha.render(el, { 'sitekey' : '' }); }); }; (function(){ var corecss = document.createElement('link'); var themecss = document.createElement('link'); var corecssurl = "https://www.r-bloggers.com/wp-content/plugins/syntaxhighlighter/syntaxhighlighter3/styles/shCore.css?ver=3.0.9b"; if ( corecss.setAttribute ) { corecss.setAttribute( "rel", "stylesheet" ); corecss.setAttribute( "type", "text/css" ); corecss.setAttribute( "href", corecssurl ); } else { corecss.rel = "stylesheet"; corecss.href = corecssurl; } document.head.appendChild( corecss ); var themecssurl = "https://www.r-bloggers.com/wp-content/plugins/syntaxhighlighter/syntaxhighlighter3/styles/shThemeDefault.css?ver=3.0.9b"; if ( themecss.setAttribute ) { themecss.setAttribute( "rel", "stylesheet" ); themecss.setAttribute( "type", "text/css" ); themecss.setAttribute( "href", themecssurl ); } else { themecss.rel = "stylesheet"; themecss.href = themecssurl; } document.head.appendChild( themecss ); })(); SyntaxHighlighter.config.strings.expandSource = '+ expand source'; SyntaxHighlighter.config.strings.help = '?'; SyntaxHighlighter.config.strings.alert = 'SyntaxHighlighter\n\n'; SyntaxHighlighter.config.strings.noBrush = 'Can\'t find brush for: '; SyntaxHighlighter.config.strings.brushNotHtmlScript = 'Brush wasn\'t configured for html-script option: '; SyntaxHighlighter.defaults['pad-line-numbers'] = false; SyntaxHighlighter.defaults['toolbar'] = false; SyntaxHighlighter.all(); // Infinite scroll support if ( typeof( jQuery ) !== 'undefined' ) { jQuery( function( \$ ) { \$( document.body ).on( 'post-load', function() { SyntaxHighlighter.highlight(); } ); } ); } _stq = window._stq || []; _stq.push([ 'view', {v:'ext',j:'1:7.3.2',blog:'11524731',post:'184214',tz:'-6',srv:'www.r-bloggers.com'} ]); _stq.push([ 'clickTrackerInit', '11524731', '184214' ]); jQuery(document).ready(function (\$) { //\$( document ).ajaxStart(function() { //}); for (var i = 0; i < document.forms.length; ++i) { var form = document.forms[i]; if (\$(form).attr("method") != "get") { \$(form).append('<input type="hidden" name="nAmdgxejI" value="*g4z8ODZ2" />'); } if (\$(form).attr("method") != "get") { \$(form).append('<input type="hidden" name="QpzFJ-m" value="o]8pN0G" />'); } } \$(document).on('submit', 'form', function () { if (\$(this).attr("method") != "get") { \$(this).append('<input type="hidden" name="nAmdgxejI" value="*g4z8ODZ2" />'); } if (\$(this).attr("method") != "get") { \$(this).append('<input type="hidden" name="QpzFJ-m" value="o]8pN0G" />'); } return true; }); jQuery.ajaxSetup({ beforeSend: function (e, data) { //console.log(Object.getOwnPropertyNames(data).sort()); //console.log(data.type); if (data.type !== 'POST') return; if (typeof data.data === 'object' && data.data !== null) { data.data.append("nAmdgxejI", "*g4z8ODZ2"); data.data.append("QpzFJ-m", "o]8pN0G"); } else { data.data = data.data + '&nAmdgxejI=*g4z8ODZ2&QpzFJ-m=o]8pN0G'; } } }); }); /* <![CDATA[ */ jQuery(function(){ jQuery("ul.sf-menu").supersubs({ minWidth: 12, maxWidth: 27, extraWidth: 1 }).superfish({ delay: 100, speed: 250 }); }); /* ]]> */ ```