Warspeed 5 — priors and models continued
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
My last 4 posts have all focused on the
vaccines being produced to fight COVID19. They have primarily focused on
Bayesian methods (or at least comparing bayesian to frequentist methods). This
one follows that pattern and provides expanded coverage of the concept of
priors in bayesian thinking, how to operationalize them, and additional
coverage of how to compare bayesian regression models using various tools in
r
.
There’s no actual additional analysis of “real data”. While the news has all
been good lately I haven’t found anything publicly available that begs for
investigation. Instead we’ll use “realistic” but not real data, and let the
process unfold for us.
I will, however, try to show you some tools and tricks in r
to make your
analysis of any data easier and smoother.
What bayesians do
First off let me remind the reader that like most I was educated and trained in
the frequentist tradition. I’m not against frequentist methods and even
admit in the vast majority of cases they lead to similar conclusions. But, I
do believe the bayesian methods are better and let you answer questions you
really want to research. Now a quote from a great book, which also happens
to be available at no cost.
From a Bayesian perspective, statistical inference is all about belief
revision. I start out with a set of candidate hypotheses about the world. I
don’t know which of these hypotheses is true, but do I have some beliefs about
which hypotheses are plausible and which are not. When I observe the data, I
have to revise those beliefs. If the data are consistent with a hypothesis, my
belief in that hypothesis is strengthened. If the data is inconsistent with the
hypothesis, my belief in that hypothesis is weakened. Navarro, page 555.
As a formula you’ll see Bayes Theorem expressed something like this:
\[P(hypothesis  data) = \frac{P(data  hypothesis) * P(hypothesis) }{P(data)}\]
We have data, and we have hypotheses, we have bayesian tools, most “new” bayesians
seem to stumble in the process of making good on our declaration of what our
priors are. Up until today’s post I’ve mainly avoided the topic or briefly
touched on and used terms like “vague”, “uninformed” or “flat”.
See for example
the very flat and uninformed plot from my second blog post in the series.
It’s okay in a bayesian framework to have priors that are vague if that’s what
you really know and what you really mean. Just don’t use them when you know
“more” or “better”. There is a real difference “between this value can be literally
anything” and “this value we are studying has some prior knowledge behind it
and some likely spread of values”.
And, no, you are absolutely NOT guaranteeing success or dooming failure if
you are off on your priors. Obviously, as a scientist, you should pick values
that make sense and not pluck them from thin air or make them up, but rest
assured given sufficient data even priors that are “off” are adjusted by the
data. Isn’t that the point after all? Develop a hypothesis and test it with
data? Notice that in bayesian inference we don’t need to force a “null
hypothesis” that we “reject”.
Come let us reason together
At this point if you’re not comfortable with bayesian inference I probably
haven’t helped you at all and it all probably feels a bit theoretical. Let’s
take our current vaccine problem and work it through.
Step 1 (what do we think we know or believe) about the vaccine and COVID19

A vast amount of time, money, and science have gone into developing these
vaccines, and in many cases they are products of years of previous research –
if not about COVID19 then at least other viruses. The candidate vaccines
have already been tried in Phase I and Phase II in perhaps animals or in
limited numbers of humans. They got to phase III not by being unknown or
“anything is possible” but rather on a belief they are going to be at
least 50% effective. So we know something about the direction and with less
certainty something about the magnitude. 
We know given public data that older people tend to contract COVID19 less.
All the data points to much higher infection rates in younger populations.
Good thing too since the older population tends to have much more severe outcomes.
Notice neither #1 or #2 says anything about how effective the vaccine is
in older populations, we’ll get to that in a minute. But we certainly have
evidence that currently older folks are less likely to get it.
Notice reasonable people can disagree. It’s okay if you have
scientific prior evidence that makes you think otherwise. Express it!
Let the data inform us. 
There is evidence that vaccines can be less effective for older
populations. It is certainly a worry for scientists going into the trial.
Just as there are worries for other health risk factors and potentially
race and ethnicity and even gender as well. Chuck’s hypothesis is
that the vaccine will be slightly less effective for older folks. But I’m
prepared to be wrong and believe this less firmly than I believe the
vaccine is effective.
Step 2 How do we make use of our prior knowledge?
In the last post
we used brms
to build a bayesian equivalent of a glm
model. That’s our plan
again today. So how do we go from my theoretical hypotheses in 13 above to
something we can feed into r
and let it chew on and inform us with our data?
This is often the hard part for newcomers to bayesian inference. But it really
needn’t be if you focus on the basics. Our plan is to feed our hypotheses into a
generalized linear model. As with a the simpler case of modeling with lm
we’ll
produce a model that gives us slope estimates (the venerable \(\beta\) or
\(\hat{b}_n\) coefficients). In traditional frequentist output these are in turn
tested against the hypothesis that they are zero and a pvalue is generated. Zero
slope means that predictor has zero ability to help us predict outcome in
\(y = \hat{b}_1 + \hat{b}_0\) if \(\hat{b}_1 = 0\) then \(y = \hat{b}_0\) the intercept and
\(\hat{b}_1\) can be eliminated.
So our priors need to be converted to something that makes sense as a value
for \(\hat{b}_n\). No effect equals zero and bigger effects mean larger positive or
negative values. Notice that our estimates have a standard error which means
they have a standard deviation which will serve as a way of expressing our
“confidence” about our priors. A small value means we’re more sure that the
effect will be tightly centered, larger SD means we’re less certain.
So going back to 13 above let’s proceed with our priors one by one.

Vaccine effect – So we know something about the direction and with less
certainty something about the magnitude. We could pick from a large number
of possible distributions for our \(\hat{b}_n\) here but hopefully you’re
thinking by now this sounds like anormal
ort
distribution with an
appropriatesd
andmean
. With small amounts of datat
is possibly a
better choice, but remember our data will inform us we are NOT fixing the
outcome. So let’s set a prior for \(\hat{b}_{conditionvaccinated}\) with the
mean at 2.0 and the sd of .5. Why a minus sign? That’s because
\(\hat{b}_{conditionvaccinated}\) compares to the placebo condition and
therefore the number of people who got COVID19 in the trial will be less.
The .5 sd is because we think the vaccine will be at least moderately
effective. No worries later I’ll show you how to be more specific in your
hypotheses. You’ll see this later but forbrms
andSTAN
we express this
asprior(normal(2.0, .5), class = b, coef = conditionvaccinated)

Older folks are less likely to get COVID19. Remember this is a general
statement about older folks regardless of whether they received the placebo
or the vaccine. This difference is less pronounced and less certain so we’ll
set the mean closer to zero at .5 and increase the standard deviation to 1
to indicate we believe it’s more variable and that is expressed as
prior(normal(0.5, 1), class = b, coef = ageOlder)
. 
Chuck’s hypothesis is that the vaccine will be slightly less effective for
older folks. Which means that this time the sign will be positive because
the interactionageOlder:conditionvaccinated
will yield slightly higher
infection rates and therefore \(\hat{b}_{ageOlder:conditionvaccinated}\) will
be a small positive number. But I’m less firm in this belief so once again
standard deviation = 1 and the code is
prior(normal(.5, 1), class = b, coef = ageOlder:conditionvaccinated)
Now, we have reasonable informed priors! And hopefully you have a better idea
on how to set them in a regression model of interest to you. Remember you
can consult the STAN
documentation
for more suggestions on informative priors. We can run get_prior
to get back
information about the defaults. Notice we don’t really care about the overall
intercept \(\hat{b}_0\).
Load libraries and on to the data
Load the necessary libraries as before
library(dplyr) library(brms) library(ggplot2) library(kableExtra) theme_set(theme_bw())
Load the same data as last time, this time I’m providing it as a
structure
for your ease. Again you could have your data in full
long format and accomplish the same modeling I’m providing it
wide and summarized for computational speed and leaving you the dplyr
commands to go long if you like.
agg_moderna_data < structure(list( condition = structure(c(1L, 1L, 2L, 2L), .Label = c("placebo", "vaccinated"), class = "factor"), age = structure(c(1L, 2L, 1L, 2L), .Label = c("Less than 65", "Older"), class = "factor"), didnot = c(9900L, 993L, 9990L, 999L), got_covid = c(100L, 7L, 10L, 1L), subjects = c(10000L, 1000L, 10000L, 1000L)), class = "data.frame", row.names = c(NA, 4L)) # if you need to convert to long format use this... # agg_moderna_data %>% # tidyr::pivot_longer(cols = c(didnot, got_covid), # names_to = "Outcome") %>% # select(subjects) %>% # tidyr::uncount(weights = value) %>% # mutate(Outcome = factor(Outcome)) # Make a nice table for viewing agg_moderna_data %>% kbl(digits = c(0,0,0,0,0), caption = "Notional Data") %>% kable_minimal(full_width = FALSE, position = "left") %>% add_header_above(c("Factors of interest" = 2, "COVID Infection" = 2, "Total subjects" = 1))
Factors of interest

COVID Infection

Total subjects



condition  age  didnot  got_covid  subjects 
placebo  Less than 65  9900  100  10000 
placebo  Older  993  7  1000 
vaccinated  Less than 65  9990  10  10000 
vaccinated  Older  999  1  1000 
Once again I’m assuming there are fewer older subjects in the trials.
Repeating the frequentist GLM solution
So we’ll create a matrix of outcomes called outcomes
with successes
got_covid
and failures didnot
. For binomial and quasibinomial families the
response can be specified … as a twocolumn matrix with the columns giving the
numbers of successes and failures
outcomes < cbind(as.matrix(agg_moderna_data$got_covid, ncol = 1), as.matrix(agg_moderna_data$didnot, ncol = 1)) moderna_frequentist < glm(formula = outcomes ~ age + condition + age:condition, data = agg_moderna_data, family = binomial(link = "logit")) # moderna_frequentist glm results broom::tidy(moderna_frequentist) ## # A tibble: 4 x 5 ## term estimate std.error statistic p.value #### 1 (Intercept) 4.60 0.101 45.7 0. ## 2 ageOlder 0.360 0.392 0.917 3.59e 1 ## 3 conditionvaccinated 2.31 0.332 6.96 3.32e12 ## 4 ageOlder:conditionvaccinated 0.360 1.12 0.321 7.48e 1 ### In case you want to plot the results # sjPlot::plot_model(moderna_frequentist, # type = "int")
Repeating the bayesian solution with no priors given
moderna_bayes_full < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ age + condition + age:condition, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, file = "moderna_bayes_full") summary(moderna_bayes_full) ## Family: binomial ## Links: mu = logit ## Formula: got_covid  trials(subjects) ~ age + condition + age:condition ## Data: agg_moderna_data (Number of observations: 4) ## Samples: 4 chains, each with iter = 12500; warmup = 500; thin = 1; ## total postwarmup samples = 48000 ## ## PopulationLevel Effects: ## Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS ## Intercept 4.60 0.10 4.80 4.41 1.00 46428 ## ageOlder 0.41 0.40 1.27 0.33 1.00 26886 ## conditionvaccinated 2.35 0.34 3.06 1.73 1.00 27428 ## ageOlder:conditionvaccinated 0.05 1.29 2.90 2.17 1.00 18097 ## Tail_ESS ## Intercept 37933 ## ageOlder 23100 ## conditionvaccinated 24161 ## ageOlder:conditionvaccinated 15000 ## ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1).
Priors and Bayes Factors
Earlier I showed you what priors we want. We can run get_prior
to get back
information about the defaults.
get_prior(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ age + condition + age:condition) ## prior class coef group resp dpar ## (flat) b ## (flat) b ageOlder ## (flat) b ageOlder:conditionvaccinated ## (flat) b conditionvaccinated ## student_t(3, 0, 2.5) Intercept ## nlpar bound source ## default ## (vectorized) ## (vectorized) ## (vectorized) ## default
Right now they are all flat. Let’s create an object called my_priors
that contains what we want them to be by model parameter. Then rerun
and put the results into full_model
. Finally let’s plot the
prior’s and posteriors by parameter.
my_priors < c(prior(normal(0.5, 1), class = b, coef = ageOlder), prior(normal(2.0, .5), class = b, coef = conditionvaccinated), prior(normal(.5, 1), class = b, coef = ageOlder:conditionvaccinated)) my_priors ## prior class coef group resp dpar nlpar bound ## normal(0.5, 1) b ageOlder ## normal(2, 0.5) b conditionvaccinated ## normal(0.5, 1) b ageOlder:conditionvaccinated ## source ## user ## user ## user full_model < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ age + condition + age:condition, prior = my_priors, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, save_pars = save_pars(all = TRUE), file = "full_model") summary(full_model) ## Family: binomial ## Links: mu = logit ## Formula: got_covid  trials(subjects) ~ age + condition + age:condition ## Data: agg_moderna_data (Number of observations: 4) ## Samples: 4 chains, each with iter = 12500; warmup = 500; thin = 1; ## total postwarmup samples = 48000 ## ## PopulationLevel Effects: ## Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS ## Intercept 4.61 0.10 4.80 4.42 1.00 54540 ## ageOlder 0.42 0.36 1.17 0.25 1.00 35067 ## conditionvaccinated 2.24 0.26 2.77 1.75 1.00 36868 ## ageOlder:conditionvaccinated 0.35 0.72 1.12 1.69 1.00 31410 ## Tail_ESS ## Intercept 37706 ## ageOlder 27560 ## conditionvaccinated 28006 ## ageOlder:conditionvaccinated 28442 ## ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). plot(bayestestR::bayesfactor_parameters(full_model, null = c(.5, .5)))
# bayestestR::sensitivity_to_prior(full_model, index = "Median", magnitude = 10)
With “good” priors we can compare models
Now that we have convinced ourselves theoretically and by running the
full model that we have reasonable and informative priors we can
begin the process of comparing models. First step is to build
the other models by removing parameters and their associated
priors. Hopefully my naming scheme makes it clear what we’re
doing. The obvious and elegant thing to do would be to write
a simple purrr::map
statement that would build a list of models
but for clarity I’ll simply repeat the same steps repeatedly (if
you want the purrr
statement add a comment in disqus
). I’ll
spare you printing all the various model summaries.
# no priors here null_model < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ 1, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, save_pars = save_pars(all = TRUE), file = "null_model") # no prior for the interaction term my_priors2 < c(prior(normal(0.5, 1), class = b, coef = ageOlder), prior(normal(2.0, .5), class = b, coef = conditionvaccinated)) no_interaction < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ age + condition, prior = my_priors2, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, save_pars = save_pars(all = TRUE), file = "no_interaction") # summary(no_interaction) # No prior for interaction or vaccine my_priors3 < c(prior(normal(0.5, 1), class = b, coef = ageOlder)) no_vaccine < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ age, prior = my_priors3, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, save_pars = save_pars(all = TRUE), file = "no_vaccine") # summary(no_vaccine) # no prior for interaction or for age my_priors4 < c(prior(normal(2.0, .5), class = b, coef = conditionvaccinated)) no_age < brm(data = agg_moderna_data, family = binomial(link = logit), got_covid  trials(subjects) ~ condition, prior = my_priors4, iter = 12500, warmup = 500, chains = 4, cores = 4, seed = 9, save_pars = save_pars(all = TRUE), file = "no_age") # summary(no_age)
Okay, we now have models named full_model
, no_interaction
, no_age
, no_vaccine
, and
null_model
. What we’d like to know is which is the “best” model, where best is
a balance between accuracy and parsimony. Highly predictive but doesn’t include
any unnecessary predictors. There are a number of ways to accomplish this including AIC
and BIC
and even comparing R squared
but we’re going to focus on
comparing Bayes Factors.
Using bayestestR::bayesfactor_models
we’ll compare all our models to the null_model
.
Then since as usual the null_model
is a rather silly point of comparison we’ll
use update
to change our comparison point to the model that does NOT include
the interaction term (a common question in research). That will help us understand
whether we need to worry about whether the vaccine is less effective in older subjects.
To make your workflow easier I actually recommend skipping this manual update
process
and going straight to bayestestR::bayesfactor_inclusion(comparison, match_models = TRUE)
.
comparison < bayestestR::bayesfactor_models(full_model, no_interaction, no_age, no_vaccine, denominator = null_model) comparison ## # Bayes Factors for Model Comparison ## ## Model BF ## [1] age + condition + age:condition 7.352e+18 ## [2] age + condition 9.708e+18 ## [3] condition 1.991e+19 ## [4] age 0.486 ## ## * Against Denominator: [5] (Intercept only) ## * Bayes Factor Type: marginal likelihoods (bridgesampling) update(comparison, reference = 2) ## # Bayes Factors for Model Comparison ## ## Model BF ## [1] age + condition + age:condition 0.757 ## [3] condition 2.051 ## [4] age 5.006e20 ## [5] (Intercept only) 1.030e19 ## ## * Against Denominator: [2] age + condition ## * Bayes Factor Type: marginal likelihoods (bridgesampling) bayestestR::bayesfactor_inclusion(comparison, match_models = TRUE) ## # Inclusion Bayes Factors (Model Averaged) ## ## Pr(prior) Pr(posterior) Inclusion BF ## age 0.4 0.263 0.488 ## condition 0.4 0.801 1.993e+19 ## age:condition 0.2 0.199 0.757 ## ## * Compared among: matched models only ## * Priors odds: uniformequal ### This is possible but NOT recommended # bayestestR::bayesfactor_inclusion(comparison)
There is very little evidence that we should include age
or the
age:condition interaction terms in our model. If anything we have modest
evidence that we should exclude them! Weak evidence to be sure but evidence
nonetheless (remember bayesian inference allows for “supporting” the null,
not just rejecting it).
More elegant testing (order restriction)
Notice that our priors are unrestricted  that is, our \(\hat{b}_n\) parameters in
the model have some nonzero credibility all the way out to infinity (no matter
how small; this is true for both the prior and posterior distribution). Does it
make sense to let our priors cover all of these possibilities? Our priors can be
formulated as restricted priors (Morey, 2015; Morey & Rouder, 2011).
By testing these restrictions on prior and posterior samples, we can see how the
probabilities of the restricted distributions change after observing the data.
This can be achieved with
bayesfactor_restricted(),
that computes a Bayes factor for these restricted model vs the unrestricted
model.
Think or it as more precise hypothesis testing. We need to specify these
restrictions as logical conditions. An easy one is that I’m very confident that
the vaccine is highly effective and \(\hat{b}_{conditionvaccinated}\) should be no
where near 0. As a matter of research I want to know how the data support the
notion that is less than 2 or a very large effect. How does that belief square
with our data? More complex but
testable, is that \(\hat{b}_{ageOlder.conditionvaccinated}\) (the interaction) if
it is non zero is quite modest. The vaccine may work a little better or a little
worse on older people but it is not a big difference.
chuck_hypotheses1 < c("b_conditionvaccinated < 2") bayestestR::bayesfactor_restricted(full_model, hypothesis = chuck_hypotheses1) ## # Bayes Factor (OrderRestriction) ## ## Hypothesis P(Prior) P(Posterior) BF ## b_conditionvaccinated < 2 0.497 0.822 1.654 ## ## * Bayes factors for the restricted model vs. the unrestricted model. chuck_hypotheses2 < c("b_conditionvaccinated < 2", "(b_ageOlder.conditionvaccinated > 0.5) & (b_ageOlder.conditionvaccinated < 0.5)" ) bayestestR::bayesfactor_restricted(full_model, hypothesis = chuck_hypotheses2) ## # Bayes Factor (OrderRestriction) ## ## Hypothesis ## b_conditionvaccinated < 2 ## (b_ageOlder.conditionvaccinated > 0.5) & (b_ageOlder.conditionvaccinated < 0.5) ## P(Prior) P(Posterior) BF ## 0.499 0.822 1.648 ## 0.340 0.444 1.305 ## ## * Bayes factors for the restricted model vs. the unrestricted model.
Although the support is limited (remember we had very few older subjects who got
COVID19) we at least have limited support and as our numbers continue to expand
over the coming months we can continue to watch the support (hopefully grow).
Done
Hope you found this continuation of the series helpful. As always feel free
to comment. Personally I find it helpful to work on real world problems
and if you can follow these methods you’re in great shape in the
next few months as more data becomes available to build even more complex
models.
Keep counting the votes! Every last one of them!
Chuck
This work is licensed under a Creative Commons AttributionShareAlike 4.0 International License
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.