[This article was first published on R – Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

[edit: Juho Kokkala corrected my homework. Thanks! I updated the post. Also see some further elaboration in my reply to Andrew’s comment. As Andrew likes to say …]

So far, Giancarlo Stanton has hit 56 home runs in 555 at bats over 149 games. Miami has 10 games left to play. What’s the chance he’ll hit 61 or more home runs? Let’s make a simple back-of-the-envelope Bayesian model and see what the posterior event probability estimate is.

Sampling notation

A simple model that assumes a home run rate per at bat with a uniform (conjugate) prior:

$\theta \sim \mbox{Beta}(1, 1)$

The data we’ve seen so far is 56 home runs in 555 at bats, so that gives us our likelihood.

$56 \sim \mbox{Binomial}(555, \theta)$

Now we need to simulate the rest of the season and compute event probabilities. We start by assuming the at-bats in the rest of the season is Poisson.

$\mathit{ab} \sim \mbox{Poisson}(10 \times 555 / 149)$

We then take the number of home runs to be binomial given the number of at bats and the home run rate.

$h \sim \mbox{Binomial}(\mathit{ab}, \theta)$

Finally, we define an indicator variable that takes the value 1 if the total number of home runs is 61 or greater and the value of 0 otherwise.

$\mbox{gte61} = \mbox{I}[h \geq (61 - 56)]$

Event probability

The probability Stanton hits 61 or more home runs (conditioned on our model and his performance so far) is then the posterior expectation of that indicator variable,

$\displaystyle \mbox{Pr}[h \geq (61 - 56)] \\[6pt] \hspace*{3em} \displaystyle { } = \ \int_{\theta} \ \sum_{ab} \, \ \mathrm{I}[h \geq 61 - 56] \\ \hspace*{8em} \ \times \ \mbox{Binomial}(h \mid ab, \theta) \\[6pt] \hspace*{8em} \ \times \ \mbox{Poisson}(ab \mid 10 \ \times \ 555 / 149) \\[6pt] \hspace*{8em} \ \times \ \mbox{Beta}(\theta \mid 1 + 56, 1 + 555 - 56) \ \mathrm{d}\theta.$

Computation in R

The posterior for $\theta$ is analytic because the prior is conjugate, letting us simulate the posterior chance of success given the observed successes (56) and number of trials (555). The number of at bats is independent and also easy to simulate with a Poisson random number generator. We then simulate the number of hits on the outside as a random binomial, and finally, we compare it to the total and then report the fraction of simulations in which the simulated number of home runs put Stanton at 61 or more:

> sum(rbinom(1e5,
rpois(1e5, 10 * 555 / 149),
rbeta(1e5, 1 + 56, 1 + 555 - 56))
>= (61 - 56)) / 1e5
[1] 0.34


That is, I’d give Stanton about a 34% chance conditioned on all of our assumptions and what he’s done so far.

Disclaimer

The above is intended for recreational use only and is not intended for serious bookmaking.

Exercise

You guessed it—code this up in Stan. You can do it for any batter, any number of games left, etc. It really works for any old statistics. It’d be even better hierarchically with more players (that’ll pull the estimate for $\theta$ down toward the league average). Finally, the event probability can be done with an indicator variable in the generated quantities block.

The basic expression looks like you need discrete random variables, but we only need them here for the posterior calculation in generated quantities. So we can use Stan’s random number generator functions to do the posterior simulations right in Stan.

The post Will Stanton hit 61 home runs this season? appeared first on Statistical Modeling, Causal Inference, and Social Science.