[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Note: This post was inspired by the “Classroom Bingo” probability puzzle in the Royal Statistical Society’s Significance magazine (Dec 2021 edition).

Set-up

Imagine that we are playing bingo, but where the numbers are generated by the roll of two 6-sided dice with faces 1, 2, …, 6. Each round, the two dice are rolled. If the sum of the two dice appears on your bingo card, you can strike it off. The game ends when you strike off all the numbers on your card.

For this simple game, you are asked to write just two numbers on your bingo card (which can be the same number if you wish). What two numbers should you write in order to finish the game as quickly as possible (on average)?

Intuition

With the roll of two dice, the most probable outcome is 7 (6 out of 36), followed by 6 and 8 (5 out of 36), and so on. Hence, the two numbers you should write down are probably (7, 7) or (7, 6) / (7, 8). Which is better?

Simulation

This problem space is small enough that we can estimate the number of rounds for each possible bingo card via Monte Carlo simulation. Here is the function that runs one round of dice bingo in R:

# Roll nDice dice with sides 1, 2, ..., nFaces and return the sum
RollDice <- function(nDice, nFaces) {
dice <- sample(nFaces, size = nDice, replace = TRUE)
sum(dice)
}

SimulateBingo <- function(bingoVec, nDice, nFaces, verbose = FALSE) {
nRolls <- 0
while (length(bingoVec) > 0) {
nRolls <- nRolls + 1
currentRoll <- RollDice(nDice, nFaces)

# check if roll matches a number on card; if so, remove that number
matchIndex <- match(currentRoll, bingoVec)
if (!is.na(matchIndex))
bingoVec <- bingoVec[-matchIndex]

if (verbose) print(paste0("Roll ", nRolls, ": ", currentRoll, ", (",
paste(bingoVec, collapse = ","),
") remaining"))
}
nRolls
}


The SimulateBingo function is flexible enough for us to have any number of numbers on our bingo card at the beginning of the game. Here’s an example output from SimulateBingo(), where we have 3 numbers (6, 6, 7) on our card at the beginning, and at each round we roll 2 dice with sides 1 to 6:

set.seed(2)
SimulateBingo(c(6, 6, 7), 2, 6, verbose = TRUE)

#  "Roll 1: 11, (6,6,7) remaining"
#  "Roll 2: 7, (6,6) remaining"
#  "Roll 3: 6, (6) remaining"
#  "Roll 4: 9, (6) remaining"
#  "Roll 5: 3, (6) remaining"
#  "Roll 6: 4, (6) remaining"
#  "Roll 7: 9, (6) remaining"
#  "Roll 8: 5, (6) remaining"
#  "Roll 9: 7, (6) remaining"
#  "Roll 10: 5, (6) remaining"
#  "Roll 11: 9, (6) remaining"
#  "Roll 12: 7, (6) remaining"
#  "Roll 13: 11, (6) remaining"
#  "Roll 14: 9, (6) remaining"
#  "Roll 15: 6, () remaining"
#  15


Next, we introduce a function that runs SimulateBingo() multiple times and returns the mean number of rounds needed to complete the game:

EstimateRounds <- function(nSim, bingoVec, nDice, nFaces) {
rounds <- replicate(nSim, SimulateBingo(bingoVec, nDice, nFaces))
mean(rounds)
}


(For R aficionados wondering why I didn’t use ... to pass function arguments to SimulateBingo(): it’s because replicate() and ... don’t play well together. See more details here.)

The following code runs 100,000 simulations of the game for each possible pair of numbers on the bingo card (this takes a while to run):

set.seed(2)
nDice <- 2
nFaces <- 6
nSim <- 100000

df <- data.frame(matrix(nrow = 0, ncol = 3))
maxSum <- nDice * nFaces
minSum <- nDice
for (x in minSum:maxSum) {
for (y in x:maxSum) {
print(paste("Simulating bingoVec = (", x, ", ", y, ")"))
meanRounds <- EstimateRounds(nSim, bingoVec = c(x, y), nDice, nFaces)
df <- rbind(df, c(x, y, meanRounds))
}
}
names(df) <- c("x", "y", "mean_rounds")


Let’s make a heatmap to visualize the results:

library(ggplot2)
ggplot(df, aes(factor(x), factor(y))) +
geom_tile(aes(fill = mean_rounds)) +
geom_text(aes(label = sprintf(mean_rounds, fmt = "%.2f"))) +
scale_fill_distiller(palette = "Spectral") +
labs(x = "Bingo value 1", y = "Bingo value 2",
title = "Mean # of dice rolls to reach bingo")
theme_update(legend.position = "null") Notice the symmetry in the plot: the mean number of rounds for $(x, y)$ is basically the same as that for $(14 - x, 14 - y)$. This makes sense due to symmetry inherent in the problem (the probability of rolling $x$ is the same as rolling $7 - x$ for one die).

The smallest number of rounds is obtained for the pairs (6, 7) and (7, 8)!

Explanation

Let $T_1$ denote the number of rounds until (and including) the first time a number on the bingo card is hit, and let $T_2$ denote the number of rounds from then until (and including) the second number is hit. The number of rounds to complete the game is just $T = T_1 + T_2$.

Note that $T_1$ and $T_2$ are geometric random variables, and so their expectation is the reciprocal of the probability of success.

For (7, 7), the probability of success of $T_1$ and $T_2$ is the same: it’s the probability of rolling a 7, which is $6 / 36 = 1/6$. Hence, \begin{aligned} \mathbb{E}[T] &= \mathbb{E}[T_1] + \mathbb{E}[T_2] = 6 + 6 = 12. \end{aligned}

For (6, 7), the computation is a bit trickier because the distribution of $T_2$ depends on whether the 6 or the 7 was hit first. If 6 was hit first, then $T_2 \sim \text{Geom}(6/36)$, and if 7 was hit first, then $T_2 \sim \text{Geom}(5, 36)$. What is the probability that 6 is hit first? It’s $\text{Prob 6 first} = \dfrac{5/36}{5/36 + 6/36} = \dfrac{5}{11}.$

Hence, \begin{aligned} \mathbb{E}[T_2] &= \mathbb{E}[\text{Geom}(6/36)] \cdot \dfrac{5}{11} + \mathbb{E}[\text{Geom}(5/36)] \cdot \left( 1 - \dfrac{5}{11} \right) \\ &= \dfrac{366}{55} \\ &\approx 6.65 \end{aligned} $T_1$ is a geometric distribution with success probability equal to the probability that 6 or 7 is rolled, i.e. $5/36 + 6/36 = 11/36$, implying that $\mathbb{E}[T_1] = 36/11$. Hence, \begin{aligned} \mathbb{E}[T] &= \mathbb{E}[T_1] + \mathbb{E}[T_2] \\ &= \dfrac{36}{11} + \dfrac{366}{55} \\ &= \dfrac{546}{55} \\ &\approx 9.93, \end{aligned}

which is quite a bit less than the expected value for (7, 7)! (Notice that these values are very close to the simulated results.)

On second thought this makes sense: even though it takes longer to hit the second number in (6, 7), this is more than made up for in the time it takes to hit the first number, since we can hit either 6 or 7, not just 7. (Looking at the simulation results suggests that quite a few other pairs of numbers are better than (7, 7).)

Extensions

There are many interesting questions to ponder as extensions to this game:

1. What if you had to write down 3 numbers on your bingo card instead? Or 4? Or n?
2. What if there were more dice?
3. What if the dice had non-standard faces?