Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## FiveThirtyEight’s Riddler Classic

After a long night of frivolous quackery, two delirious ducks are having a difficult time finding each other in their pond. The pond happens to contain a 3×3 grid of rocks.

Every minute, each duck randomly swims, independently of the other duck, from one rock to a neighboring rock in the 3×3 grid — up, down, left or right, but not diagonally. So if a duck is at the middle rock, it will next swim to one of the four side rocks with probability 1/4. From a side rock, it will swim to one of the two adjacent corner rocks or back to the middle rock, each with probability 1/3. And from a corner rock, it will swim to one of the two adjacent side rocks with probability 1/2.

If the ducks both start at the middle rock, then on average, how long will it take until they’re at the same rock again? (Of course, there’s a 1/4 chance that they’ll swim in the same direction after the first minute, in which case it would only take one minute for them to be at the same rock again. But it could take much longer, if they happen to keep missing each other.)

Extra credit: What if there are three or more ducks? If they all start in the middle rock, on average, how long will it take until they are all at the same rock again?

## Plan

The plan is to run a straight forward simulation of the game. I will try to parameterize as many variables of the game as possible so that different numbers of ducks or pond dimensions can be tested.

## Setup

```knitr::opts_chunk\$set(echo = TRUE, comment = "#>", cache = TRUE, dpi = 300)
library(mustashe)
library(ggraph)
library(tidygraph)
library(tidyverse)
library(conflicted)
# Handle any namespace conflicts.
conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
# Default 'ggplot2' theme.
theme_set(theme_minimal())
# Some standard colors used throughout
green <- "#54c761"
red <- "#c75454"
blue <- "#547ec7"
purple <- "#a06bdb"
light_grey <- "grey70"
grey <- "grey40"
# For reproducibility.
set.seed(0)
```

I decided to abstract the pond as a graph because there are discrete states (the rocks in the pond) and specific links between these states (the constraints on which rocks can be traveled to from another). The graph was built by first creating a data frame of all possible nodes with \$x\$ and \$y\$ coordinates using `expand.grid()` and giving each a unique identifier (`name`). Then, every possible edge between each pair of nodes were compiled into a data frame using `combn()` and reduced by only keeping those with a distance (using the \$x\$ and \$y\$ coordinates of each node) of 1.

```# Calculate the distance between nodes `a` and `b`.
node_distance <- function(a, b) {
sqrt((a\$x - b\$x)^2 + (a\$y - b\$y)^2)
}
# Build the graph of a pond with dimensions `n` x `m`.
build_pond_graph <- function(n, m) {
nodes <- expand.grid(x = seq(1, n), y = seq(1, m)) %>%
as_tibble() %>%
transmute(name = as.character(row_number()),
x, y)
num_nodes <- nrow(nodes)
edges <- combn(seq(1, num_nodes), 2) %>%
t() %>%
as.data.frame() %>%
as_tibble() %>%
set_names(c("from", "to")) %>%
filter(map2_lgl(from, to, function(n1, n2) {
node_distance(nodes[nodes\$name == n1, ], nodes[nodes\$name == n2, ]) == 1
}))
as_tbl_graph(edges, directed = FALSE) %N>%
left_join(nodes, by = "name")
}
```

An example graph for the pond in this riddle is shown below.

```pond <- build_pond_graph(3, 3)
ggraph(pond, "grid") +
geom_node_label(aes(label = name),
color = "black",
label.r = unit(1, "mm"),
label.size = 0.2) +
theme_graph()
``` Because I want to parameterize this simulation, the dimensions of the pond can change. Therefore, the starting point at the middle rock must be identified algorithmically. This is further complicated by how the pond can be rectangular and have an even width or height. For these reasons, I decided to utilize the graph structure of the abstraction to find the center of the pond using the betweenness centrality. The betweenness centrality is the number of geodesics (shortest paths) that travel through a node. Thus, it seemed like a natural fit for this application. Below is a visualization of the centrality of each node in the 3x3 pond graph.

```pond %N>%
mutate(ctr = centrality_betweenness(directed = FALSE)) %>%
ggraph("grid") +
geom_node_label(aes(label = name, size = ctr),
color = "black",
label.r = unit(1, "mm"),
label.size = 0.2) +
scale_size_continuous(range = c(3, 10)) +
theme_graph() +
theme(legend.position = "none")
``` The `find_central_node()` function handles this process for the simulation, taking a graph and returning the node with the highest centrality value. If there are ties, only the first is returned. The function is memoised so it only has to run once per graph.

```find_central_node <- function(gr) {
gr %N>%
mutate(ctr = centrality_betweenness(directed = FALSE)) %>%
as_tibble() %>%
filter(ctr == max(ctr)) %>%
slice(1) %>%
pull(name) %>%
unlist()
}
find_central_node <- memoise::memoise(find_central_node)
```

## The simulation

The final step before building the main simulation function is to create a function that moves a duck a single step on the graph. This is accomplished in the `take_a_step()` function by using the `igraph::ego()` function to isolate the immediate neighborhood of a node (`n`) in a graph (`gr`) and sampling from those nodes.

```# Take a step to another node in `gr` from node `n`.
take_a_step <- function(n, gr) {
neighbors <- igraph::ego(gr, order = 1, nodes = n, mode = "all") %>%
unlist() %>%
names()
sample(neighbors[neighbors != n], 1)
}
```

The main simulation function, `simulate_delirious_ducks()` is relatively simple. It first builds a pond and locates the center. Then a list of duck positions, one per duck, is instantiated with the name of the center node for each duck. Finally, the `for` loop runs the simulation by moving each duck a single position on each iteration. The ducks are moved by mapping the `take_a_step()` function over the `ducks` list. If all of the ducks are on the same stone (i.e. there is only one distinct value in the `ducks` list), then the `for` loop is broken to stop the simulation. The `tracker` data frame records the positions of all of the ducks throughout the simulation and is returned at the end of the function.

```simulate_delirious_ducks <- function(pond_n = 3, pond_m = 3,
n_ducks = 2,
start_node = NULL,
max_iters = 1e3) {
# Make pond graph.
pond <- build_pond_graph(pond_n, pond_m)
# Get the center node to start from which to start the ducks.
if (is.null(start_node)) {
start_node <- find_central_node(pond)
}
# Make a list of positions for the ducks
ducks <- rep(start_node, n_ducks)
# Iterate until all the ducks are at the same location or max iterations.
tracker <- tibble()
for (i in seq(1, max_iters)) {
ducks <- map(ducks, take_a_step, gr = pond)
tracker <- bind_rows(tracker,
tibble(i, duck_pos = list(unlist(ducks))))
if (n_distinct(ducks) == 1) { break }
}
return(tracker)
}
simulate_delirious_ducks()

#> # A tibble: 12 x 2
#> i duck_pos
#> <int> <list>
#> 1 1 <chr >
#> 2 2 <chr >
#> 3 3 <chr >
#> 4 4 <chr >
#> 5 5 <chr >
#> 6 6 <chr >
#> 7 7 <chr >
#> 8 8 <chr >
#> 9 9 <chr >
#> 10 10 <chr >
#> 11 11 <chr >
#> 12 12 <chr >
```

## Simulating the delirious ducks

I used the ‘micobenchmark’ library to first estimate how long the trials take to run. It seems they take 100 ms on average, though there is a lot of variation.

```library(microbenchmark)
microbenchmark(simulate_delirious_ducks(), times = 50)

#> Unit: milliseconds
#> expr min lq mean median uq max
#> simulate_delirious_ducks() 44.3163 70.25602 94.19177 79.80663 101.6753 273.984
#> neval
#> 50
```

I also created a simple helper function to run the simulation `n_trials` times and only return the length of each round.

```# Play some number of simulations of the delirious ducks.
play_delirious_ducks <- function(n_trials = 10, n_ducks = 2, max_iters = 1e3) {
map_dbl(seq(1, n_trials), ~ nrow(
simulate_delirious_ducks(n_ducks = n_ducks, max_iters = max_iters)
)
)
}
```

Finally, we can run the simulation and answer the Riddler. I used the ‘mustashe’ package to stash and load the results of the simulation so I don’t have to wait each time while writing and coding this Riddler solution.

```n_trials <- 1e3
stash("delirious_duck_sims2", {
delirious_duck_sims2 <- play_delirious_ducks(n_trials, n_ducks = 2)
})

```

The results for 1,000 trials are plotted as a histogram below with the mean, median, and 25% and 75% quantile are indicated by vertical lines.

```# Summary statistics.
mean_time <- mean(delirious_duck_sims2)
median_time <- median(delirious_duck_sims2)
q25_time <- quantile(delirious_duck_sims2, 0.25)
q75_time <- quantile(delirious_duck_sims2, 0.75)
summ_stats <- tibble(value = c(mean_time, median_time, q25_time, q75_time),
name = c("mean", "median", "25%", "75%"),
color = c(blue, red, grey, grey)) %>%
mutate(label = paste0(name, ": ", round(value, 1)))
tibble(sims = delirious_duck_sims2) %>%
ggplot(aes(sims)) +
geom_bar(color = "black", size = 0.7, fill = grey, alpha = 0.2) +
geom_vline(aes(xintercept = value, color = color), data = summ_stats,
size = 1, lty = 2) +
annotate("text", x = 11, y = seq(230, 170, length.out = 4),
label = summ_stats\$label, color = summ_stats\$color,
hjust = 0, fontface = "bold", family = "Arial") +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.02))) +
scale_color_identity(guide = FALSE) +
labs(x = "time to reunion",
y = "count",
title = "Time for 2 ducks to reunite in a 3x3 pond")
``` Therefore, the answer to the Riddler is it would take, on average, the 2 ducks 4.84 minutes to reunite.

## More ducks

Since the simulation was parameterized, we can answer the Extra Credit question and run the process with varying numbers of ducks. The plot below shows the results for running the simulation with 3 ducks.

```n_trials <- 1e3
stash("delirious_duck_sims3", {
delirious_duck_sims3 <- play_delirious_ducks(n_trials, n_ducks = 3)
})

``` The density plots below show the simulation run with 2 through 6 ducks.

```stash("many_ducks_sims", {
many_ducks_sims <- tibble(n_ducks = c(2:6)) %>%
mutate(sims = map(n_ducks, ~ play_delirious_ducks(n_trials = 5e2,
n_ducks = .x,
max_iters = 3e3)))
})

many_ducks_sims %>%
unnest(sims) %>%
ggplot(aes(sims)) +
facet_wrap(~ n_ducks, scales = "free", nrow = 2) +
geom_density(aes(color = factor(n_ducks),
fill = factor(n_ducks)),
alpha = 0.2) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.02))) +
scale_color_brewer(palette = "Set1", guide = FALSE) +
scale_fill_brewer(palette = "Set1", guide = FALSE) +
labs(x = "time to reunion",
y = "density",
color = "num. ducks",
fill = "num. ducks",
title = "Time to reunite varying numbers of ducks")
``` Unsurprisingly, the distributions seem exponential, so taking the logarithm of the number of steps makes the peaks line up in succession.

```many_ducks_sims %>%
unnest(sims) %>%
ggplot(aes(log10(sims))) +
geom_density(aes(y = ..scaled..,
color = factor(n_ducks),
fill = factor(n_ducks)),
alpha = 0.2) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.02))) +
scale_color_brewer(palette = "Set1") +
scale_fill_brewer(palette = "Set1") +
theme(legend.position = "bottom") +
labs(x = "time to reunion (log10)",
y = "density (scaled)",
color = "num. ducks",
fill = "num. ducks",
title = "Time to reunite varying numbers of ducks")
``` There are some additional analyses that would be interesting to do (if I had more time) with the simulation results:

1. Model the Markov process of the simulation to derive an analytical solution.
2. Identify trends for where the ducks tend to reunite on the pond.