Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## FiveThirtyEight’s Riddler Express

Riddler City is a large circular metropolis, with countless square
city blocks that each have a side length of 1 km. A small section of
the city, composed of 36 blocks, is shown in the diagram below:

At the very center of the city lies Riddler City Hall. Its many
employees all walk to and from work, and their homes are evenly
scattered across the city. The sidewalks they walk along have always
been adjacent to the streets — but that may be changing.

Recently, several city hall employees submitted a petition, requesting
that the sidewalks should no longer lie alongside the streets.
Instead, they want the sidewalks to cut diagonally across the city,
connecting nearby street intersections. These proposed sidewalks are
represented by the thicker blue lines in the diagram below:

The mayor of Riddler City has tasked you with resolving this dispute
in a mathematical manner. She would like you to answer the following
question: What fraction of the city hall employees would have a
shorter walk home (that is, to the street intersection nearest to
their home) if the city replaced its traditional sidewalks with these
diagonal sidewalks?

## Plan

My plan is to build out each system as a graph and then measure the
shortest distance to the center from each node.

## Setup

```knitr::opts_chunk\$set(echo = TRUE, comment = "#>")
library(glue)
library(ggraph)
library(tidygraph)
library(tidyverse)
theme_set(theme_minimal())
set.seed(0)
```

## Building the graphs

I plan to do a small-scale example and then get the final answer with a
larger city. To help with this, I parameterized the number of blocks in
the city as the variable `N_BLOCKS` derived from the radius of the city
`CITY_RADIUS`. The number of rows and columns were derived from this.

```CITY_RADIUS <- 3
N_BLOCKS <- CITY_RADIUS * 2 + 2
num_rows <- N_BLOCKS + 1
num_cols <- num_rows
```

I built a simple function `node_name()` to help keep the naming of the
nodes standardized as the column (x) and row (y) of the node. It is
first used to locate the city hall.

```node_name <- function(column_idx, row_idx) {
paste(column_idx, row_idx, sep = ",")
}
TOWN_HALL_POS <- list(col = 1 + floor(0.5 * N_BLOCKS),
row = 1 + floor(0.5 * N_BLOCKS))
TOWN_HALL <- node_name(TOWN_HALL_POS\$col, TOWN_HALL_POS\$row)
TOWN_HALL

#> [1] "5,5"
```

I also made a the function `is_in_circle()` that checks if the point
made by `col` and `row` are in a circle defined by `center` and
`radius`. This is used below to trim down a rectangular grid to a circle
for the city.

```is_in_circle <- function(center, col, row, radius) {
sqrt((col - center\$col)^2 + (row - center\$row)^2) <= radius
}
is_in_circle <- memoise::memoise(is_in_circle)
```

### The original sidewalk system

I built the first graph of the original layout of the city in two
pieces, first stringing all of the columns of the grid together, and
then adding edges for the rows. I plot the intermediate graphs to make
this more clear. At each step, the two nodes are checked to make sure
they are both within the circle of the city.

The first set of for-loops below iterate through the rows for each
column, adding an edge to the edgelist `el`.

```el <- tibble()
for (col_idx in seq(1, num_cols)) {
for (row_idx in seq(1, num_rows - 1)) {
check_1 <- is_in_circle(TOWN_HALL_POS, col_idx, row_idx, CITY_RADIUS)
check_2 <- is_in_circle(TOWN_HALL_POS, col_idx, row_idx+1, CITY_RADIUS)
if (check_1 & check_2) {
node_a <- node_name(col_idx, row_idx)
node_b <-node_name(col_idx, row_idx + 1)
el <- bind_rows(el, tibble(from = node_a, to = node_b))
}
}
}
```

Below shows the intermediate graph where only the vertical connections

The same process is followed to add the horizontal edges of the grid to
the edge list.

```for (row_idx in seq(1, num_rows)) {
for (col_idx in seq(1, num_cols - 1)) {
if (is_in_circle(TOWN_HALL_POS, col_idx, row_idx, CITY_RADIUS) &
is_in_circle(TOWN_HALL_POS, col_idx + 1, row_idx, CITY_RADIUS)) {
node_a <- node_name(col_idx, row_idx)
node_b <-node_name(col_idx + 1, row_idx)
el <- bind_rows(el, tibble(from = node_a, to = node_b))
}
}
}
```

With the horizontal connections made, the `tidygraph` object can be
created from the edge list.

```city_graph <- as_tbl_graph(el, directed = FALSE)
```

### The diagonal sidewalk system

Building the graph for the diagonal layout was a bit different. I’m sure
there is a more efficient method, but I decided to use a very verbose
one. It uses two nested for-loops to iterate over each integer value in
the grid. For each of these nodes, connections to nodes in the middle of
the block were made. Therefore, there are now nodes with 0.5 positions
on the grid. The many if-statements are there to make sure connections
are not made beyond the limits of the grid (though I don’t think they
would matter to the final calculation because they would not create
faster routes to the middle).

```el <- tibble()
add_to_el <- function(from, to_col, to_row) {
el <<- bind_rows(el, tibble(from = node_a, to = node_name(to_col, to_row)))
}
for (col_idx in seq(1, num_cols)) {
for (row_idx in seq(1, num_rows)) {
node_a <- node_name(col_idx, row_idx)
if (!is_in_circle(TOWN_HALL_POS, col_idx, row_idx, CITY_RADIUS)) {
next
}
if (col_idx < num_cols & row_idx < num_rows) {
add_to_el(node_a, col_idx + 0.5, row_idx + 0.5)
}
if (col_idx > 1 & row_idx < num_rows) {
add_to_el(node_a, col_idx - 0.5, row_idx + 0.5)
}
if (row_idx > 1 & col_idx < num_cols) {
add_to_el(node_a, col_idx + 0.5, row_idx - 0.5)
}
if (row_idx > 1 & col_idx > 1) {
add_to_el(node_a, col_idx - 0.5, row_idx - 0.5)
}
}
}
diag_graph <- as_tbl_graph(el, directed = FALSE)
```

The grid looks similar to the first, but we can also see the new 0.5
nodes.

## Measure distance to city hall

Finally, we can measure the distance from each node to the city hall at
the center. Note that I do adhere to the Riddler because it asks for the
length of the path to the nearest street intersection, which is a node
(with an integer values for the name) in the two graphs that have been
built.

The distance of the shortest path was measured from every node to the
center node using the `distance()` function from the
‘igraph’ package.

```original_city_distances <- igraph::distances(city_graph, to = TOWN_HALL)
diagonal_city_distances <- igraph::distances(diag_graph, to = TOWN_HALL)
```

These values had to be slightly modified because the length of an edge
in the original graph is longer than the edge in the graph with diagonal
sidewalks. Each edge in the graph of diagonal sidewalks is half of the
hypotenuse. This value is calculated below in units of the original
sidewalk length and multiplied against the lengths of the shortest
paths.

```original_sidewalk_unit <- 1
diagonal_sidewalk_unit <- 0.5 * sqrt(original_sidewalk_unit^2 + original_sidewalk_unit^2)
original_city_distances <- original_city_distances * original_sidewalk_unit
diagonal_city_distances <- diagonal_city_distances * diagonal_sidewalk_unit
```

Below are the summaries of these distances. It seems like, on average,
the walking distances were shorter with original sidewalks, but only
slightly.

```summary(original_city_distances)

#> 5,5
#> Min. :0.000
#> 1st Qu.:2.000
#> Median :3.000
#> Mean :2.483
#> 3rd Qu.:3.000
#> Max. :4.000

summary(diagonal_city_distances)

#> 5,5
#> Min. :0.000
#> 1st Qu.:2.121
#> Median :2.828
#> Mean :2.906
#> 3rd Qu.:3.536
#> Max. :4.950
```

### Visualizing the results

Below are several ways of visualizing the results.

The first code chunk below “tidy’s” the data by creating one tall data
frame with the data from both graphs. (I don’t show the code for the
plots as they are relatively standard ‘ggplots’, but the code is
available in the R Markdown notebook linked at the top of the page.)

```tidy_distance_matrix <- function(mat, name) {
mat %>%
as.data.frame() %>%
set_names(c("dist_to_cityhall")) %>%
rownames_to_column("from") %>%
as_tibble() %>%
}
walking_dists <- bind_rows(
tidy_distance_matrix(original_city_distances, "original"),
tidy_distance_matrix(diagonal_city_distances, "diagonal")
) %>%
filter(
city == "original" | (city == "diagonal" & !str_detect(from, "\\."))
) %>%
filter(from != !!TOWN_HALL)

#> # A tibble: 6 x 3
#> from dist_to_cityhall city
#>
#> 1 3,3 4 original
#> 2 3,4 3 original
#> 3 3,5 2 original
#> 4 3,6 3 original
#> 5 4,3 3 original
#> 6 4,4 2 original
```

The plot below just shows a histogram of the walking distances.

We can also calculate the difference in the shortest distance from each
node on each graph. I subtracted the shortest distances on the original
sidewalks from those on the diagonal sidewalks, so a negative number
indicates the node’s shortest distance to the center decreased on the
diagonal sidewalks.

```diff_walking_dists <- walking_dists %>%
pivot_wider(from, names_from = city, values_from = dist_to_cityhall) %>%
mutate(dist_diff = diagonal - original)
summary(diff_walking_dists\$dist_diff)

#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.17157 -0.58579 -0.17157 0.05497 0.82843 1.24264
```

The plot below shows both shortest distance values on the x and y-axis.
The line indicates where the distances are the same. The points that lie
below the line (blue) have a shorter walk with the diagonal sidewalks.

Below is a histogram of the differences in shortest path lengths for
each node. By eye, it looks like most data points lie below 0 on the
x-axis.

Finally, we can plot which node has a shorter or longer distance with
the diagonal sidewalks overlayed on the original grid system.

## What about with a larger city?

The Riddler states that the city is very large, so I increased the
radius for the simulation and re-ran the same code and generated the
same plots. In this simulation, city hall is on node (42, 42).

```CITY_RADIUS <- 40
N_BLOCKS <- CITY_RADIUS * 2 + 2
num_rows <- N_BLOCKS + 1
num_cols <- num_rows
TOWN_HALL_POS <- list(col = 1 + floor(0.5 * N_BLOCKS),
row = 1 + floor(0.5 * N_BLOCKS))
TOWN_HALL <- node_name(TOWN_HALL_POS\$col, TOWN_HALL_POS\$row)
TOWN_HALL

#> [1] "42,42"
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -16.40202 -4.96573 -0.04416 -0.04943 4.87006 16.56854
```

The density plot below shows that the distributions of walking distances
are pretty much equivalent between the two sidewalk layouts.

Taking the difference of the path lengths for each person backs up this
initial observation. It looks like half of the nodes fall above the 1:1
line, and half fall below.

Plotting a histogram of the differences in distance between the two
layouts presents the 50:50 split quite clearly.

Finally, laying out the difference between the distances from each node
on their original geographic location shows quite clearly that half of
the nodes have a shorter distance to city hall, while the walk for the
other half gets longer.

As a reminder, the original question was:

What fraction of the city hall employees would have a shorter walk
home (that is, to the street intersection nearest to their home) if
the city replaced its traditional sidewalks with these diagonal
sidewalks?

```mean(diff_walking_dists\$dist_diff < 0)

#> [1] 0.5015924
```

About 50% of employees would now have a shorter walk to work. This
makes sense when we think about the use of the diagonal sidewalks as
just a rotation of the current grid system. The net total would be no
change in the average distance walked to city hall.

## Acknowledgements

The graphs were made and analyzed with
igraph and
tidygraph. The graphs were
plotted with
ggraph. Repetitive
tasks were sped up using the
memoise
package for memoization.