# Metro Systems Over Time: Part 2

This article was first published on **DataScience+** and kindly contributed to R-bloggers.


Note: at the time of this writing, using the `ggplot2` and `ggmap` packages from CRAN results in an error. To avoid the error, install both packages from GitHub with the `devtools` package, and restart R if the problem persists.

```r
devtools::install_github("dkahle/ggmap")
devtools::install_github("hadley/ggplot2")
```

## Introduction

In Part 1 of this series we collected geodata on metro stops from Google and plotted them on maps. In Part 2 we’ll build Delaunay triangulations on top of those maps and compute the centroid of each network. This post makes some pretty advanced use of `tidyverse` packages. For more information on any of these calls, see the `tidyverse` documentation.

## Data

As a reminder, our data consists of the hand-corrected values of the data we pulled down from Google. To see how we got the data, go back to Part 1: Data.

## Maps with Delaunay Triangulations and Centroids

With our maps and data points in place, let’s compute the Delaunay triangulation for each city. This lets us find the area a given city’s metro covers and compute a center point, or centroid, for the metro system. We do this with the `deldir` package.

First, though, I am going to use a function from `tidyr` called `nest()`, which collapses a bunch of data into a single cell. Nesting by `city` gives me one row for each city, with all of the remaining columns packed into a single list-column whose name is set with `.key`; in this case the new column is called `location_info`. Think of it as a data frame tucked within a cell of a data frame.

With my data nested I can make a new column called `deldir` that holds all of the information returned by my `deldir()` call. The `deldir()` call takes two numeric vectors, the x- and y-coordinates of the points (here `lon` and `lat`). It then computes several things, including the total area of the triangulation and the endpoints of all the segments connecting the points.

How do we access this information, though? We can pull it out with a `purrr` call, `map()`. The `map()` call takes a list or vector and a function and applies the function to each element in turn. When it is given a string instead of a function, it extracts the element with that name from each item. For our purposes we take the column `deldir` and pull out `del.area`, and thanks to the `mutate()` call we can save the result to a new column. We can do the same thing with `delsgs` (the segments of the triangulation) and `summary` (more information about the individual points, including the triangle area attributed to each one). See the fully nested data frame below.

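```r
library(dplyr)   # %>%, mutate(), select(), group_by(), summarise()
library(tidyr)   # nest(), unnest()
library(purrr)   # map()
library(deldir)  # deldir()

# nest(-city, .key = ...) is the tidyr syntax from when this post was written;
# in tidyr 1.0.0 and later the equivalent call is nest(location_info = -city).
data_deldir = data %>%
  # One row per city; all other columns go into the list-column location_info
  nest(-city, .key = location_info) %>%
  # Triangulate each city's stops
  mutate(deldir = map(location_info, function(df) deldir(df$lon, df$lat))) %>%
  # Pull named components out of each deldir object into their own list-columns
  mutate(del.area = map(deldir, "del.area")) %>%
  mutate(delsgs = map(deldir, "delsgs")) %>%
  mutate(summary = map(deldir, "summary"))

data_deldir
```

```
# A tibble: 4 × 6
       city      location_info       deldir  del.area                 delsgs                summary
     <fctr>             <list>       <list>    <list>                 <list>                 <list>
1     Paris <tibble [298 × 9]> <S3: deldir> <dbl [1]> <data.frame [849 × 6]> <data.frame [287 × 9]>
2    Berlin <tibble [173 × 9]> <S3: deldir> <dbl [1]> <data.frame [499 × 6]> <data.frame [171 × 9]>
3 Barcelona <tibble [149 × 9]> <S3: deldir> <dbl [1]> <data.frame [433 × 6]> <data.frame [148 × 9]>
4    Prague  <tibble [58 × 9]> <S3: deldir> <dbl [1]> <data.frame [161 × 6]>  <data.frame [58 × 9]>
```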
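To see the `nest()` and `map()` pattern in isolation, here is a minimal sketch on made-up data. The `toy` data frame, its values, and the `stats` helper column are hypothetical and not part of the metro data. The sketch also assumes tidyr 1.0.0 or later, where the nesting call is written `nest(location_info = -city)` rather than with `.key`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: a few made-up stops for two cities.
toy = tibble(
  city = c("A", "A", "A", "B", "B"),
  lon  = c(0, 1, 2, 10, 12),
  lat  = c(0, 1, 0, 5, 7)
)

toy_nested = toy %>%
  # One row per city; lon and lat are packed into the list-column location_info
  nest(location_info = -city) %>%
  # Apply a function to each nested data frame; the result is a list of named lists
  mutate(stats = map(location_info,
                     function(df) list(n = nrow(df), mean_lon = mean(df$lon)))) %>%
  # A string in place of a function extracts the element with that name
  mutate(n = map_int(stats, "n"),
         mean_lon = map_dbl(stats, "mean_lon"))

toy_nested
```

Passing a string such as `"n"` extracts the element of that name from each item, which is the same shortcut used above to pull `del.area` out of each `deldir` object. The typed variants `map_int()` and `map_dbl()` return plain vectors instead of list-columns.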

Based on these areas, it looks like the Berlin metro covers the most area at 0.059279, while Barcelona covers the smallest area at 0.016332. (Because the triangulation is computed directly on longitude and latitude, these areas are in squared degrees rather than a physical unit.)

Now that we have our nested data frame with all the pertinent information, we’re going to unnest the data necessary for our new plots. First we need the `delsgs` data, which we use to draw the lines connecting the metro stops. To do this we’ll make a new data frame, dropping all columns except `city` and `delsgs`, and then `unnest()` it. This expands the nested `delsgs` column, giving us many more rows and many more columns. The `x1`, `y1`, `x2`, and `y2` values will be used later in our plot to draw the edges of our triangles. See part of the unnested data frame below.

```r
data_deldir_delsgs = data_deldir %>%
  select(city, delsgs) %>%
  # Expand the list-column: one row per segment
  # (tidyr 1.0.0 and later prefer the explicit unnest(cols = delsgs))
  unnest()

head(data_deldir_delsgs)
```

```
# A tibble: 6 × 7
    city       x1       y1       x2       y2  ind1  ind2
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl> <int> <int>
1  Paris 2.366928 48.78793 2.359279 48.79272   283   282
2  Paris 2.433489 48.77262 2.366928 48.78793    72   283
3  Paris 2.450590 48.78984 2.433489 48.77262    74    72
4  Paris 2.450590 48.78984 2.459319 48.77978    74    73
5  Paris 2.455281 48.76805 2.433489 48.77262   198    72
6  Paris 2.455281 48.76805 2.459319 48.77978   198    73
```
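As a rough illustration of what `unnest()` does to a list-column of data frames, the sketch below builds a tiny nested tibble by hand. The `nested` object and its segment values are invented for illustration, and the explicit `cols =` argument assumes tidyr 1.0.0 or later.

```r
library(tibble)
library(tidyr)

# Hypothetical nested tibble: one row per city, segments stored in a list-column.
nested = tibble(
  city = c("A", "B"),
  delsgs = list(
    data.frame(x1 = c(0, 1), y1 = c(0, 1), x2 = c(1, 2), y2 = c(1, 0)),
    data.frame(x1 = 10, y1 = 5, x2 = 12, y2 = 7)
  )
)

# Each inner data frame is expanded in place: its columns become top-level
# columns and the city value is repeated once per inner row.
flat = unnest(nested, cols = delsgs)

flat
```

City A contributes two rows and city B one, so `flat` has three rows and the columns `city`, `x1`, `y1`, `x2`, and `y2`.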

In addition to the edges of the triangulation, we also want the centroid. To get it we’ll first make a new data frame focusing on just the city and summary information, and then `unnest()` it just as we did for the edges. We don’t stop there, however. What we’re really interested in is the centroid, which we need to compute ourselves. To do this we first `group_by()` city and then `summarise()` the data. To compute the x-value of the centroid, `cent_x`, we take the `x` column, which contains the x-coordinates of all of the points, and multiply each value by the `del.wts` column, which contains each point’s weight: its share of the total triangulated area. Adding these products together gives the x-value of the centroid of the entire figure. We can do the same thing for the y-value. See the table below for the data summarised to give us the centroids for each city.

```r
data_deldir_cent = data_deldir %>%
  select(city, summary) %>%
  # One row per point, with its coordinates and area weight
  unnest() %>%
  group_by(city) %>%
  # Area-weighted sum of the point coordinates gives the centroid
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()

data_deldir_cent
```

```
# A tibble: 4 × 3
       city    cent_x   cent_y
     <fctr>     <dbl>    <dbl>
1 Barcelona  2.137923 41.38708
2    Berlin 13.402654 52.51054
3     Paris  2.353365 48.85813
4    Prague 14.447439 50.07588
```
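Why does a weighted sum of the points give the centroid? The centroid of a triangulated region is the area-weighted mean of the triangle centroids, and each triangle’s centroid is the mean of its three vertices. Every vertex therefore ends up weighted by one third of the area of the triangles that touch it, divided by the total area, which matches how the `deldir` documentation describes `del.wts`. The base R sketch below checks this identity on a hypothetical four-point example (a triangle with one interior point). The points and the triangle list are made up and written out by hand rather than produced by `deldir()`.

```r
# Hypothetical points: three corners of a triangle plus one interior point.
pts = data.frame(x = c(0, 4, 0, 1), y = c(0, 0, 3, 1))

# The only triangulation of these points, written by hand as rows of point indices.
tri = rbind(c(1, 2, 4), c(1, 3, 4), c(2, 3, 4))

# Area of each triangle from the cross product of two edge vectors.
tri_area = apply(tri, 1, function(idx) {
  p = pts[idx, ]
  abs((p$x[2] - p$x[1]) * (p$y[3] - p$y[1]) -
      (p$x[3] - p$x[1]) * (p$y[2] - p$y[1])) / 2
})

# Each point gets one third of the area of every triangle it belongs to;
# the shares are then normalised so that the weights sum to 1.
pt_area = sapply(seq_len(nrow(pts)), function(i) {
  sum(tri_area[apply(tri, 1, function(idx) i %in% idx)]) / 3
})
wts = pt_area / sum(pt_area)

# Weighted sum of the points, as in the summarise() call above.
cent = c(x = sum(pts$x * wts), y = sum(pts$y * wts))

# Area-weighted mean of the triangle centroids, for comparison.
tri_cent_x = apply(tri, 1, function(idx) mean(pts$x[idx]))
tri_cent_y = apply(tri, 1, function(idx) mean(pts$y[idx]))
cent_check = c(x = sum(tri_cent_x * tri_area) / sum(tri_area),
               y = sum(tri_cent_y * tri_area) / sum(tri_area))

all.equal(cent, cent_check)
```

Both routes land on the same point, (4/3, 1), which is also the centroid of the outer triangle. Note that the interior point carries the largest weight because it touches all three triangles.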

Now we can update our figures with the triangulations and centroids. I’ve again made a function to build the four maps. As before, we start with `ggmap()` and our city-specific map object. Next we use `geom_segment()` to draw our edges, using `x1`, `y1`, `x2`, and `y2` from the `data_deldir_delsgs` data frame we made earlier. We then plot the actual metro stops with `geom_point()`, just as we did in our original map. Finally we end with one more `geom_point()` call, this time on our `data_deldir_cent` data frame, to plot the centroid specific to each city. See the four updated maps below. Again, I’ve left the code visible for the Paris map to show how the function works and hidden the rest.

```r
del_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    # Edges of the Delaunay triangulation
    geom_segment(data = subset(data_deldir_delsgs, city == city_name),
                 aes(x = x1, y = y1, xend = x2, yend = y2),
                 size = 1, color = "#92c5de") +
    # Metro stops
    geom_point(data = subset(data, city == city_name),
               aes(x = lon, y = lat),
               color = "#0571b0", size = 3) +
    # Centroid of the network
    geom_point(data = subset(data_deldir_cent, city == city_name),
               aes(x = cent_x, y = cent_y),
               size = 6, color = "#ca0020")
}

paris_del.plot = del_plot("Paris", paris_map)
paris_del.plot
```
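The layering in `del_plot()` can be tried without a map background. The sketch below is a stand-in under stated assumptions, not the function above: it uses plain `ggplot()` instead of `ggmap()`, and the `stops`, `segs`, and `cent` data frames are hypothetical toy values.

```r
library(ggplot2)

# Hypothetical toy data: three stops, the edges between them, and their centroid
# (for a single triangle the centroid is simply the mean of the three corners).
stops = data.frame(lon = c(0, 4, 0), lat = c(0, 0, 3))
segs = data.frame(x1 = c(0, 4, 0), y1 = c(0, 0, 3),
                  x2 = c(4, 0, 0), y2 = c(0, 3, 0))
cent = data.frame(cent_x = mean(stops$lon), cent_y = mean(stops$lat))

# Same layer order as del_plot(): edges first, then stops, then the centroid,
# so that later layers are drawn on top of earlier ones.
toy_plot = ggplot() +
  geom_segment(data = segs, aes(x = x1, y = y1, xend = x2, yend = y2),
               color = "#92c5de") +
  geom_point(data = stops, aes(x = lon, y = lat), color = "#0571b0", size = 3) +
  geom_point(data = cent, aes(x = cent_x, y = cent_y), color = "#ca0020", size = 6)

toy_plot
```

Because each layer is given its own `data` argument, the three data frames do not need to share column names, which is why the edges, stops, and centroids can live in separate tables.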

## Conclusion

In Part 2 of this series we computed Delaunay triangulations and centroids for each of our cities’ metro systems. This included some more complicated `tidyverse` calls, such as nesting and unnesting our data. In the third and final part of this series we’ll look at how the systems change over time and show it with a .gif.
