Metro Systems Over Time: Part 2

[This article was first published on DataScience+, and kindly contributed to R-bloggers].

Note: at the time of this writing, using the CRAN versions of the packages ggplot2 and ggmap will result in an error. To avoid the error, be sure to install both packages from GitHub with the package devtools, and restart R if the problem persists.

devtools::install_github("dkahle/ggmap")
devtools::install_github("hadley/ggplot2")

Introduction

In Part 1 of this series we collected geodata from Google of metro stops and plotted them on maps. In Part 2 we’ll be building Delaunay triangulations on top of those maps and computing the centroid of the network. This post will include some pretty advanced use of tidyverse packages. For more information on some of these calls look at the tidyverse documentation.

Data

As a reminder, our data is the hand-corrected version of the data we pulled down from Google. To see how we got the data go back to Part 1: Data.

Maps with Delaunay Triangulations and Centroids

With our maps and data points in place, let’s compute the Delaunay triangulation for each city. This lets us find the area a given city’s metro covers and compute a center point, or centroid, for the metro system. We do this with the deldir package.

First, though, I am going to use a function from tidyr called nest(), which allows me to collapse a bunch of data into a single cell. By nesting by city I get one row for each city, with the rest of that city’s data stored as a data frame in a single cell. The .key argument names this new column, in this case location_info. Think of it as a data frame tucked within a cell of a data frame.

With my data nested I can make a new column called deldir that holds everything returned by my deldir() call. The deldir() call takes two numeric vectors, the x- and y-coordinates of the points (here longitude and latitude). It then computes several things, including the area of the triangulated shape and the edges of all the segments connecting the points.

How do we access this information, though? We can pull it out with a purrr call, map(). The map() call takes a list and a function and applies the function to each element in turn; given a name instead of a function, it extracts the element with that name. For our purposes we take the column deldir and pull out del.area from each entry. Thanks to the mutate() call we can then save it to a new column. We can do the same thing with delsgs (the segments of the shape) and summary (per-point information, including the area weights we use later for the centroid). See the fully nested data frame below.

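library(dplyr)   # %>%, mutate(), select(), group_by(), summarise()
library(tidyr)   # nest(), unnest()
library(purrr)
library(deldir)

# Note: nest(-city, .key = ...) is the syntax from the time of writing;
# tidyr 1.0 and later spell this nest(location_info = -city)
data_deldir = data %>%
  nest(-city, .key = location_info) %>%
  # Triangulate each city's stops
  mutate(deldir = map(location_info, function(df) deldir(df$lon, df$lat))) %>%
  # Pull named elements out of each deldir object
  mutate(del.area = map(deldir, "del.area")) %>%
  mutate(delsgs = map(deldir, "delsgs")) %>%
  mutate(summary = map(deldir, "summary"))
data_deldir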
# A tibble: 4 × 6
       city      location_info       deldir  del.area                 delsgs                summary
     <fctr>             <list>       <list>    <list>                 <list>                 <list>
1     Paris <tibble [298 × 9]> <S3: deldir> <dbl [1]> <data.frame [849 × 6]> <data.frame [287 × 9]>
2    Berlin <tibble [173 × 9]> <S3: deldir> <dbl [1]> <data.frame [499 × 6]> <data.frame [171 × 9]>
3 Barcelona <tibble [149 × 9]> <S3: deldir> <dbl [1]> <data.frame [433 × 6]> <data.frame [148 × 9]>
4    Prague  <tibble [58 × 9]> <S3: deldir> <dbl [1]> <data.frame [161 × 6]>  <data.frame [58 × 9]>
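
To make the nest-then-map pattern concrete, here is a minimal sketch on a made-up data frame. The toy data, its coordinates, and the n_stops and mean_lon columns are hypothetical and only for illustration. It also assumes tidyr 1.0 or later, where the nesting is written as nest(location_info = -city) rather than with .key.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: two cities with a few made-up coordinates each
toy = tibble(
  city = c("A", "A", "A", "B", "B"),
  lon  = c(1.0, 1.2, 1.1, 5.0, 5.4),
  lat  = c(40.0, 40.1, 40.3, 50.0, 50.2)
)

toy_nested = toy %>%
  # One row per city; all other columns collapse into a list-column of tibbles
  nest(location_info = -city) %>%
  # map() would return a list-column; map_int()/map_dbl() return plain vectors
  mutate(n_stops  = map_int(location_info, nrow),
         mean_lon = map_dbl(location_info, function(df) mean(df$lon)))

toy_nested
```

The same idea should let you store the areas as a plain numeric column with map_dbl(deldir, "del.area") instead of a list-column, which makes them easier to print and compare.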

Based on these areas, it looks like the Berlin metro covers the most area at 0.059279, while Barcelona covers the smallest at 0.016332. (These values are in squared degrees, because the triangulation was computed on raw longitude and latitude.) Now that we have our nested data frame with all pertinent information, we’re going to unnest the data necessary for our new plots. First we need the delsgs data, which we use to draw the lines connecting the metro stops. To do this we’ll make a new data frame, dropping all columns except for city and delsgs. Then we unnest() the data frame. This will expand the delsgs column that had nested values, giving us many more rows and many more columns. The x1, y1, x2, and y2 values will be used later in our plot to draw the edges of our triangles. See part of the unnested data frame below.

data_deldir_delsgs = data_deldir %>%
  select(city, delsgs) %>%
  unnest(delsgs)  # name the column explicitly; newer tidyr versions expect it
head(data_deldir_delsgs)
# A tibble: 6 × 7
    city       x1       y1       x2       y2  ind1  ind2
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl> <int> <int>
1  Paris 2.366928 48.78793 2.359279 48.79272   283   282
2  Paris 2.433489 48.77262 2.366928 48.78793    72   283
3  Paris 2.450590 48.78984 2.433489 48.77262    74    72
4  Paris 2.450590 48.78984 2.459319 48.77978    74    73
5  Paris 2.455281 48.76805 2.433489 48.77262   198    72
6  Paris 2.455281 48.76805 2.459319 48.77978   198    73
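
A side note on the areas quoted above: because deldir() was given raw longitude and latitude, del.area is in squared degrees, and a degree of longitude covers less ground the further a city is from the equator. The sketch below is one rough way to put the two quoted areas on a common scale. The helper approx_area_km2() is hypothetical, and it assumes a spherical Earth with roughly 111.32 km per degree of latitude and a single cos(latitude) correction per city, with latitudes rounded to 52.5 for Berlin and 41.4 for Barcelona. Treat the results as ballpark figures only.

```r
# Hypothetical helper: approximate conversion from squared degrees to km^2.
# Assumes ~111.32 km per degree of latitude and scales longitude by cos(lat).
approx_area_km2 = function(area_deg2, lat_deg) {
  km_per_deg = 111.32
  area_deg2 * km_per_deg^2 * cos(lat_deg * pi / 180)
}

# Areas in squared degrees as quoted in the text; latitudes are rounded
berlin_km2    = approx_area_km2(0.059279, 52.5)
barcelona_km2 = approx_area_km2(0.016332, 41.4)

round(c(Berlin = berlin_km2, Barcelona = barcelona_km2))
```

Under these assumptions the ordering of the two cities stays the same, but the gap between them narrows, since a degree of longitude is wider in Barcelona than in Berlin.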

In addition to the edges of the shape, we also want the centroid. To get it we’ll first make a new data frame with just the city and summary information. We then unnest() the data frame just as we did for the edges, but we don’t stop there, because the centroid is something we need to compute ourselves. To do this we’ll first group_by() city and then summarise() the data. To compute the x-value for the centroid, cent_x, we take the x column, which contains the x-coordinates of all of the points, and multiply each value by the matching entry of the del.wts column, which holds each point’s share of the total triangulated area (the weights sum to 1 within a city). Adding these products together gives the x-value of the centroid of the entire figure. We can do the same thing for the y-value. See the table below for the data summarised to give us the centroids for each city.

data_deldir_cent = data_deldir %>%
  select(city, summary) %>%
  unnest(summary) %>%  # name the column explicitly; newer tidyr versions expect it
  group_by(city) %>%
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()
data_deldir_cent
# A tibble: 4 × 3
       city    cent_x   cent_y
     <fctr>     <dbl>    <dbl>
1 Barcelona  2.137923 41.38708
2    Berlin 13.402654 52.51054
3     Paris  2.353365 48.85813
4    Prague 14.447439 50.07588
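
Why does the weighted sum give a centroid? The centroid of a single triangle is the mean of its three corners, and the centroid of the whole triangulated region is the area-weighted mean of the triangle centroids. Rearranging that sum by point instead of by triangle gives each point a weight equal to one third of the area of the triangles touching it, divided by the total area, which is how the deldir documentation describes del.wts. The base R sketch below checks this on a hand-built example. The points, the tri index matrix, and the helper tri_area() are made up for illustration and do not come from the metro data; the triangulation is written by hand rather than computed by deldir, since the identity holds for any triangulation.

```r
# Hypothetical points: a 2 x 1 rectangle plus one interior point
pts = data.frame(x = c(0, 2, 2, 0, 0.5),
                 y = c(0, 0, 1, 1, 0.5))

# Hand-written triangulation (rows index into pts); not computed by deldir
tri = rbind(c(1, 2, 5),
            c(2, 3, 5),
            c(3, 4, 5),
            c(4, 1, 5))

# Area of one triangle from its corner indices (shoelace formula)
tri_area = function(idx) {
  p = pts[idx, ]
  abs((p$x[2] - p$x[1]) * (p$y[3] - p$y[1]) -
      (p$x[3] - p$x[1]) * (p$y[2] - p$y[1])) / 2
}
areas = apply(tri, 1, tri_area)

# Route 1: area-weighted mean of the triangle centroids
tri_cx = apply(tri, 1, function(idx) mean(pts$x[idx]))
tri_cy = apply(tri, 1, function(idx) mean(pts$y[idx]))
by_triangle = c(sum(tri_cx * areas), sum(tri_cy * areas)) / sum(areas)

# Route 2: per-point weights, one third of the adjacent triangle areas,
# normalised to sum to 1 (the analogue of del.wts)
pt_area = sapply(seq_len(nrow(pts)), function(i) {
  sum(areas[apply(tri, 1, function(idx) i %in% idx)]) / 3
})
wts = pt_area / sum(pt_area)
by_point = c(sum(pts$x * wts), sum(pts$y * wts))

all.equal(by_triangle, by_point)
```

Both routes give (1, 0.5), the center of the rectangle, even though the interior point sits off-center. Note that this is the centroid of the area covered by the triangulation, not the plain average of the stop coordinates.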

Now we can update our figures with the triangulations and centroids. I’ve again made a function to build the four maps. As before we start with ggmap() and our city-specific map object. Next we’ll use geom_segment() to draw our edges. To do this we’ll use x1, y1, x2, and y2 from the data_deldir_delsgs data frame we made earlier. We then plot the actual metro stop points with geom_point(), just as we did in our original map. Finally we end with one more geom_point() call, this time on our data_deldir_cent data frame, to plot the centroid specific to each city. See the four updated maps below. Again, I’ve left the code visible for the Paris map to show how the function works and hidden the rest.

del_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    # Edges of the Delaunay triangles
    geom_segment(data = subset(data_deldir_delsgs, city == city_name),
                 aes(x = x1, y = y1, xend = x2, yend = y2),
                 size = 1, color = "#92c5de") +
    # Metro stops
    geom_point(data = subset(data, city == city_name), aes(x = lon, y = lat),
               color = "#0571b0", size = 3) +
    # Centroid of the network
    geom_point(data = subset(data_deldir_cent, city == city_name),
               aes(x = cent_x, y = cent_y),
               size = 6, color = "#ca0020")
}

paris_del.plot = del_plot("Paris", paris_map)
paris_del.plot

[Figure: plot for Paris, showing the triangulation edges, the metro stops, and the centroid]
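
For readers who want to try the layering without fetching map tiles, here is a minimal sketch that stacks the same three layers on a blank ggplot() canvas instead of ggmap(). The toy_stops, toy_edges, and toy_cent data frames are hypothetical stand-ins for data, data_deldir_delsgs, and data_deldir_cent, and coord_fixed() is my own addition to keep the axes on the same scale.

```r
library(ggplot2)

# Hypothetical toy data standing in for the stops, edges, and centroid
toy_stops = data.frame(lon = c(0, 2, 2, 0, 0.5),
                       lat = c(0, 0, 1, 1, 0.5))
toy_edges = data.frame(x1 = c(0, 2, 2, 0, 0, 2, 2, 0),
                       y1 = c(0, 0, 1, 1, 0, 0, 1, 1),
                       x2 = c(2, 2, 0, 0, 0.5, 0.5, 0.5, 0.5),
                       y2 = c(0, 1, 1, 0, 0.5, 0.5, 0.5, 0.5))
toy_cent  = data.frame(cent_x = 1, cent_y = 0.5)

toy_plot = ggplot() +
  # Edges first, so the points are drawn on top of them
  geom_segment(data = toy_edges,
               aes(x = x1, y = y1, xend = x2, yend = y2),
               color = "#92c5de") +
  # Stops
  geom_point(data = toy_stops, aes(x = lon, y = lat),
             color = "#0571b0", size = 3) +
  # Centroid last, so it is not hidden by other layers
  geom_point(data = toy_cent, aes(x = cent_x, y = cent_y),
             color = "#ca0020", size = 6) +
  coord_fixed()

toy_plot
```

Layers are drawn in the order they are added, which is why the edges come first and the centroid last.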

Conclusion

In Part 2 of this series we computed Delaunay triangulations and centroids for each of our cities’ metro systems. This included some more complicated tidyverse calls such as nesting and unnesting our data. In the third and final part of this series we’ll look at how the systems change over time and show it with a .gif.
