Spatial networks in R with sf and tidygraph

[This article was first published on r-spatial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Spatial networks in R with sf and tidygraph

Lucas van der Meer, Robin Lovelace & Lorena Abad
September 26, 2019

Introduction

Street networks, shipping routes, telecommunication lines, river
bassins. All examples of spatial networks: organized systems of nodes
and edges embedded in space. For most of them, these nodes and edges can
be associated with geographical coordinates. That is, the nodes are
geographical points, and the edges geographical lines.

Such spatial networks can be analyzed using graph theory. Not for
nothing, Leonhard Eulers famous work on the Seven Bridges of
Köningsberg
,
which laid the foundations of graph theory and network analysis, was in
essence a spatial problem.

In R, there are advanced, modern tools for both the analysis of spatial
data and networks. Furthermore, several packages have been developed
that cover (parts of) spatial network analysis. As stated in this
github issue and this
tweet,
concensus on how best to represent and analyse spatial networks has
proved elusive.

This blogpost demonstrates an approach to spatial networks that starts
with a set of geographic lines, and leads to an object ready to be used
for network analysis. Along the way, we will see that there are still
steps that need to be taken before the process of analyzing spatial
networks in R is user friendly and efficient.

Existing R packages for spatial networks

Although R was originally designed as a language for statistical
computing, an active ‘R-spatial’ ecosystem has evolved. Powerful and
high performance packages for spatial data analysis have been developed,
thanks largely to interfaces to mature C/C++ libraries such as GDAL,
GEOS and PROJ, notably in the package
sf (see
section 1.5
of Geocomputation with R for a brief history). Likewise, a number of
packages for graph representation and analysis have been developed,
notably tidygraph, which is
based on igraph.

Both sf and tidygraph support the tibble class and the broader ‘tidy’
approach to data science, which involves data processing pipelines, type
stability and a convention of representing everything as a data frame
(well a tibble, which is a data frame with user friendly default
settings). In sf, this means storing spatial vector data as objects of
class sf, which are essentially the same as a regular data frame (or
tibble), but with an additional ‘sticky’ list column containing a
geometry for each feature (row), and attributes such as bounding box and
CRS. Tidygraph stores networks in objects of class tbl_graph. A
tbl_graph is an igraph object, but enables the user to manipulate
both the edges and nodes elements as if they were data frames also.

Both sf and tidygraph are relatively new packages (first released on
CRAN in 2016 and 2017, respectively). It is unsurprising, therefore,
that they have yet to be combined to allow a hybrid, tibble-based
representation of spatial networks.

Nevertheless, a number of approaches have been developed for
representing spatial networks, and some of these are in packages that
have been published on CRAN.
stplanr, for instance, contains
the SpatialLinesNetwork class, which works with both the
sp (a package for spatial data analysis
launched in 2005) and sf packages.
dodgr is a more recent package
that provides analytical tools for street networks, with a focus on
directed graphs (that can have direction-dependent weights,
e.g. representing a one-way street). Other packages seeking to
implement spatial networks in R include
spnetwork, a package that defined
a class system combining sp and igraph, and
shp2graph,
which provides tools to switch between sp and igraph objects.

Each package has its merits that deserve to be explored in more detail
(possibly in a subsequent blog post). The remainder of this post
outlines an approach that combines sf and igraph objects in a
tidygraph object.

Set-up

The following code chunk will install the packages used in this post:

# We'll use remotes to install packages, install it if needs be:
if(!"remotes" %in% installed.packages()) {
  install.packages("remotes")
}

cran_pkgs = c(
  "sf",
  "tidygraph",
  "igraph",
  "osmdata",
  "dplyr",
  "tibble",
  "ggplot2",
  "units",
  "tmap",
  "rgrass7",
  "link2GI",
  "nabor"
)

remotes::install_cran(cran_pkgs)

library(sf)
library(tidygraph)
library(igraph)
library(dplyr)
library(tibble)
library(ggplot2)
library(units)
library(tmap)
library(osmdata)
library(rgrass7)
library(link2GI)
library(nabor)

Getting the data

As an example, we use the street network of the city center of Münster,
Germany. We will get the data from OpenStreetMap. Packages like dodgr
have optimized their code for such data, however considering that we
want to showcase this workflow for any source of data, we will generate
an object of class sf containing only LINESTRING geometries.
However, streets that form loops, are returned by osmdata as polygons,
rather than lines. These, we will convert to lines, using the
osm_poly2line function. One additional variable, the type of street,
is added to show that the same steps can be used for sf objects that
contain any number of additional variables.

muenster <- opq(bbox =  c(7.61, 51.954, 7.636, 51.968)) %>% 
  add_osm_feature(key = 'highway') %>% 
  osmdata_sf() %>% 
  osm_poly2line()

muenster_center <- muenster$osm_lines %>% 
  select(highway)

muenster_center

## Simple feature collection with 2233 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##             highway                       geometry
## 4064462     primary LINESTRING (7.624554 51.955...
## 4064463     primary LINESTRING (7.626498 51.956...
## 4064467 residential LINESTRING (7.630898 51.955...
## 4064474     primary LINESTRING (7.61972 51.9554...
## 4064476     primary LINESTRING (7.619844 51.954...
## 4064482    tertiary LINESTRING (7.616395 51.957...
## 4064485     service LINESTRING (7.63275 51.9603...
## 4984982   secondary LINESTRING (7.614156 51.967...
## 4985138    cycleway LINESTRING (7.61525 51.9673...
## 4985140 residential LINESTRING (7.616774 51.968...

ggplot(data = muenster_center) + geom_sf()

From sf to tbl_graph: a step wise approach

Step 1: Clean the network

To perform network analysis, we need a network with a clean topology. In
theory, the best way to clean up the network topology is by manual
editing, but this can be very labour intensive and time consuming,
mainly for large networks. The
v.clean toolset
from the GRASS GIS software provides automated functionalities for this
task, and is therefore a popular instrument within the field of spatial
network analysis. As far as we know, there is no R equivalent for this
toolset, but fortunately, the
rgrass7
and link2GI packages enable us
to easily ‘bridge’ to GRASS GIS. Obviously, this requires to have GRASS
GIS installed on your computer. For an in depth description of combining
R with open source GIS software, see
Chapter 9 of
Geocomputation with R. Take into account that the linking process may
take up some time, especially on Windows operating systems. Also, note
that there have been some large changes recently to the rgrass7
package, which enabled a better integration with sf. However, it also
means that the code below will not work when using an older version of
rgrass7, so make sure to update if needed.

Here, we will clean the network topology by breaking lines at
intersections and also breaking lines that form a collapsed loop. This
will be followed by a removal of duplicated geometry features. Once
done, we will read the data back into R, and convert again into an sf
object with LINESTRING geometry. For this we use rgrass7, which
requires some set-up code that can be seen in the source
code

of this post.

# Add data to GRASS spatial database  
writeVECT(
  SDF = muenster_center, 
  vname = 'muenster_center', 
  v.in.ogr_flags = 'overwrite'
)

# Execute the v.clean tool
execGRASS("g.proj", flags = c("c", "quiet"), proj4 = proj4)
execGRASS(
  cmd = 'v.clean', 
  input = 'muenster_center', 
  output = 'muenster_cleaned',        
  tool = 'break', 
  flags = c('overwrite', 'c')
)

# Read back into R
use_sf()
muenster_center <- readVECT('muenster_cleaned') %>%
  rename(geometry = geom) %>%
  select(-cat)

muenster_center

## Simple feature collection with 4667 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry
## 1      service LINESTRING (7.63275 51.9603...
## 2    secondary LINESTRING (7.614156 51.967...
## 3     cycleway LINESTRING (7.61525 51.9673...
## 4      footway LINESTRING (7.629304 51.967...
## 5        steps LINESTRING (7.627696 51.965...
## 6      service LINESTRING (7.633612 51.965...
## 7  residential LINESTRING (7.630564 51.957...
## 8      service LINESTRING (7.613545 51.960...
## 9     cycleway LINESTRING (7.619781 51.957...
## 10 residential LINESTRING (7.62373 51.9643...

Step 2: Give each edge a unique index

The edges of the network, are simply the linestrings in the data. Each
of them gets a unique index, which can later be related to their start
and end node.

edges <- muenster_center %>%
  mutate(edgeID = c(1:n()))

edges

## Simple feature collection with 4667 features and 2 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry edgeID
## 1      service LINESTRING (7.63275 51.9603...      1
## 2    secondary LINESTRING (7.614156 51.967...      2
## 3     cycleway LINESTRING (7.61525 51.9673...      3
## 4      footway LINESTRING (7.629304 51.967...      4
## 5        steps LINESTRING (7.627696 51.965...      5
## 6      service LINESTRING (7.633612 51.965...      6
## 7  residential LINESTRING (7.630564 51.957...      7
## 8      service LINESTRING (7.613545 51.960...      8
## 9     cycleway LINESTRING (7.619781 51.957...      9
## 10 residential LINESTRING (7.62373 51.9643...     10

Step 3: Create nodes at the start and end point of each edge

The nodes of the network, are the start and end points of the edges. The
locations of these points can be derived by using the st_coordinates
function in sf. When given a set of linestrings, this function breaks
down each of them into the points they are built up from. It returns a
matrix with the X and Y coordinates of those points, and additionally an
integer indicator L1 specifying to which line a point belongs. These
integer indicators correspond to the edge indices defined in step 1.
That is, if we convert the matrix into a data.frame or tibble, group
the features by the edge index, and only keep the first and last feature
of each group, we have the start and end points of the linestrings.

nodes <- edges %>%
  st_coordinates() %>%
  as_tibble() %>%
  rename(edgeID = L1) %>%
  group_by(edgeID) %>%
  slice(c(1, n())) %>%
  ungroup() %>%
  mutate(start_end = rep(c('start', 'end'), times = n()/2))

nodes

## # A tibble: 9,334 x 4
##        X     Y edgeID start_end
##            
##  1  7.63  52.0      1 start    
##  2  7.63  52.0      1 end      
##  3  7.61  52.0      2 start    
##  4  7.61  52.0      2 end      
##  5  7.62  52.0      3 start    
##  6  7.62  52.0      3 end      
##  7  7.63  52.0      4 start    
##  8  7.63  52.0      4 end      
##  9  7.63  52.0      5 start    
## 10  7.63  52.0      5 end      
## # … with 9,324 more rows

Step 4: Give each node a unique index

Each of the nodes in the network needs to get a unique index, such that
they can be related to the edges. However, we need to take into account
that edges can share either startpoints and/or endpoints. Such
duplicated points, that have the same X and Y coordinate, are one single
node, and should therefore get the same index. Note that the coordinate
values as displayed in the tibble are rounded, and may look the same for
several rows, even when they are not. We can use the group_indices
function in dplyr to give each group of unique X,Y-combinations a unique
index.

nodes <- nodes %>%
  mutate(xy = paste(.$X, .$Y)) %>% 
  mutate(nodeID = group_indices(., factor(xy, levels = unique(xy)))) %>%
  select(-xy)

nodes

## # A tibble: 9,334 x 5
##        X     Y edgeID start_end nodeID
##              
##  1  7.63  52.0      1 start          1
##  2  7.63  52.0      1 end            2
##  3  7.61  52.0      2 start          3
##  4  7.61  52.0      2 end            4
##  5  7.62  52.0      3 start          5
##  6  7.62  52.0      3 end            6
##  7  7.63  52.0      4 start          7
##  8  7.63  52.0      4 end            8
##  9  7.63  52.0      5 start          9
## 10  7.63  52.0      5 end           10
## # … with 9,324 more rows

Step 5: Combine the node indices with the edges

Now each of the start and endpoints have been assigned a node ID in step
4, so that we can add the node indices to the edges. In other words, we
can specify for each edge, in which node it starts, and in which node it
ends.

source_nodes <- nodes %>%
  filter(start_end == 'start') %>%
  pull(nodeID)

target_nodes <- nodes %>%
  filter(start_end == 'end') %>%
  pull(nodeID)

edges = edges %>%
  mutate(from = source_nodes, to = target_nodes)

edges

## Simple feature collection with 4667 features and 4 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry edgeID from to
## 1      service LINESTRING (7.63275 51.9603...      1    1  2
## 2    secondary LINESTRING (7.614156 51.967...      2    3  4
## 3     cycleway LINESTRING (7.61525 51.9673...      3    5  6
## 4      footway LINESTRING (7.629304 51.967...      4    7  8
## 5        steps LINESTRING (7.627696 51.965...      5    9 10
## 6      service LINESTRING (7.633612 51.965...      6   11 12
## 7  residential LINESTRING (7.630564 51.957...      7   13 14
## 8      service LINESTRING (7.613545 51.960...      8   15 16
## 9     cycleway LINESTRING (7.619781 51.957...      9   17 18
## 10 residential LINESTRING (7.62373 51.9643...     10   19 20

Step 6: Remove duplicate nodes

Having added the unique node ID’s to the edges data, we don’t need the
duplicated start and endpoints anymore. After removing them, we end up
with a tibble in which each row represents a unique, single node. This
tibble can be converted into an sf object, with POINT geometries.

nodes <- nodes %>%
  distinct(nodeID, .keep_all = TRUE) %>%
  select(-c(edgeID, start_end)) %>%
  st_as_sf(coords = c('X', 'Y')) %>%
  st_set_crs(st_crs(edges))

nodes

## Simple feature collection with 3329 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 3,329 x 2
##    nodeID            geometry
##              
##  1      1   (7.63275 51.9603)
##  2      2 (7.631843 51.96061)
##  3      3 (7.614156 51.96724)
##  4      4 (7.613797 51.96723)
##  5      5  (7.61525 51.96736)
##  6      6 (7.615148 51.96745)
##  7      7 (7.629304 51.96712)
##  8      8  (7.629308 51.9673)
##  9      9 (7.627696 51.96534)
## 10     10  (7.62765 51.96534)
## # … with 3,319 more rows

Step 7: Convert to tbl_graph

The first six steps led to one sf object with LINESTRING geometries,
representing the edges of the network, and one sf object with POINT
geometries, representing the nodes of the network. The tbl_graph
function allows us to convert these two into a tbl_graph object. There
are two tricky parts in this step that need to be highlighted. One, is
that the columns containing the indices of the source and target nodes
should either be the first two columns of the sf object, or be named
‘to’ and ‘from’, respectively. Secondly, inside the tbl_graph
function, these columns are converted into a two-column matrix. However,
an sf object has a so-called ‘sticky geometry’, which means that the
geometry column sticks to the attributes whenever specific columns are
selected. Therefore, the matrix created inside tbl_graph has three
columns instead of two, and that causes an error. Therefore, we first
need to convert the sf object to a regular data.frame or tibble,
before we can construct a tbl_graph. In the end, this doesn’t matter,
since both the nodes and edges will be ‘integrated’ into an igraph
structure, and loose their specific sf
characteristics.

graph = tbl_graph(nodes = nodes, edges = as_tibble(edges), directed = FALSE)

graph

## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Node Data: 3,329 x 2 (active)
##   nodeID            geometry
##             
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## 4      4 (7.613797 51.96723)
## 5      5  (7.61525 51.96736)
## 6      6 (7.615148 51.96745)
## # … with 3,323 more rows
## #
## # Edge Data: 4,667 x 5
##    from    to highway                                       geometry edgeID
##                                        
## 1     1     2 service           (7.63275 51.9603, 7.631843 51.96061)      1
## 2     3     4 secondary       (7.614156 51.96724, 7.613797 51.96723)      2
## 3     5     6 cycleway  (7.61525 51.96736, 7.615165 51.96744, 7.615…      3
## # … with 4,664 more rows

Step 8: Putting it together

To make the approach more convenient, we can combine all steps above
into a single function, that takes a cleaned sf object with
LINESTRING geometries as input, and returns a spatial tbl_graph.

sf_to_tidygraph = function(x, directed = TRUE) {
  
  edges <- x %>%
    mutate(edgeID = c(1:n()))
  
  nodes <- edges %>%
    st_coordinates() %>%
    as_tibble() %>%
    rename(edgeID = L1) %>%
    group_by(edgeID) %>%
    slice(c(1, n())) %>%
    ungroup() %>%
    mutate(start_end = rep(c('start', 'end'), times = n()/2)) %>%
    mutate(xy = paste(.$X, .$Y)) %>% 
    mutate(nodeID = group_indices(., factor(xy, levels = unique(xy)))) %>%
    select(-xy)
  
  source_nodes <- nodes %>%
    filter(start_end == 'start') %>%
    pull(nodeID)

  target_nodes <- nodes %>%
    filter(start_end == 'end') %>%
    pull(nodeID)

  edges = edges %>%
    mutate(from = source_nodes, to = target_nodes)
  
  nodes <- nodes %>%
    distinct(nodeID, .keep_all = TRUE) %>%
    select(-c(edgeID, start_end)) %>%
    st_as_sf(coords = c('X', 'Y')) %>%
    st_set_crs(st_crs(edges))
  
  tbl_graph(nodes = nodes, edges = as_tibble(edges), directed = directed)
  
}

sf_to_tidygraph(muenster_center, directed = FALSE)

## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Node Data: 3,329 x 2 (active)
##   nodeID            geometry
##             
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## 4      4 (7.613797 51.96723)
## 5      5  (7.61525 51.96736)
## 6      6 (7.615148 51.96745)
## # … with 3,323 more rows
## #
## # Edge Data: 4,667 x 5
##    from    to highway                                       geometry edgeID
##                                        
## 1     1     2 service           (7.63275 51.9603, 7.631843 51.96061)      1
## 2     3     4 secondary       (7.614156 51.96724, 7.613797 51.96723)      2
## 3     5     6 cycleway  (7.61525 51.96736, 7.615165 51.96744, 7.615…      3
## # … with 4,664 more rows

Combining the best of both worlds

Having the network stored in the tbl_graph structure, with a geometry
list column for both the edges and nodes, enables us to combine the wide
range of functionalities in sf and tidygraph, in a way that fits neatly
into the tidyverse.

With the activate() verb, we specify if we want to manipulate the
edges or the nodes. Then, most dplyr verbs can be used in the familiar
way, also when directly applied to the geometry list column. For
example, we can add a variable describing the length of each edge,
which, later, we use as a weight for the edges.

graph <- graph %>%
  activate(edges) %>%
  mutate(length = st_length(geometry))

graph

## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Edge Data: 4,667 x 6 (active)
##    from    to highway                              geometry edgeID   length
##                                     [m]
## 1     1     2 service  (7.63275 51.9603, 7.631843 51.96061)      1 71.2778…
## 2     3     4 seconda… (7.614156 51.96724, 7.613797 51.967…      2 24.7146…
## 3     5     6 cycleway (7.61525 51.96736, 7.615165 51.9674…      3 12.2303…
## 4     7     8 footway  (7.629304 51.96712, 7.629304 51.967…      4 20.0122…
## 5     9    10 steps    (7.627696 51.96534, 7.62765 51.9653…      5  3.2926…
## 6    11    12 service  (7.633612 51.96548, 7.633578 51.965…      6  7.4291…
## # … with 4,661 more rows
## #
## # Node Data: 3,329 x 2
##   nodeID            geometry
##             
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## # … with 3,326 more rows

With one flow of pipes, we can ‘escape’ the graph structure, turn either
the edges or nodes back into real sf objects, and, for example,
summarise the data based on a specific variable.

graph %>%
  activate(edges) %>%
  as_tibble() %>%
  st_as_sf() %>%
  group_by(highway) %>%
  summarise(length = sum(length))

## Simple feature collection with 17 features and 2 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 17 x 3
##    highway         length                                          geometry
##  *               [m]                             
##  1 corridor        9.377… ((7.620506 51.96262, 7.620542 51.96266), (7.6205…
##  2 cycleway    19275.917… ((7.619683 51.95395, 7.619641 51.95378, 7.619559…
##  3 footway     42822.070… ((7.640529 51.95325, 7.640528 51.95323, 7.640531…
##  4 path         8193.341… ((7.624007 51.95379, 7.624223 51.95378, 7.624253…
##  5 pedestrian  11399.337… ((7.620362 51.95471, 7.620477 51.9547), (7.62012…
##  6 primary      3574.842… ((7.625556 51.95272, 7.625594 51.95284, 7.625714…
##  7 primary_li…   184.385… ((7.617285 51.96609, 7.617286 51.96624, 7.617295…
##  8 residential 22729.748… ((7.614509 51.95351, 7.614554 51.95346), (7.6230…
##  9 secondary    4460.683… ((7.629784 51.95446, 7.629842 51.95444), (7.6312…
## 10 secondary_…   160.708… ((7.635309 51.95946, 7.635705 51.95948), (7.6349…
## 11 service     27027.822… ((7.624803 51.95393, 7.625072 51.95393), (7.6158…
## 12 steps        1321.841… ((7.634423 51.9546, 7.634438 51.95462), (7.61430…
## 13 tertiary     4356.365… ((7.607112 51.94991, 7.607126 51.94992, 7.607183…
## 14 tertiary_l…    43.856… ((7.623592 51.96612, 7.623568 51.96611, 7.623468…
## 15 track         389.866… ((7.610671 51.95778, 7.610571 51.95759, 7.610585…
## 16 unclassifi…   612.816… ((7.634492 51.95613, 7.634689 51.95611), (7.6343…
## 17          3161.883… ((7.634374 51.95579, 7.634545 51.95575, 7.634662…

Switching back to sf objects is useful as well when plotting the
network, in a way that preserves its spatial properties.

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf()) + 
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), size = 0.5)

Or, alternatively, in only a few lines of code, plot the network as an
interactive map. On this page, the interactive map might show as an
image, but here, you
should be able to really interact with it!

tmap_mode('view')

tm_shape(graph %>% activate(edges) %>% as_tibble() %>% st_as_sf()) +
  tm_lines() +
tm_shape(graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf()) +
  tm_dots() +
tmap_options(basemaps = 'OpenStreetMap')

All nice and well, but these are not things that we necessarily need the
graph representation for. The added value of tidygraph, is that it opens
the door to the functions of the igraph library, all specifically
designed for network analysis, and enables us to use them inside a
‘tidy’ workflow. To cover them all, we would need to write a book,
but let’s at least show a few examples below.

Centrality measures

Centraltity measures describe the importances of nodes in the network.
The simplest of those measures is the degree centrality: the number of
edges connected to a node. Another example is the betweenness
centrality, which, simply stated, is the number of shortest paths that
pass through a node. In tidygraph, we can calculate these and many other
centrality measures, and simply add them as a variable to the nodes.

The betweenness centrality can also be calculated for edges. In that
case, it specifies the number of shortest paths that pass through an
edge.

graph <- graph %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  mutate(betweenness = centrality_betweenness(weights = length)) %>%
  activate(edges) %>%
  mutate(betweenness = centrality_edge_betweenness(weights = length))

graph

## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Edge Data: 4,667 x 7 (active)
##    from    to highway                  geometry edgeID   length betweenness
##                         [m]       
## 1     1     2 service (7.63275 51.9603, 7.6318…      1 71.2778…       90271
## 2     3     4 second… (7.614156 51.96724, 7.61…      2 24.7146…       32946
## 3     5     6 cyclew… (7.61525 51.96736, 7.615…      3 12.2303…       12409
## 4     7     8 footway (7.629304 51.96712, 7.62…      4 20.0122…       22043
## 5     9    10 steps   (7.627696 51.96534, 7.62…      5  3.2926…       83380
## 6    11    12 service (7.633612 51.96548, 7.63…      6  7.4291…       17119
## # … with 4,661 more rows
## #
## # Node Data: 3,329 x 4
##   nodeID            geometry degree betweenness
##                      
## 1      1   (7.63275 51.9603)      3      117041
## 2      2 (7.631843 51.96061)      3       94899
## 3      3 (7.614156 51.96724)      3       32881
## # … with 3,326 more rows

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'grey50') + 
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), aes(col = betweenness, size = betweenness)) +
  scale_colour_viridis_c(option = 'inferno') +
  scale_size_continuous(range = c(0,4))

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), aes(col = betweenness, size = betweenness)) +
  scale_colour_viridis_c(option = 'inferno') +
  scale_size_continuous(range = c(0,4))

Shortest paths

A core part of spatial network analysis is generally finding the path
between two nodes that minimizes either the travel distance or travel
time. In igraph, there are several functions that can be used for this
purpose, and since a tbl_graph is just a subclass of an igraph
object, we can directly input it into every function in the igraph
package.

The function distances, for example, returns a numeric matrix
containing the distances of the shortest paths between every possible
combination of nodes. It will automatically choose a suitable algorithm
to calculate these shortest paths.

distances <- distances(
  graph = graph,
  weights = graph %>% activate(edges) %>% pull(length)
)

distances[1:5, 1:5]

##            [,1]       [,2]       [,3]      [,4]       [,5]
## [1,]    0.00000   71.27789 1670.22046 1694.9351 1596.80669
## [2,]   71.27789    0.00000 1619.80567 1644.5203 1546.39190
## [3,] 1670.22046 1619.80567    0.00000   24.7146   77.64829
## [4,] 1694.93506 1644.52027   24.71460    0.0000  102.36289
## [5,] 1596.80669 1546.39190   77.64829  102.3629    0.00000

The function ‘shortest_paths’ not only returns distances, but also the
indices of the nodes and edges that make up the path. When we relate
them to their corresponding geometry columns, we get the spatial
representation of the shortest paths. Instead of doing this for all
possible combinations of nodes, we can specify from and to which nodes
we want to calculate the shortest paths. Here, we will show an example
of a shortest path from one node to another, but it is just as well
possible to do the same for one to many, many to one, or many to many
nodes. Whenever the graph is weighted, the Dijkstra algoritm will be
used under the hood. Note here that we have to define the desired output
beforehand: vpath means that only the nodes (called vertices in
igraph) are returned, epath means that only the edges are returned,
and both returns them both.

from_node <- graph %>%
  activate(nodes) %>%
  filter(nodeID == 3044) %>%
  pull(nodeID)

to_node <- graph %>%
  activate(nodes) %>%
  filter(nodeID == 3282) %>%
  pull(nodeID)

path <- shortest_paths(
  graph = graph,
  from = from_node,
  to = to_node,
  output = 'both',
  weights = graph %>% activate(edges) %>% pull(length)
)

path$vpath

## [[1]]
## + 50/3329 vertices, from 2ee524d:
##  [1] 3044 3043 1581 1076 1059 1058 1270 1489  609  608 1549 1550 2998 2448
## [15] 2057 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1468
## [29] 1469 1470 1471 1704 1916 1476   23   24 3064  740  743 1053 1490 1447
## [43] 1446 1445 3052 1444 1443 3051 1442 3282

path$epath

## [[1]]
## + 49/4667 edges from 2ee524d:
##  [1] 3043--3044 1581--3043 1076--1581 1059--1076 1058--1059 1058--1270
##  [7] 1270--1489  609--1489  608-- 609  608--1549 1549--1550 1550--2998
## [13] 2448--2998 2057--2448 1528--2057 1528--1529 1529--1530 1530--1531
## [19] 1531--1532 1532--1533 1533--1534 1534--1535 1535--1536 1536--1537
## [25] 1537--1538 1538--1539 1468--1539 1468--1469 1469--1470 1470--1471
## [31] 1471--1704 1704--1916 1476--1916   23--1476   23--  24   24--3064
## [37]  740--3064  740-- 743  743--1053 1053--1490 1447--1490 1446--1447
## [43] 1445--1446 1445--3052 1444--3052 1443--1444 1443--3051 1442--3051
## [49] 1442--3282

A subgraph of the original graph can now be created, containing only
those nodes and edges that are part of the calculated path. Note here
that something inconvenient happens: igraph seems to have it’s own
underlying structure for attaching indices to nodes, which makes that
the from and to columns in the egdes data don’t match with our
nodeID column in the nodes data anymore, but instead simply refer to
the row numbers of the nodes data.

path_graph <- graph %>%
    subgraph.edges(eids = path$epath %>% unlist()) %>%
    as_tbl_graph()

path_graph

## # A tbl_graph: 50 nodes and 49 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 50 x 4 (active)
##   nodeID            geometry degree betweenness
##                      
## 1     23  (7.62837 51.96292)      4      425585
## 2     24 (7.628317 51.96308)      3      499622
## 3    608 (7.626629 51.95707)      3      196573
## 4    609 (7.626621 51.95699)      3      390732
## 5    740 (7.629458 51.96364)      3      158948
## 6    743 (7.629787 51.96392)      3      156575
## # … with 44 more rows
## #
## # Edge Data: 49 x 7
##    from    to highway                  geometry edgeID   length betweenness
##                         [m]       
## 1     1     2 reside… (7.62837 51.96292, 7.628…     12 18.6254…      426879
## 2     3     4 reside… (7.626629 51.95707, 7.62…    355  8.2067…      198070
## 3     8     9 reside… (7.626552 51.95648, 7.62…    643  6.5847…      149777
## # … with 46 more rows

This subgraph can now be analyzed, for example by calculating the total
length of the path, …

path_graph %>%
  activate(edges) %>%
  as_tibble() %>%
  summarise(length = sum(length))

## # A tibble: 1 x 1
##     length
##        [m]
## 1 1362.259

… and plotted.

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') +
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) +
  geom_sf(data = path_graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), lwd = 1, col = 'firebrick') +
  geom_sf(data = path_graph %>% activate(nodes) %>% filter(nodeID %in% c(from_node, to_node)) %>% as_tibble() %>% st_as_sf(), size = 2)

However, often we will be interested in shortest paths between
geographical points that are not necessarily nodes in the network. For
example, we might want to calculate the shortest path from the railway
station of Münster to the cathedral.

muenster_station <- st_point(c(7.6349, 51.9566)) %>% 
  st_sfc(crs = 4326)

muenster_cathedral <- st_point(c(7.626, 51.962)) %>%
  st_sfc(crs = 4326)

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') +
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) +
  geom_sf(data = muenster_station, size = 2, col = 'firebrick') +
  geom_sf(data = muenster_cathedral, size = 2, col = 'firebrick') +
  geom_sf_label(data = muenster_station, aes(label = 'station'), nudge_x = 0.004) +
  geom_sf_label(data = muenster_cathedral, aes(label = 'cathedral'), nudge_x = 0.005)

To find the route on the network, we must first identify the nearest
points on the network. The nabor package has a well performing
function to do so. It does, however, require the coordinates of the
origin and destination nodes to be given in a matrix.

# Coordinates of the origin and destination node, as matrix
coords_o <- muenster_station %>%
  st_coordinates() %>%
  matrix(ncol = 2)

coords_d <- muenster_cathedral %>%
  st_coordinates() %>%
  matrix(ncol = 2)

# Coordinates of all nodes in the network
nodes <- graph %>%
  activate(nodes) %>%
  as_tibble() %>%
  st_as_sf()

coords <- nodes %>%
  st_coordinates()

# Calculate nearest points on the network.
node_index_o <- knn(data = coords, query = coords_o, k = 1)
node_index_d <- knn(data = coords, query = coords_d, k = 1)
node_o <- nodes[node_index_o$nn.idx, ]
node_d <- nodes[node_index_d$nn.idx, ]

Like before, we use the ID to calculate the shortest path, and plot it:

path <- shortest_paths(
  graph = graph,
  from = node_o$nodeID, # new origin
  to = node_d$nodeID,   # new destination
  output = 'both',
  weights = graph %>% activate(edges) %>% pull(length)
)

path_graph <- graph %>%
    subgraph.edges(eids = path$epath %>% unlist()) %>%
    as_tbl_graph()

ggplot() +
  geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') +
  geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) +
  geom_sf(data = path_graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), lwd = 1, col = 'firebrick') +
  geom_sf(data = muenster_station, size = 2) +
  geom_sf(data = muenster_cathedral, size = 2)  +
  geom_sf_label(data = muenster_station, aes(label = 'station'), nudge_x = 0.004) +
  geom_sf_label(data = muenster_cathedral, aes(label = 'cathedral'), nudge_x = 0.005)

It worked! We calculated a path from the rail station to the centre, a
common trip taken by tourists visiting Muenster. Clearly this is not a
finished piece of work but the post has demonstrated what is possible.
Future functionality should look to make spatial networks more user
friendly, including provision of ‘weighting profiles’, batch routing and
functions that reduce the number of steps needed to work with spatial
network data in R.

For alternative approaches and further reading, the following resources
are recommended:

To leave a comment for the author, please follow the link and comment on their blog: r-spatial.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)