Spatial networks in R with sf and tidygraph

[This article was first published on r-spatial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Spatial networks in R with sf and tidygraph

Lucas van der Meer, Robin Lovelace & Lorena Abad
September 26, 2019

Introduction

Street networks, shipping routes, telecommunication lines, river
bassins. All examples of spatial networks: organized systems of nodes
and edges embedded in space. For most of them, these nodes and edges can
be associated with geographical coordinates. That is, the nodes are
geographical points, and the edges geographical lines.

Such spatial networks can be analyzed using graph theory. Not for
nothing, Leonhard Eulers famous work on the Seven Bridges of
Köningsberg
,
which laid the foundations of graph theory and network analysis, was in
essence a spatial problem.

In R, there are advanced, modern tools for both the analysis of spatial
data and networks. Furthermore, several packages have been developed
that cover (parts of) spatial network analysis. As stated in this
github issue and this
tweet,
concensus on how best to represent and analyse spatial networks has
proved elusive.

This blogpost demonstrates an approach to spatial networks that starts
with a set of geographic lines, and leads to an object ready to be used
for network analysis. Along the way, we will see that there are still
steps that need to be taken before the process of analyzing spatial
networks in R is user friendly and efficient.

Existing R packages for spatial networks

Although R was originally designed as a language for statistical
computing, an active ‘R-spatial’ ecosystem has evolved. Powerful and
high performance packages for spatial data analysis have been developed,
thanks largely to interfaces to mature C/C++ libraries such as GDAL,
GEOS and PROJ, notably in the package
sf (see
section 1.5
of Geocomputation with R for a brief history). Likewise, a number of
packages for graph representation and analysis have been developed,
notably tidygraph, which is
based on igraph.

Both sf and tidygraph support the tibble class and the broader ‘tidy’
approach to data science, which involves data processing pipelines, type
stability and a convention of representing everything as a data frame
(well a tibble, which is a data frame with user friendly default
settings). In sf, this means storing spatial vector data as objects of
class sf, which are essentially the same as a regular data frame (or
tibble), but with an additional ‘sticky’ list column containing a
geometry for each feature (row), and attributes such as bounding box and
CRS. Tidygraph stores networks in objects of class tbl_graph. A
tbl_graph is an igraph object, but enables the user to manipulate
both the edges and nodes elements as if they were data frames also.

Both sf and tidygraph are relatively new packages (first released on
CRAN in 2016 and 2017, respectively). It is unsurprising, therefore,
that they have yet to be combined to allow a hybrid, tibble-based
representation of spatial networks.

Nevertheless, a number of approaches have been developed for
representing spatial networks, and some of these are in packages that
have been published on CRAN.
stplanr, for instance, contains
the SpatialLinesNetwork class, which works with both the
sp (a package for spatial data analysis
launched in 2005) and sf packages.
dodgr is a more recent package
that provides analytical tools for street networks, with a focus on
directed graphs (that can have direction-dependent weights,
e.g. representing a one-way street). Other packages seeking to
implement spatial networks in R include
spnetwork, a package that defined
a class system combining sp and igraph, and
shp2graph,
which provides tools to switch between sp and igraph objects.

Each package has its merits that deserve to be explored in more detail
(possibly in a subsequent blog post). The remainder of this post
outlines an approach that combines sf and igraph objects in a
tidygraph object.

Set-up

The following code chunk will install the packages used in this post:

<span class="c1"># We'll use remotes to install packages, install it if needs be:</span><span class="w">
</span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="s2">"remotes"</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">installed.packages</span><span class="p">())</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">install.packages</span><span class="p">(</span><span class="s2">"remotes"</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">

</span><span class="n">cran_pkgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="w">
  </span><span class="s2">"sf"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"tidygraph"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"igraph"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"osmdata"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"dplyr"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"tibble"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"ggplot2"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"units"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"tmap"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"rgrass7"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"link2GI"</span><span class="p">,</span><span class="w">
  </span><span class="s2">"nabor"</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">remotes</span><span class="o">::</span><span class="n">install_cran</span><span class="p">(</span><span class="n">cran_pkgs</span><span class="p">)</span><span class="w">
</span>
<span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tidygraph</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">igraph</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tibble</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">units</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tmap</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">osmdata</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rgrass7</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">link2GI</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">nabor</span><span class="p">)</span><span class="w">
</span>

Getting the data

As an example, we use the street network of the city center of Münster,
Germany. We will get the data from OpenStreetMap. Packages like dodgr
have optimized their code for such data, however considering that we
want to showcase this workflow for any source of data, we will generate
an object of class sf containing only LINESTRING geometries.
However, streets that form loops, are returned by osmdata as polygons,
rather than lines. These, we will convert to lines, using the
osm_poly2line function. One additional variable, the type of street,
is added to show that the same steps can be used for sf objects that
contain any number of additional variables.

<span class="n">muenster</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">opq</span><span class="p">(</span><span class="n">bbox</span><span class="w"> </span><span class="o">=</span><span class="w">  </span><span class="nf">c</span><span class="p">(</span><span class="m">7.61</span><span class="p">,</span><span class="w"> </span><span class="m">51.954</span><span class="p">,</span><span class="w"> </span><span class="m">7.636</span><span class="p">,</span><span class="w"> </span><span class="m">51.968</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">add_osm_feature</span><span class="p">(</span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'highway'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">osmdata_sf</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">osm_poly2line</span><span class="p">()</span><span class="w">

</span><span class="n">muenster_center</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">muenster</span><span class="o">$</span><span class="n">osm_lines</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">highway</span><span class="p">)</span><span class="w">
</span>
<span class="n">muenster_center</span><span class="w">
</span>
## Simple feature collection with 2233 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##             highway                       geometry
## 4064462     primary LINESTRING (7.624554 51.955...
## 4064463     primary LINESTRING (7.626498 51.956...
## 4064467 residential LINESTRING (7.630898 51.955...
## 4064474     primary LINESTRING (7.61972 51.9554...
## 4064476     primary LINESTRING (7.619844 51.954...
## 4064482    tertiary LINESTRING (7.616395 51.957...
## 4064485     service LINESTRING (7.63275 51.9603...
## 4984982   secondary LINESTRING (7.614156 51.967...
## 4985138    cycleway LINESTRING (7.61525 51.9673...
## 4985140 residential LINESTRING (7.616774 51.968...
<span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">muenster_center</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_sf</span><span class="p">()</span><span class="w">
</span>

From sf to tbl_graph: a step wise approach

Step 1: Clean the network

To perform network analysis, we need a network with a clean topology. In
theory, the best way to clean up the network topology is by manual
editing, but this can be very labour intensive and time consuming,
mainly for large networks. The
v.clean toolset
from the GRASS GIS software provides automated functionalities for this
task, and is therefore a popular instrument within the field of spatial
network analysis. As far as we know, there is no R equivalent for this
toolset, but fortunately, the
rgrass7
and link2GI packages enable us
to easily ‘bridge’ to GRASS GIS. Obviously, this requires to have GRASS
GIS installed on your computer. For an in depth description of combining
R with open source GIS software, see
Chapter 9 of
Geocomputation with R. Take into account that the linking process may
take up some time, especially on Windows operating systems. Also, note
that there have been some large changes recently to the rgrass7
package, which enabled a better integration with sf. However, it also
means that the code below will not work when using an older version of
rgrass7, so make sure to update if needed.

Here, we will clean the network topology by breaking lines at
intersections and also breaking lines that form a collapsed loop. This
will be followed by a removal of duplicated geometry features. Once
done, we will read the data back into R, and convert again into an sf
object with LINESTRING geometry. For this we use rgrass7, which
requires some set-up code that can be seen in the source
code

of this post.

<span class="c1"># Add data to GRASS spatial database  </span><span class="w">
</span><span class="n">writeVECT</span><span class="p">(</span><span class="w">
  </span><span class="n">SDF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">muenster_center</span><span class="p">,</span><span class="w"> 
  </span><span class="n">vname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'muenster_center'</span><span class="p">,</span><span class="w"> 
  </span><span class="n">v.in.ogr_flags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'overwrite'</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="c1"># Execute the v.clean tool</span><span class="w">
</span><span class="n">execGRASS</span><span class="p">(</span><span class="s2">"g.proj"</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span><span class="w"> </span><span class="s2">"quiet"</span><span class="p">),</span><span class="w"> </span><span class="n">proj4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">proj4</span><span class="p">)</span><span class="w">
</span><span class="n">execGRASS</span><span class="p">(</span><span class="w">
  </span><span class="n">cmd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'v.clean'</span><span class="p">,</span><span class="w"> 
  </span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'muenster_center'</span><span class="p">,</span><span class="w"> 
  </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'muenster_cleaned'</span><span class="p">,</span><span class="w">        
  </span><span class="n">tool</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'break'</span><span class="p">,</span><span class="w"> 
  </span><span class="n">flags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'overwrite'</span><span class="p">,</span><span class="w"> </span><span class="s1">'c'</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="c1"># Read back into R</span><span class="w">
</span><span class="n">use_sf</span><span class="p">()</span><span class="w">
</span><span class="n">muenster_center</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">readVECT</span><span class="p">(</span><span class="s1">'muenster_cleaned'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">rename</span><span class="p">(</span><span class="n">geometry</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">geom</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">cat</span><span class="p">)</span><span class="w">
</span>
<span class="n">muenster_center</span><span class="w">
</span>
## Simple feature collection with 4667 features and 1 field
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry
## 1      service LINESTRING (7.63275 51.9603...
## 2    secondary LINESTRING (7.614156 51.967...
## 3     cycleway LINESTRING (7.61525 51.9673...
## 4      footway LINESTRING (7.629304 51.967...
## 5        steps LINESTRING (7.627696 51.965...
## 6      service LINESTRING (7.633612 51.965...
## 7  residential LINESTRING (7.630564 51.957...
## 8      service LINESTRING (7.613545 51.960...
## 9     cycleway LINESTRING (7.619781 51.957...
## 10 residential LINESTRING (7.62373 51.9643...

Step 2: Give each edge a unique index

The edges of the network, are simply the linestrings in the data. Each
of them gets a unique index, which can later be related to their start
and end node.

<span class="n">edges</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">muenster_center</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">edgeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">n</span><span class="p">()))</span><span class="w">

</span><span class="n">edges</span><span class="w">
</span>
## Simple feature collection with 4667 features and 2 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry edgeID
## 1      service LINESTRING (7.63275 51.9603...      1
## 2    secondary LINESTRING (7.614156 51.967...      2
## 3     cycleway LINESTRING (7.61525 51.9673...      3
## 4      footway LINESTRING (7.629304 51.967...      4
## 5        steps LINESTRING (7.627696 51.965...      5
## 6      service LINESTRING (7.633612 51.965...      6
## 7  residential LINESTRING (7.630564 51.957...      7
## 8      service LINESTRING (7.613545 51.960...      8
## 9     cycleway LINESTRING (7.619781 51.957...      9
## 10 residential LINESTRING (7.62373 51.9643...     10

Step 3: Create nodes at the start and end point of each edge

The nodes of the network, are the start and end points of the edges. The
locations of these points can be derived by using the st_coordinates
function in sf. When given a set of linestrings, this function breaks
down each of them into the points they are built up from. It returns a
matrix with the X and Y coordinates of those points, and additionally an
integer indicator L1 specifying to which line a point belongs. These
integer indicators correspond to the edge indices defined in step 1.
That is, if we convert the matrix into a data.frame or tibble, group
the features by the edge index, and only keep the first and last feature
of each group, we have the start and end points of the linestrings.

<span class="n">nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">st_coordinates</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">rename</span><span class="p">(</span><span class="n">edgeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">L1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">edgeID</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">slice</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">()))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s1">'start'</span><span class="p">,</span><span class="w"> </span><span class="s1">'end'</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">()</span><span class="o">/</span><span class="m">2</span><span class="p">))</span><span class="w">

</span><span class="n">nodes</span><span class="w">
</span>
## # A tibble: 9,334 x 4
##        X     Y edgeID start_end
##    <dbl> <dbl>  <dbl> <chr>    
##  1  7.63  52.0      1 start    
##  2  7.63  52.0      1 end      
##  3  7.61  52.0      2 start    
##  4  7.61  52.0      2 end      
##  5  7.62  52.0      3 start    
##  6  7.62  52.0      3 end      
##  7  7.63  52.0      4 start    
##  8  7.63  52.0      4 end      
##  9  7.63  52.0      5 start    
## 10  7.63  52.0      5 end      
## # … with 9,324 more rows

Step 4: Give each node a unique index

Each of the nodes in the network needs to get a unique index, such that
they can be related to the edges. However, we need to take into account
that edges can share either startpoints and/or endpoints. Such
duplicated points, that have the same X and Y coordinate, are one single
node, and should therefore get the same index. Note that the coordinate
values as displayed in the tibble are rounded, and may look the same for
several rows, even when they are not. We can use the group_indices
function in dplyr to give each group of unique X,Y-combinations a unique
index.

<span class="n">nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="n">.</span><span class="o">$</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">.</span><span class="o">$</span><span class="n">Y</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">nodeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">group_indices</span><span class="p">(</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">xy</span><span class="p">,</span><span class="w"> </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">xy</span><span class="p">))))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">xy</span><span class="p">)</span><span class="w">

</span><span class="n">nodes</span><span class="w">
</span>
## # A tibble: 9,334 x 5
##        X     Y edgeID start_end nodeID
##    <dbl> <dbl>  <dbl> <chr>      <int>
##  1  7.63  52.0      1 start          1
##  2  7.63  52.0      1 end            2
##  3  7.61  52.0      2 start          3
##  4  7.61  52.0      2 end            4
##  5  7.62  52.0      3 start          5
##  6  7.62  52.0      3 end            6
##  7  7.63  52.0      4 start          7
##  8  7.63  52.0      4 end            8
##  9  7.63  52.0      5 start          9
## 10  7.63  52.0      5 end           10
## # … with 9,324 more rows

Step 5: Combine the node indices with the edges

Now each of the start and endpoints have been assigned a node ID in step
4, so that we can add the node indices to the edges. In other words, we
can specify for each edge, in which node it starts, and in which node it
ends.

<span class="n">source_nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'start'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">pull</span><span class="p">(</span><span class="n">nodeID</span><span class="p">)</span><span class="w">

</span><span class="n">target_nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'end'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">pull</span><span class="p">(</span><span class="n">nodeID</span><span class="p">)</span><span class="w">

</span><span class="n">edges</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source_nodes</span><span class="p">,</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">target_nodes</span><span class="p">)</span><span class="w">

</span><span class="n">edges</span><span class="w">
</span>
## Simple feature collection with 4667 features and 4 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
##        highway                       geometry edgeID from to
## 1      service LINESTRING (7.63275 51.9603...      1    1  2
## 2    secondary LINESTRING (7.614156 51.967...      2    3  4
## 3     cycleway LINESTRING (7.61525 51.9673...      3    5  6
## 4      footway LINESTRING (7.629304 51.967...      4    7  8
## 5        steps LINESTRING (7.627696 51.965...      5    9 10
## 6      service LINESTRING (7.633612 51.965...      6   11 12
## 7  residential LINESTRING (7.630564 51.957...      7   13 14
## 8      service LINESTRING (7.613545 51.960...      8   15 16
## 9     cycleway LINESTRING (7.619781 51.957...      9   17 18
## 10 residential LINESTRING (7.62373 51.9643...     10   19 20

Step 6: Remove duplicate nodes

Having added the unique node ID’s to the edges data, we don’t need the
duplicated start and endpoints anymore. After removing them, we end up
with a tibble in which each row represents a unique, single node. This
tibble can be converted into an sf object, with POINT geometries.

<span class="n">nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">distinct</span><span class="p">(</span><span class="n">nodeID</span><span class="p">,</span><span class="w"> </span><span class="n">.keep_all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="nf">c</span><span class="p">(</span><span class="n">edgeID</span><span class="p">,</span><span class="w"> </span><span class="n">start_end</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'X'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Y'</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">st_set_crs</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">edges</span><span class="p">))</span><span class="w">

</span><span class="n">nodes</span><span class="w">
</span>
## Simple feature collection with 3329 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 3,329 x 2
##    nodeID            geometry
##     <int>         <POINT [°]>
##  1      1   (7.63275 51.9603)
##  2      2 (7.631843 51.96061)
##  3      3 (7.614156 51.96724)
##  4      4 (7.613797 51.96723)
##  5      5  (7.61525 51.96736)
##  6      6 (7.615148 51.96745)
##  7      7 (7.629304 51.96712)
##  8      8  (7.629308 51.9673)
##  9      9 (7.627696 51.96534)
## 10     10  (7.62765 51.96534)
## # … with 3,319 more rows

Step 7: Convert to tbl_graph

The first six steps led to one sf object with LINESTRING geometries,
representing the edges of the network, and one sf object with POINT
geometries, representing the nodes of the network. The tbl_graph
function allows us to convert these two into a tbl_graph object. There
are two tricky parts in this step that need to be highlighted. One, is
that the columns containing the indices of the source and target nodes
should either be the first two columns of the sf object, or be named
‘to’ and ‘from’, respectively. Secondly, inside the tbl_graph
function, these columns are converted into a two-column matrix. However,
an sf object has a so-called ‘sticky geometry’, which means that the
geometry column sticks to the attributes whenever specific columns are
selected. Therefore, the matrix created inside tbl_graph has three
columns instead of two, and that causes an error. Therefore, we first
need to convert the sf object to a regular data.frame or tibble,
before we can construct a tbl_graph. In the end, this doesn’t matter,
since both the nodes and edges will be ‘integrated’ into an igraph
structure, and loose their specific sf
characteristics.

<span class="n">graph</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tbl_graph</span><span class="p">(</span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nodes</span><span class="p">,</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">(</span><span class="n">edges</span><span class="p">),</span><span class="w"> </span><span class="n">directed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">

</span><span class="n">graph</span><span class="w">
</span>
## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Node Data: 3,329 x 2 (active)
##   nodeID            geometry
##    <int>         <POINT [°]>
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## 4      4 (7.613797 51.96723)
## 5      5  (7.61525 51.96736)
## 6      6 (7.615148 51.96745)
## # … with 3,323 more rows
## #
## # Edge Data: 4,667 x 5
##    from    to highway                                       geometry edgeID
##   <int> <int> <fct>                                 <LINESTRING [°]>  <int>
## 1     1     2 service           (7.63275 51.9603, 7.631843 51.96061)      1
## 2     3     4 secondary       (7.614156 51.96724, 7.613797 51.96723)      2
## 3     5     6 cycleway  (7.61525 51.96736, 7.615165 51.96744, 7.615…      3
## # … with 4,664 more rows

Step 8: Putting it together

To make the approach more convenient, we can combine all steps above
into a single function, that takes a cleaned sf object with
LINESTRING geometries as input, and returns a spatial tbl_graph.

<span class="n">sf_to_tidygraph</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">directed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
  
  </span><span class="n">edges</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">edgeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="n">n</span><span class="p">()))</span><span class="w">
  
  </span><span class="n">nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">st_coordinates</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">rename</span><span class="p">(</span><span class="n">edgeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">L1</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">group_by</span><span class="p">(</span><span class="n">edgeID</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">slice</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">()))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s1">'start'</span><span class="p">,</span><span class="w"> </span><span class="s1">'end'</span><span class="p">),</span><span class="w"> </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">()</span><span class="o">/</span><span class="m">2</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="n">.</span><span class="o">$</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">.</span><span class="o">$</span><span class="n">Y</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">nodeID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">group_indices</span><span class="p">(</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">xy</span><span class="p">,</span><span class="w"> </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="n">xy</span><span class="p">))))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">xy</span><span class="p">)</span><span class="w">
  
  </span><span class="n">source_nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">filter</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'start'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">pull</span><span class="p">(</span><span class="n">nodeID</span><span class="p">)</span><span class="w">

  </span><span class="n">target_nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">filter</span><span class="p">(</span><span class="n">start_end</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'end'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">pull</span><span class="p">(</span><span class="n">nodeID</span><span class="p">)</span><span class="w">

  </span><span class="n">edges</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source_nodes</span><span class="p">,</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">target_nodes</span><span class="p">)</span><span class="w">
  
  </span><span class="n">nodes</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">distinct</span><span class="p">(</span><span class="n">nodeID</span><span class="p">,</span><span class="w"> </span><span class="n">.keep_all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="nf">c</span><span class="p">(</span><span class="n">edgeID</span><span class="p">,</span><span class="w"> </span><span class="n">start_end</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s1">'X'</span><span class="p">,</span><span class="w"> </span><span class="s1">'Y'</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">st_set_crs</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="n">edges</span><span class="p">))</span><span class="w">
  
  </span><span class="n">tbl_graph</span><span class="p">(</span><span class="n">nodes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nodes</span><span class="p">,</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">(</span><span class="n">edges</span><span class="p">),</span><span class="w"> </span><span class="n">directed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">directed</span><span class="p">)</span><span class="w">
  
</span><span class="p">}</span><span class="w">

</span><span class="n">sf_to_tidygraph</span><span class="p">(</span><span class="n">muenster_center</span><span class="p">,</span><span class="w"> </span><span class="n">directed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span>
## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Node Data: 3,329 x 2 (active)
##   nodeID            geometry
##    <int>         <POINT [°]>
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## 4      4 (7.613797 51.96723)
## 5      5  (7.61525 51.96736)
## 6      6 (7.615148 51.96745)
## # … with 3,323 more rows
## #
## # Edge Data: 4,667 x 5
##    from    to highway                                       geometry edgeID
##   <int> <int> <fct>                                 <LINESTRING [°]>  <int>
## 1     1     2 service           (7.63275 51.9603, 7.631843 51.96061)      1
## 2     3     4 secondary       (7.614156 51.96724, 7.613797 51.96723)      2
## 3     5     6 cycleway  (7.61525 51.96736, 7.615165 51.96744, 7.615…      3
## # … with 4,664 more rows

Combining the best of both worlds

Having the network stored in the tbl_graph structure, with a geometry
list column for both the edges and nodes, enables us to combine the wide
range of functionalities in sf and tidygraph, in a way that fits neatly
into the tidyverse.

With the activate() verb, we specify if we want to manipulate the
edges or the nodes. Then, most dplyr verbs can be used in the familiar
way, also when directly applied to the geometry list column. For
example, we can add a variable describing the length of each edge,
which, later, we use as a weight for the edges.

<span class="n">graph</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">st_length</span><span class="p">(</span><span class="n">geometry</span><span class="p">))</span><span class="w">

</span><span class="n">graph</span><span class="w">
</span>
## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Edge Data: 4,667 x 6 (active)
##    from    to highway                              geometry edgeID   length
##   <int> <int> <fct>                        <LINESTRING [°]>  <int>      [m]
## 1     1     2 service  (7.63275 51.9603, 7.631843 51.96061)      1 71.2778…
## 2     3     4 seconda… (7.614156 51.96724, 7.613797 51.967…      2 24.7146…
## 3     5     6 cycleway (7.61525 51.96736, 7.615165 51.9674…      3 12.2303…
## 4     7     8 footway  (7.629304 51.96712, 7.629304 51.967…      4 20.0122…
## 5     9    10 steps    (7.627696 51.96534, 7.62765 51.9653…      5  3.2926…
## 6    11    12 service  (7.633612 51.96548, 7.633578 51.965…      6  7.4291…
## # … with 4,661 more rows
## #
## # Node Data: 3,329 x 2
##   nodeID            geometry
##    <int>         <POINT [°]>
## 1      1   (7.63275 51.9603)
## 2      2 (7.631843 51.96061)
## 3      3 (7.614156 51.96724)
## # … with 3,326 more rows

With one flow of pipes, we can ‘escape’ the graph structure, turn either
the edges or nodes back into real sf objects, and, for example,
summarise the data based on a specific variable.

<span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">st_as_sf</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">highway</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarise</span><span class="p">(</span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">length</span><span class="p">))</span><span class="w">
</span>
## Simple feature collection with 17 features and 2 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 17 x 3
##    highway         length                                          geometry
##  * <fct>              [m]                             <MULTILINESTRING [°]>
##  1 corridor        9.377… ((7.620506 51.96262, 7.620542 51.96266), (7.6205…
##  2 cycleway    19275.917… ((7.619683 51.95395, 7.619641 51.95378, 7.619559…
##  3 footway     42822.070… ((7.640529 51.95325, 7.640528 51.95323, 7.640531…
##  4 path         8193.341… ((7.624007 51.95379, 7.624223 51.95378, 7.624253…
##  5 pedestrian  11399.337… ((7.620362 51.95471, 7.620477 51.9547), (7.62012…
##  6 primary      3574.842… ((7.625556 51.95272, 7.625594 51.95284, 7.625714…
##  7 primary_li…   184.385… ((7.617285 51.96609, 7.617286 51.96624, 7.617295…
##  8 residential 22729.748… ((7.614509 51.95351, 7.614554 51.95346), (7.6230…
##  9 secondary    4460.683… ((7.629784 51.95446, 7.629842 51.95444), (7.6312…
## 10 secondary_…   160.708… ((7.635309 51.95946, 7.635705 51.95948), (7.6349…
## 11 service     27027.822… ((7.624803 51.95393, 7.625072 51.95393), (7.6158…
## 12 steps        1321.841… ((7.634423 51.9546, 7.634438 51.95462), (7.61430…
## 13 tertiary     4356.365… ((7.607112 51.94991, 7.607126 51.94992, 7.607183…
## 14 tertiary_l…    43.856… ((7.623592 51.96612, 7.623568 51.96611, 7.623468…
## 15 track         389.866… ((7.610671 51.95778, 7.610571 51.95759, 7.610585…
## 16 unclassifi…   612.816… ((7.634492 51.95613, 7.634689 51.95611), (7.6343…
## 17 <NA>         3161.883… ((7.634374 51.95579, 7.634545 51.95575, 7.634662…

Switching back to sf objects is useful as well when plotting the
network, in a way that preserves its spatial properties.

<span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w">
</span>

Or, alternatively, in only a few lines of code, plot the network as an
interactive map. On this page, the interactive map might show as an
image, but here, you
should be able to really interact with it!

<span class="n">tmap_mode</span><span class="p">(</span><span class="s1">'view'</span><span class="p">)</span><span class="w">

</span><span class="n">tm_shape</span><span class="p">(</span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">tm_lines</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">tm_shape</span><span class="p">(</span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">tm_dots</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">tmap_options</span><span class="p">(</span><span class="n">basemaps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'OpenStreetMap'</span><span class="p">)</span><span class="w">
</span>

All nice and well, but these are not things that we necessarily need the
graph representation for. The added value of tidygraph, is that it opens
the door to the functions of the igraph library, all specifically
designed for network analysis, and enables us to use them inside a
‘tidy’ workflow. To cover them all, we would need to write a book,
but let’s at least show a few examples below.

Centrality measures

Centraltity measures describe the importances of nodes in the network.
The simplest of those measures is the degree centrality: the number of
edges connected to a node. Another example is the betweenness
centrality, which, simply stated, is the number of shortest paths that
pass through a node. In tidygraph, we can calculate these and many other
centrality measures, and simply add them as a variable to the nodes.

The betweenness centrality can also be calculated for edges. In that
case, it specifies the number of shortest paths that pass through an
edge.

<span class="n">graph</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">activate</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">degree</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">centrality_degree</span><span class="p">())</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">betweenness</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">centrality_betweenness</span><span class="p">(</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">betweenness</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">centrality_edge_betweenness</span><span class="p">(</span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="p">))</span><span class="w">

</span><span class="n">graph</span><span class="w">
</span>
## # A tbl_graph: 3329 nodes and 4667 edges
## #
## # An undirected multigraph with 34 components
## #
## # Edge Data: 4,667 x 7 (active)
##    from    to highway                  geometry edgeID   length betweenness
##   <int> <int> <fct>            <LINESTRING [°]>  <int>      [m]       <dbl>
## 1     1     2 service (7.63275 51.9603, 7.6318…      1 71.2778…       90271
## 2     3     4 second… (7.614156 51.96724, 7.61…      2 24.7146…       32946
## 3     5     6 cyclew… (7.61525 51.96736, 7.615…      3 12.2303…       12409
## 4     7     8 footway (7.629304 51.96712, 7.62…      4 20.0122…       22043
## 5     9    10 steps   (7.627696 51.96534, 7.62…      5  3.2926…       83380
## 6    11    12 service (7.633612 51.96548, 7.63…      6  7.4291…       17119
## # … with 4,661 more rows
## #
## # Node Data: 3,329 x 4
##   nodeID            geometry degree betweenness
##    <int>         <POINT [°]>  <dbl>       <dbl>
## 1      1   (7.63275 51.9603)      3      117041
## 2      2 (7.631843 51.96061)      3       94899
## 3      3 (7.614156 51.96724)      3       32881
## # … with 3,326 more rows
<span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(),</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'grey50'</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(),</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">betweenness</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">betweenness</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_colour_viridis_c</span><span class="p">(</span><span class="n">option</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'inferno'</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_size_continuous</span><span class="p">(</span><span class="n">range</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">4</span><span class="p">))</span><span class="w">
</span>

<span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">as_tibble</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(),</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">betweenness</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">betweenness</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_colour_viridis_c</span><span class="p">(</span><span class="n">option</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'inferno'</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">scale_size_continuous</span><span class="p">(</span><span class="n">range</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">4</span><span class="p">))</span><span class="w">
</span>

Shortest paths

A core part of spatial network analysis is generally finding the path
between two nodes that minimizes either the travel distance or travel
time. In igraph, there are several functions that can be used for this
purpose, and since a tbl_graph is just a subclass of an igraph
object, we can directly input it into every function in the igraph
package.

The function distances, for example, returns a numeric matrix
containing the distances of the shortest paths between every possible
combination of nodes. It will automatically choose a suitable algorithm
to calculate these shortest paths.

<span class="n">distances</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">distances</span><span class="p">(</span><span class="w">
  </span><span class="n">graph</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="p">,</span><span class="w">
  </span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">activate</span><span class="p">(</span><span class="n">edges</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">pull</span><span class="p">(</span><span class="n">length</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">

</span><span class="n">distances</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">5</span><span class="p">]</span><span class="w">
</span>
##            [,1]       [,2]       [,3]      [,4]       [,5]
## [1,]    0.00000   71.27789 1670.22046 1694.9351 1596.80669
## [2,]   71.27789    0.00000 1619.80567 1644.5203 1546.39190
## [3,] 1670.22046 1619.80567    0.00000   24.7146   77.64829
## [4,] 1694.93506 1644.52027   24.71460    0.0000  102.36289
## [5,] 1596.80669 1546.39190   77.64829  102.3629    0.00000

The function ‘shortest_paths’ not only returns distances, but also the
indices of the nodes and edges that make up the path. When we relate
them to their corresponding geometry columns, we get the spatial
representation of the shortest paths. Instead of doing this for all
possible combinations of nodes, we can specify from and to which nodes
we want to calculate the shortest paths. Here, we will show an example
of a shortest path from one node to another, but it is just as well
possible to do the same for one to many, many to one, or many to many
nodes. Whenever the graph is weighted, the Dijkstra algoritm will be
used under the hood. Note here that we have to define the desired output
beforehand: vpath means that only the nodes (called vertices in
igraph) are returned, epath means that only the edges are returned,
and both returns them both.

<span class="n">from_node</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">activate</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">nodeID</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">3044</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">pull</span><span class="p">(</span><span class="n">nodeID</span><span class=&qu...

To leave a comment for the author, please follow the link and comment on their blog: r-spatial.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)