(This article was first published on **Robin Lovelace - R**, and kindly contributed to R-bloggers)

Version 0.1.1 of the package stplanr has been released on CRAN. This is a major update with many new functions and a new class definition, `SpatialLinesNetwork`, for route planning and network analysis using igraph.

This short post, by myself and package co-author Richard Ellison, describes how stplanr can be used for transport research with a few simple examples from the package documentation. We hope that stplanr is of use to transport researchers and practitioners worldwide and encourage contributions to the development version hosted on GitHub.

## Working with origin-destination data

Origin-destination (OD) data is one of the basic data sources for understanding travel behaviour. Usually OD data in R is represented by a table containing at least the following columns:

- Origin ID: a text string identifying the zone in which journeys originate
- Destination ID: a text string identifying the destination zone
- Number of trips: the number of trips made between each unique OD pair

Additional columns can provide a breakdown by trip type, such as mode of travel (e.g. car) and time of day. A sample of this data (also referred to as ‘flow data’ by some statistical organisations) is provided in the example dataset `flow`, as illustrated in the table below.

```
library(stplanr)
## Loading required package: sp
library(tmap)
data("flow")
knitr::kable(flow[1:3,c(1, 2, 3, 13)])
```

| | Area.of.residence | Area.of.workplace | All | On.foot |
|---|---|---|---|---|
| 920573 | E02002361 | E02002361 | 109 | 59 |
| 920575 | E02002361 | E02002363 | 38 | 4 |
| 920578 | E02002361 | E02002367 | 10 | 1 |
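To make the structure concrete without loading any packages, the three rows above can be written out as a plain base R data frame. This is a minimal sketch: the lower-case column names (`origin`, `destination`, `all`, `on_foot`) are our own choice rather than those used in `flow`. Note that the first row is an intra-zonal flow, where origin and destination are the same zone.

```r
# A minimal OD table in base R holding the three rows shown above
# (values copied from the table; column names are our own)
od <- data.frame(
  origin      = c("E02002361", "E02002361", "E02002361"), # Area.of.residence
  destination = c("E02002361", "E02002363", "E02002367"), # Area.of.workplace
  all         = c(109, 38, 10),                           # All: total trips
  on_foot     = c(59, 4, 1),                              # On.foot: trips made on foot
  stringsAsFactors = FALSE
)

# Each row should describe a unique origin-destination pair
stopifnot(!anyDuplicated(od[, c("origin", "destination")]))

# Share of each pair's trips made on foot
od$prop_foot <- od$on_foot / od$all
print(od)
```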

To link this data to geographical space we use `cents`, a dataset stored as a `SpatialPointsDataFrame` from the sp package:

```
data(cents)
plot(cents)
```

To convert the flow data into lines we can use `od2line()`, which creates a `SpatialLinesDataFrame`:

```
odlines <- od2line(flow = flow, zones = cents)
plot(cents)
plot(odlines, add = TRUE)
```
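`od2line()` handles the spatial classes for us. As we understand it, the core idea is to match each OD row's zone IDs to centroid coordinates and join each pair with a straight 'desire line'. The base R sketch below illustrates that idea only; it is not stplanr's implementation, and the zones `A`, `B`, `C`, their coordinates and the flows are all hypothetical.

```r
# Hypothetical zone centroids (projected coordinates, e.g. metres)
cents_df <- data.frame(
  geo_code = c("A", "B", "C"),
  x = c(0, 3, 0),
  y = c(0, 4, 5),
  stringsAsFactors = FALSE
)

# Hypothetical OD table: origin and destination hold zone IDs
flow_df <- data.frame(
  origin      = c("A", "A", "B"),
  destination = c("A", "B", "C"),
  all         = c(10, 20, 5),
  stringsAsFactors = FALSE
)

# Look up the row of each origin and destination in the centroid table
o <- match(flow_df$origin, cents_df$geo_code)
d <- match(flow_df$destination, cents_df$geo_code)
stopifnot(!anyNA(o), !anyNA(d))  # every zone ID must have a centroid

# One straight segment per OD pair, from origin centroid to destination centroid
od_segments <- data.frame(
  flow_df,
  x0 = cents_df$x[o], y0 = cents_df$y[o],
  x1 = cents_df$x[d], y1 = cents_df$y[d]
)

# Euclidean length of each desire line (zero for intra-zonal flows)
od_segments$length <- with(od_segments, sqrt((x1 - x0)^2 + (y1 - y0)^2))
print(od_segments)
```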

Note that the function also accepts a `SpatialPolygonsDataFrame` as input, in which case each line starts and ends at the geographic centroid of its zone:

```
data(zones)
odlines <- od2line(flow = flow, zones = zones)
plot(zones)
plot(odlines, add = TRUE)
```
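What counts as a zone's 'geographic centroid' can vary; one common definition is the area-weighted centroid given by the shoelace formula. The base R sketch below computes it for a simple polygon. We have not checked exactly which point `od2line()` uses for each zone, so treat this as an illustration of the concept; the function name `polygon_centroid()` is our own.

```r
# Area-weighted centroid of a simple (non-self-intersecting) polygon,
# using the shoelace formula. Vertices are given in order, without
# repeating the first vertex at the end.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])  # vertex i + 1, wrapping round to the first vertex
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y   # cross-product term for each edge
  area <- sum(cross) / 2             # signed area (positive if anticlockwise)
  cx <- sum((x + x_next) * cross) / (6 * area)
  cy <- sum((y + y_next) * cross) / (6 * area)
  c(x = cx, y = cy, area = abs(area))
}

# A 2 x 2 square and a right-angled triangle with legs of length 3
polygon_centroid(c(0, 2, 2, 0), c(0, 0, 2, 2))
polygon_centroid(c(0, 3, 0), c(0, 0, 3))
```

Because the signed area and the cross-product sums change sign together, the centroid is the same whichever direction the vertices are listed in.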

To gain a basic understanding of the rate of travel in this simple travel system, we can plot the `odlines` with width proportional to the number of people travelling:

```
plot(odlines, lwd = odlines$All / mean(odlines$All) * 3, col = "red")
plot(odlines, lwd = odlines$On.foot / mean(odlines$All) * 3, col = "green", add = TRUE)
```
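Both `plot()` calls above divide by the same quantity, `mean(odlines$All)`, so for any line the ratio of green to red width equals the share of its trips made on foot. A quick base R check of that reasoning, using the three example rows from the table above (the variable names are illustrative):

```r
# Flows for the three example OD pairs shown in the table above
all_trips  <- c(109, 38, 10)
foot_trips <- c(59, 4, 1)

# Same scaling as the plot() calls: divide by mean total flow, multiply by 3
lwd_red   <- all_trips  / mean(all_trips) * 3
lwd_green <- foot_trips / mean(all_trips) * 3

# Because the denominator is shared, green/red width equals the walking share
width_ratio <- lwd_green / lwd_red
print(round(cbind(lwd_red, lwd_green, width_ratio), 3))
```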

In the resulting plot the total rate of travel is represented by the width of the red lines, and the proportion of people who walk is shown by the width of each green line relative to the red line beneath it. We can use this data to explore the relationship between walking and distance:

```
# project to British National Grid (EPSG:27700) so lengths are in metres
odlines <- spTransform(odlines, CRS("+init=epsg:27700"))
odlines$dist <- rgeos::gLength(odlines, byid = TRUE)
plot(odlines$dist, odlines$On.foot / odlines$All)
# fit a linear model of walking share against distance
m <- lm(On.foot / All ~ dist, data = odlines@data)
lines(odlines$dist, m$fitted.values)
```

```
summary(m)
##
## Call:
## lm(formula = On.foot/All ~ dist, data = odlines@data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26915 -0.06987 -0.00694 0.06190 0.63195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.365e-01 4.503e-02 11.915 8.36e-16 ***
## dist -1.409e-04 2.501e-05 -5.633 9.64e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1585 on 47 degrees of freedom
## Multiple R-squared: 0.403, Adjusted R-squared: 0.3903
## F-statistic: 31.73 on 1 and 47 DF, p-value: 9.638e-07
```

This is useful information: there is a clear negative relationship between the distance of the trip (in metres) and the proportion of people who make the journey on foot.
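To put the coefficients in more intuitive terms, the fitted line can be evaluated by hand. The sketch below simply plugs the estimates printed by `summary(m)` into the model equation, proportion on foot = intercept + slope × distance; the helper `predict_share()` is our own name, not part of stplanr.

```r
# Coefficients as printed by summary(m) above
b0 <- 5.365e-01   # intercept: predicted walking share for a zero-length trip
b1 <- -1.409e-04  # change in walking share per metre of distance

# Predicted share of trips made on foot at a given distance (metres)
predict_share <- function(dist_m) b0 + b1 * dist_m

round(predict_share(c(500, 1000, 2000)), 3)
## [1] 0.466 0.396 0.255

# Distance (metres) at which the fitted line reaches zero
round(-b0 / b1)
## [1] 3808
```

By this fit, each extra kilometre reduces the expected walking share by about 14 percentage points, and the line crosses zero at roughly 3.8 km, so it predicts negative shares for longer trips. That is a reminder that a straight line is only a rough summary of how walking declines with distance; a model bounded between 0 and 1, such as a logistic regression, may be more appropriate for longer trips.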

## Working with route-allocated ‘flow’ data

stplanr includes functions for allocating OD pairs to the transport network, including `route_cyclestreet()`, `route_graphhopper()` and, most recently, `viaroute()`, which provides an R interface to the superfast OSRM routing API. This is useful because roads rarely take you directly from origin to destination, as illustrated below for the trip from Leeds to London (Greenwich) that one could take to attend the upcoming GISRUK conference:

```
route <- route_cyclestreet("Leeds", "Greenwich")
library(tmap)
tiles <- read_osm(bb(route, ext = 2))
tm_shape(tiles) +
tm_raster() +
tm_shape(route) +
tm_lines()
```
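One simple way to quantify how far a route deviates from a straight line is its circuity: route length divided by the great-circle distance between its end points. The sketch below computes the latter with the haversine formula in base R, assuming a spherical Earth of radius 6371 km. The 150 km route length in the usage example is made up for illustration (in practice it would come from the routing service's output), and the helper names are our own.

```r
# Great-circle ("as the crow flies") distance in km using the haversine formula.
# Assumes a spherical Earth with mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Circuity: route length relative to the straight-line distance (>= 1 for real routes)
circuity <- function(route_km, lon1, lat1, lon2, lat2) {
  route_km / haversine_km(lon1, lat1, lon2, lat2)
}

# One degree of latitude is about 111.2 km on this sphere
round(haversine_km(0, 0, 0, 1), 1)
## [1] 111.2

# A hypothetical 150 km route between points one degree of latitude apart
round(circuity(150, 0, 0, 0, 1), 2)
## [1] 1.35
```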

We can allocate all of the OD pairs in `odlines` to the transport network using these functions. The `routes_fast` dataset, for example, was created using `line2route()` and represents the fastest route that a cyclist may take, according to the CycleStreets.net API. A sample of this dataset is illustrated below:

```
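# Give the first two routes weights of 5 and 10 (values are recycled to fill the column)
routes_fast$weight <- rep(c(5, 10), length.out = nrow(routes_fast@data))
plot(routes_fast[1:2,], lwd = routes_fast$weight[1:2])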
```

Note that there is some overlap between the two lines above. It is sometimes useful to take aggregate statistics for the attributes of overlapping lines, for example to estimate the number of people using any particular part of the transport network. This can be achieved using Barry Rowlingson’s function `overline()`:

```
rnet <- overline(routes_fast[1:2,], attrib = "weight", fun = sum)
```

In the resulting route network `rnet`, the final segment to the east has a `weight` value equal to the sum of the two overlapping lines in `routes_fast[1:2,]`: 5 + 10 = 15. We can verify this by plotting the output of Barry’s function and labelling each segment with its weight:

```
plot(rnet, lwd = rnet$weight, col = "red")
lineLabels(rnet, "weight")
```
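The summing step can be illustrated without any spatial classes. This simplified sketch assumes each route is stored as a sequence of segment IDs from a shared network, so overlapping routes share IDs. `overline()` itself works on line geometries, so this mirrors only the aggregation, not how overlaps are detected. The routes and segment IDs are hypothetical, with weights of 5 and 10 chosen to match the example above.

```r
# Simplified model: each hypothetical route is a sequence of segment IDs,
# and overlapping routes share segment IDs
routes <- list(
  list(segments = c("a", "b", "c"), weight = 5),   # route 1
  list(segments = c("d", "c"), weight = 10)        # route 2, shares segment "c"
)

# Repeat each route's weight once for every segment it uses
seg_ids <- unlist(lapply(routes, function(r) r$segments))
seg_wts <- unlist(lapply(routes, function(r) rep(r$weight, length(r$segments))))

# Sum the weights of all routes using each segment (the equivalent of fun = sum)
rnet_weight <- tapply(seg_wts, seg_ids, sum)
print(rnet_weight)
```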

## Other functions

There are many other functions designed to help transport researchers in `stplanr`. These include:

- `read_stats19*` functions, which import and format UK ‘Stats19’ road traffic casualty data
- `calc_catchment*` functions for calculating transport ‘catchment areas’ using buffers around transport facilities
- `gtfs2sldf()` for reading Google’s GTFS format into R
- `toptail*` functions for removing the beginning and ends of `SpatialLines` objects

The use of the `calc_catchment*` functions can be illustrated using some simple data from Sydney showing the potential catchment of possible separated cycle paths. First we import the data that we want to use:

```
library(rgdal)
## rgdal: version: 1.1-3, (SVN revision 594)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
## Path to GDAL shared files: /usr/share/gdal/1.11
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)
## Linking to sp version: 1.2-2
data_dir <- system.file("extdata", package = "stplanr")
unzip(file.path(data_dir, 'smallsa1.zip'))
unzip(file.path(data_dir, 'testcycleway.zip'))
sa1income <- readOGR(".","smallsa1") # Import some population data
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "smallsa1"
## with 638 features
## It has 19 fields
testcycleway <- readOGR(".","testcycleway") # Import the path of the cycleways to test
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "testcycleway"
## with 2 features
## It has 2 fields
```

We can then use our population data and the path of the cycleways to estimate the population catchment for a given distance. If our population layer contains fields with multiple subsets of data for which we want to calculate the catchment area (e.g., men, women and children), we can calculate the individual catchments. For this example, we will simply use the ‘Total’ field containing the total population:

```
cycle_catchment <- calc_catchment(
polygonlayer = sa1income, # The SpatialPolygonsDataFrame containing the population data
targetlayer = testcycleway, # The Spatial* object containing the transport infrastructure of interest
calccols = c('Total'), # The columns to summarise
distance = 500, # The desired distance,
projection = 'austalbers', # The projection to use for calculating the area
dissolve = TRUE # Collapse all the population zones into a single polygon for the catchment
)
cycle_catchment$Total # Print the total catchment population
## [1] 23944.32
```
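The fractional total above suggests that zone populations are apportioned rather than counted whole. A common way to do this is areal weighting: assume people are spread evenly within each zone and count each zone in proportion to the share of its area inside the buffer. The base R sketch below shows that arithmetic with hypothetical zones; it is an assumption about the general approach, not a description of `calc_catchment()`'s internals.

```r
# Areal-weighting sketch: people are assumed to be spread evenly within each zone,
# so a zone contributes population in proportion to its area inside the buffer.
# All numbers are hypothetical.
zones_df <- data.frame(
  zone        = c("z1", "z2", "z3"),
  population  = c(1000, 2000, 500),
  area_km2    = c(1.0, 2.0, 0.5),   # total zone area
  area_in_km2 = c(0.5, 0.5, 0.5)    # area of the zone falling inside the buffer
)

zones_df$share_in <- zones_df$area_in_km2 / zones_df$area_km2
catchment_pop <- sum(zones_df$population * zones_df$share_in)
catchment_pop
## [1] 1500
```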

We can also plot the catchment area and the cycle paths. In this example there are gaps in the buffers; these are due to gaps in the population layer where Sydney Harbour passes through the area. To take into account the road network rather than simple straight-line distance, we can use the `calc_network_catchment()` function.

```
plot(cycle_catchment)
plot(testcycleway, col="red", add=TRUE, lwd=2)
```

The toptail functionality is useful for removing the beginning and ends of SpatialLines, both for improving the aesthetics of plots and for ensuring that lines do not overlap. This functionality is illustrated below using the `routes_fast` data.

```
proj4string(routes_fast) <- CRS("+init=epsg:4326")
rf_toptailed <- toptail(routes_fast, toptail_dist = 300)
plot(routes_fast, col = "red", lwd = 5)
plot(rf_toptailed, add = TRUE)
```
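The trimming idea itself is straightforward: walk along the line, find the points a fixed distance from each end, and keep only what lies between them. The base R sketch below does this for a single polyline in planar (projected) coordinates; `trim_line()` is a hypothetical helper for illustration, not stplanr's `toptail()`, which works on `SpatialLines` objects directly.

```r
# Trim distance d from both ends of a polyline given by vertex vectors x and y.
# Assumes planar (projected) coordinates, so distances are Euclidean.
trim_line <- function(x, y, d) {
  stopifnot(length(x) == length(y), length(x) >= 2, d > 0)
  seg_len <- sqrt(diff(x)^2 + diff(y)^2)  # length of each segment
  cum <- c(0, cumsum(seg_len))            # distance along the line at each vertex
  total <- cum[length(cum)]
  if (2 * d >= total) return(NULL)        # too short: nothing left after trimming

  # Point at distance s along the line, by linear interpolation within a segment
  point_at <- function(s) {
    i <- findInterval(s, cum)             # segment i runs from vertex i to vertex i + 1
    t <- (s - cum[i]) / seg_len[i]
    c(x[i] + t * (x[i + 1] - x[i]), y[i] + t * (y[i + 1] - y[i]))
  }

  inner <- which(cum > d & cum < total - d)  # original vertices that survive
  out <- rbind(point_at(d), cbind(x[inner], y[inner]), point_at(total - d))
  dimnames(out) <- NULL
  out
}

# An L-shaped line of length 20, trimmed by 3 at each end
trim_line(c(0, 10, 10), c(0, 0, 10), d = 3)
```

Returning `NULL` for lines shorter than twice the trim distance is a design choice for this sketch; a real implementation might instead keep or drop such lines explicitly.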

The package vignette contains some further illustrations of `stplanr`’s functions, which we plan to improve over time. While R has become almost ‘industry standard’ in fields as diverse as genetics, astronomy and epidemiology, it has received limited attention in transport planning. We believe that there is great potential for R, via new packages such as stplanr, to help solve real-world transport problems such as estimating the geographical distribution of cycling potential.

The ‘sustainable’ in the package name relates to the package’s emphasis on low-carbon modes such as cycling and public transport. There is a huge amount of work to be done to plan for a transition away from fossil fuels in the transport sector, for health and environmental reasons. In this context we hope that software such as `stplanr` contributes to the evidence base needed to design better transport systems.
