# Using {sf} to calculate catchment areas

**R tips – NHS-R Community**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**By Tom Jemmett**

R is a highly capable GIS tool, thanks to the `{sf}`

pacakge. In this post I am going to show you how you can use `{sf}`

with openly available data to calculate catchment areas and population sizes, and then how to calculate travel time isochrone’s with `{osrm}`

.

knitr::opts_chunk$set(echo = TRUE) suppressPackageStartupMessages({ library(tidyverse) library(readxl) library(glue, include.only = "glue_data") library(scales, exclude = c("discard", "col_factor")) library(sf) library(osrm) library(leaflet) library(leaflet.extras) })

Firstly, I have prepared a dataset containing the 23 A&E departments that are Adult Major Trauma Centre’s (MTC’s) in England and Wales. This dataset is an `{sf}`

object which contains a special column called `geometry`

– this column contains the locations of the hospitals. The data is sourced from this, the Organisation Data Service (ODS) and the NHS Postcode Database.

When we print this table out we get a bit of information telling us how many features (rows) and fields (columns) there are, along with what the type of the geometry is (in this case, we have points). We also get the “bounding box” (bbox) – this is like a rectangle that encloses all of our geometries. Finally, we get the coordinate reference system (CRS) that is being used.

Below this we see a table of data – like any other dataframe in R.

major_trauma_centres <- read_sf( paste0( "https://gist.githubusercontent.com/tomjemmett/", "ac111a045e1b0a60854d3d2c0fb0baa8/raw/", "a1a0fb359d1bc764353e8d55c9d804f47a95bfe4/", "major_trauma_centres.geojson" ), agr = "identity" ) major_trauma_centres ## Simple feature collection with 23 features and 2 fields ## Attribute-geometry relationship: 0 constant, 0 aggregate, 2 identity ## geometry type: POINT ## dimension: XY ## bbox: xmin: -4.113684 ymin: 50.41672 xmax: 0.1391145 ymax: 54.98021 ## geographic CRS: WGS 84 ## # A tibble: 23 x 3 ## name org_id geometry ## <chr> <chr> <POINT [°]> ## 1 ADDENBROOKE'S HOSPITAL RGT01 (0.1391145 52.17374) ## 2 DERRIFORD HOSPITAL RK950 (-4.113684 50.41672) ## 3 HULL ROYAL INFIRMARY RWA01 (-0.3581464 53.74411) ## 4 JAMES COOK UNIVERSITY HOSPITAL RTDCR (-1.214805 54.55176) ## 5 JOHN RADCLIFFE HOSPITAL RTH08 (-1.219806 51.76387) ## 6 KING'S COLLEGE HOSPITAL (DENMARK HILL) RJZ01 (-0.09391574 51.46808) ## 7 LEEDS GENERAL INFIRMARY RR801 (-1.551744 53.80145) ## 8 NORTHERN GENERAL HOSPITAL RHQNG (-1.455966 53.4098) ## 9 NOTTINGHAM UNIVERSITY HOSPITALS NHS TRUST - Q~ RX1RA (-1.185957 52.9438) ## 10 QUEEN ELIZABETH HOSPITAL RRK02 (-1.938476 52.45318) ## # ... with 13 more rows

This dataframe can be used like any other dataframe in R, for example, we can use `dplyr`

to filter rows out we aren’t interested in, or join to another dataframe to bring in extra information.

For example, we can load the LSOA population estimates and join it to the LSOA boundaries file from the ONS geoportal. If you aren’t familiar with Output Areas see the ONS Census Geography documentation.

First, let’s load the population estimate file.

# download the LSOA population estimates from ons website: for some reason they # provide this download as an excel file in a zip file, so we need to download # the zip then extract the file, but we don't need to keep the zip after. withr # handles this temporary like file for us if (!file.exists("SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx")) { withr::local_file("lsoa_pop_est.zip", { download.file( paste0( "https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/", "populationandmigration/populationestimates/datasets/", "lowersuperoutputareamidyearpopulationestimates/mid2019sape22dt2/", "sape22dt2mid2019lsoasyoaestimatesunformatted.zip" ), "lsoa_pop_est.zip", mode = "wb" ) unzip("lsoa_pop_est.zip") }) } lsoa_pop_estimates <- read_excel( "SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx", "Mid-2019 Persons", skip = 3 ) %>% select(LSOA11CD = `LSOA Code`, pop = `All Ages`) head(lsoa_pop_estimates) ## # A tibble: 6 x 2 ## LSOA11CD pop ## <chr> <dbl> ## 1 E01011949 1954 ## 2 E01011950 1257 ## 3 E01011951 1209 ## 4 E01011952 1740 ## 5 E01011953 2033 ## 6 E01011954 2210

Now we can load the boundary data and join to our population estimates.

lsoa_boundaries <- read_sf( "https://opendata.arcgis.com/datasets/e9d10c36ebed4ff3865c4389c2c98827_0.geojson" ) %>% # there are other fields in the lsoa data, but we only need the LSOA11CD field select(LSOA11CD) %>% inner_join(lsoa_pop_estimates, by = "LSOA11CD") %>% # this helps when combining different sf objects and get's rid of some messages st_set_agr(c(LSOA11CD = "identity", pop = "aggregate"))

We can now use ggplot to create a simple plot showing the populations of the LSOA’s and add points to show the MTC’s.

ggplot() + # first plot the lsoa boundaries and colour by the population geom_sf(data = lsoa_boundaries, aes(fill = pop), colour = NA) + geom_sf(data = major_trauma_centres) + scale_fill_distiller(type = "div", palette = "Spectral", # tidy up labels in the legend, display as thousands labels = number_format(accuracy = 1e-1, scale = 1e-3, suffix = "k")) + theme_void() + theme(legend.position = c(0, 0.98), legend.justification = c(0, 1)) + labs(fill = "Population (thousands)")

We can now try to calculate the population estimates for each of the MTC’s. We can take advantage of {sf} to perform a spatial join between our two datasets. A spatial join is similar to a join in a regular Sql database, except instead of joining on columns we join on geospatial features.

In this case we want to take each LSOA and find which MTC it is closest to, so we can use the `st_nearest_feature`

predicate function with `st_join`

. This will give us a dataset containing all of the rows from `lsoa_boundaries`

, but augmented with the columns from `major_trauma_centres`

.

We can then use standard `{dplyr}`

functions to group by the MTC `name`

and `org_id`

fields, and calculate the sum of the `pop`

column.

`{sf}`

is clever – it knows that when we are summarising a geospatial table we need to combine the geometries somehow. What it will do is call `st_union`

and return us a single geometry per MTC.

One thing we need to do before joining our data though is to transform our data temporarily to a different CRS. Our data currently is using latitude and longitude in a spherical projection, but `st_nearest_points`

assumes that the points are planar. That is, it assumes that the world is flat. This can lead to incorrect results.

But, we can transform the data to use the British National Grid. This instead specifies points as how far east and north from an origin. This has a CRS number of 27700. Once we are done summarising our data we can again project back to the previous CRS (4326). This article explains these numbers in a bit more detail.

mtcv_pop <- lsoa_boundaries %>% st_transform(crs = 27700) %>% # assign each LSOA to a single, closest, MTC st_join( st_transform(major_trauma_centres, crs = 27700), join = st_nearest_feature ) %>% # now, summarise our data frame based on MTC's to find the total population group_by(name, org_id) %>% # note: summarise will automatically call st_union on the geometries # this gives us the lsoa's as a combined multipolygon summarise(across(pop, sum), .groups = "drop") %>% st_transform(crs = 4326)

We can now visualise this map. This time we will use the `{leaflet}`

package to create an interactive html map.

First we need a function for colouring the areas.

pal_fun <- colorNumeric("Spectral", NULL, n = 7, reverse = TRUE)

As this is an interactive map we can add popups to the area’s to make it easier to identify which MTC we are looking at and to show us the estimated population.

p_popup <- glue_data(mtcv_pop, "<strong>{name}</strong> ({org_id})", "Estimated Population: {comma(pop)}", .sep = "\n")

Now we can create our map

mtcv_pop %>% leaflet() %>% # add the areas that have been assigned to each MTC addPolygons(stroke = TRUE, color = "black", opacity = 1, weight = 1, fillColor = ~pal_fun(pop), fillOpacity = 1, popup = p_popup) %>% # add markers for the MTC's addCircleMarkers(data = major_trauma_centres, color = "black", stroke = TRUE, fillColor = "white", weight = 1, radius = 3, opacity = 1, fillOpacity = 1) %>% setMapWidgetStyle(list(background= "white"))

Click here to view an interactive version of this map.

This method certainly is not perfect. For example, we can see that the closest MTC for most of Somerset is Cardiff, but in order to reach Cardiff you need to cross the Bristol Channel! Not so easy by a Land Ambulance!

A better approach would be to use some method to calculate travel time from each LSOA to the MTC’s. Using the {osrm} package, we can get isochrone’s from Open Street Map. An isochrone is an area of equal travel time from a point. So, if we calculate isochrone’s for each of the MTC’s, then join to the LSOA’s, we can more accurately assign our LSOA’s to the MTC’s.

This code will calculate the isochrone’s for our MTC’s, in 30 minute travel time increments, from 0 to 150 minutes. This is calculated as per usual car travel times, and not for ambulance travel times. Note, this can take a few minutes to run.

mtc_iso <- major_trauma_centres %>% # we need to calculate the isochrone's for each row individually, so we need # to map over the geometries mutate(iso = map(geometry, osrmIsochrone, breaks = seq(0, 150, by = 30), returnclass = "sf")) %>% # we are going to replace the existing geometry column (the MTC location) with # the isochrones st_drop_geometry() %>% unnest(iso) %>% st_set_geometry("geometry") %>% # transform to a planar projection to make calculations more reliable st_transform(crs = 27700)

Now we can do the join between our LSOA’s and the MTC isochrones. Potentially an LSOA may have more than one isochrone for two or more MTC’s. In these cases we need to select the closest isochrone. The first part of the code above handles this by filtering the data by the minimum value for the max column (maximum time of the isochrone).

However, this may not return unique results; if an LSOA is matched by two isochrones of the same time for two different MTC’s then our filter will find both! In these cases we select the closest MTC based on straight line distance. This can be quite slow, so to reduce the computational burden I split my data into two groups: those that have more than 1 match (which we apply the closest distance filter to) and those that have just 1 match.

Once we have done this filtering we can recombine the data.

lsoa_tmp <- local({ x <- lsoa_boundaries %>% st_transform(crs = 27700) %>% st_join(mtc_iso) %>% group_by(LSOA11CD) %>% filter(max == min(max)) %>% mutate(nx = n() > 1) %>% ungroup() %>% group_nest(nx) # splitting this up reduces the amount of computation required: just work out # the distance and filter where we have more than 1 match x$data[[which(x$nx)]] <- x$data[[which(x$nx)]] %>% inner_join(major_trauma_centres %>% st_transform(crs = 27700) %>% as_tibble() %>% rename(hospital = geometry), by = c("name", "org_id")) %>% mutate(dist = map2_dbl(st_centroid(geometry), hospital, st_distance)) %>% group_by(LSOA11CD) %>% filter(dist == min(dist)) %>% select(-dist, -hospital) # recombine our data x <- bind_rows(x$data) # some lsoa's are missed out in the steps above, add these back in by copying # the data from their closest neighbour in x y <- lsoa_boundaries %>% st_transform(crs = 27700) %>% anti_join(st_drop_geometry(x), by = "LSOA11CD") %>% st_join(select(x, -LSOA11CD), join = st_nearest_feature) # join x and y together bind_rows(x, y) })

Now that we have assigned the LSOA’s to a MTC and isochrone we can summarise our data to give us the combined isochrones.

mtc_iso_fixed <- lsoa_tmp %>% group_by(name, org_id, max) %>% summarise(.groups = "drop") %>% st_transform(4326) %>% mutate(time = glue::glue("{max - 30} to {max} mins") %>% fct_reorder(max))

And we can summarise the MTC’s to find the revised catchment populations.

mtc_points <- major_trauma_centres %>% inner_join(lsoa_tmp %>% st_drop_geometry() %>% group_by(name, org_id) %>% summarise(mean_time = sum(center * pop) / sum(pop), across(pop, sum), .groups = "drop"), by = c("name", "org_id"))

Finally we can put this together into a map.

mtc_times <- levels(mtc_iso_fixed$time) mtc_pal <- colorFactor(viridis::viridis(length(mtc_times)), mtc_times) leaflet(mtc_iso_fixed) %>% addTiles() %>% addPolygons(fillColor = ~mtc_pal(time), color = "#000000", weight = .5, fillOpacity = 0.5, popup = ~glue::glue("<b>{name}</b><br>{time}")) %>% addMarkers(data = mtc_points, popup = ~glue::glue("<b>{name}</b> ({org_id})", "Mean Time: {round(mean_time, 0)} minutes", "Pop: {scales::comma(pop)}", .sep = "<br>"))

Click here to view an interactive version of this map.

We could improve this by making the isochrones be smaller, for example 15 minute groups rather than 30 minutes. However, this would take longer to download from Open Streetmap, and it will take a lot longer to do the LSOA assingment. Though, having smaller groups would mean we need more colours which would make the map harder to read!

I hope that you have found this useful! If you want to learn more about geocomputation in R then you should read the book “Geocomputation with R” which is available to read free online.

The post Using {sf} to calculate catchment areas appeared first on NHS-R Community.

**leave a comment**for the author, please follow the link and comment on their blog:

**R tips – NHS-R Community**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.