Visualise and Analyse Bathymetric Survey Data with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The first step in the value chain for water utilities is extracting water from natural sources, such as rivers, groundwater, oceans, or reservoirs. Water managers and the public alike are interested in knowing how much water a reservoir currently holds. This is not an easy problem to solve, as it requires some analysis to move from survey to volume measurement. This article shows how to visualise and analyse bathymetric survey data using the R language and data from the Prettyboy Reservoir in Maryland, USA.
The Prettyboy reservoir
While searching the web for a publicly available bathymetric survey, I stumbled across the Maryland Geological Survey (MGS) website. This organisation investigates the geology and water resources of the American state of Maryland. The Prettyboy Reservoir helps provide water for nearly two million residents in Baltimore City and parts of five neighbouring Maryland counties.
The MGS publishes bathymetric data for their reservoirs, including the Prettyboy reservoir in the county of Baltimore.
View Larger Map
Extract, Transfer and Load
The Maryland Geological Survey website hosts the raw data as a zipped CSV file.
The file has a double header, the first row contains the field names and the second row the units. The data has three columns:
| Easting | Northing | Depth |
|---|---|---|
| UTM-NAD83-Meters | UTM-NAD83-Meters | Feet |
| 352606.8 | 4386507 | 9.7 |
| 352606.8 | 4386507 | 8.8 |
| 352606.5 | 4386507 | 11.4 |
The Easting and Northing are in the Universal Transverse Mercator projection. As a demonstration of unit confusion, the reservoir's depth in feet. As usual in bathymetry surveys, depth is measured from the datum as a positive number. The zero datum is the reservoir's Full Supply Level.
The script below extracts, transforms and loads (ETL) the data. The file contains duplicates and a zero point at each of the four corners, which are removed from the data. This leaves us a clean file to work from.
# Bathymetry visualisation and analysis
# libraries
library(tidyverse)
library(RColorBrewer)
rm(list = ls())
# Data source
# http://www.mgs.md.gov/publications/data_pages/reservoir_bathymetry.html
if (!file.exists("data/PrettyBoy1998.dat")) {
download.file("http://www.mgs.md.gov/output/reservoir_bathy/PrettyBoy1998.zip",
destfile = "data/prettyboy-1998.zip")
unzip("data/prettyboy-1998.zip", exdir = "data")
file.remove("data/prettyboy-1998.zip")
}
# Read data
rawdata <- read.csv("data/PrettyBoy1998.dat", skip = 1)
fsl <- 145 # Full Supply Level
# Data cleaning
prettyboy <- rawdata |>
rename(easting = 1, northing = 2, depth_ft = 3) |>
filter(!duplicated(rawdata) & !(easting %in% range(easting))) |>
mutate(depth_m = depth_ft * 0.304)
Visualise the Data
This visualisation shows the lines that the survey vessel sailed to capture the sonar soundings. The edge of the reservoir was measured with a terrestrial survey and has a depth of zero.
The code defines a colour palette for this data using green and blue. The colours combine two palettes from the green and blue palettes so that only zero values are green and positive values are blue.
# Basic Visualisation
bathymetry_colours <- c(rev(brewer.pal(3, "Greens"))[-2:-3],
brewer.pal(9, "Blues")[-1:-3])
ggplot(prettyboy) +
aes(easting, northing, colour = depth_m) +
geom_point(size = .1) +
scale_x_continuous(labels = scales::label_comma()) +
scale_y_continuous(labels = scales::label_comma()) +
coord_equal() + labs(colour = "Depth [m]") +
scale_colour_gradientn(colors = bathymetry_colours) +
labs(title = "Prettyboy Reservoir Bathymetry (1988)",
caption = "Source: Maryland Geological Survey",
x = "Easting (UTM NAD83)", y = "Northing (UTM NAD83)") +
theme_minimal(base_size = 8)
Mapping the data
The coordinates are provided in the North American UTM / NAD83 projection. To map this survey data onto a map tile, we need to convert these to longitude and latitudes for the WGS84 spheroid.
The SF (Simple Features) package converts the coordinates for use in Google maps. To achieve this we need the CRS (Coordinate Reference System) with number 26918, as Maryland is in zone 18N: EPSG:26918 NAD83 / UTM zone 18N. The second CRS (4326) is the system used by Google maps: EPSG:4326 WGS 84.
We can now plot the data on a Google Maps tile using the ggmap package. This functionality requires a Google API key, which I am not sharing here. You can also use this data in other spatial systems, such as the interactive tooling in the Leaflet package.
# Spatial mapping
library(sf)
# Convert prettyboy tibble to an sf object
prettyboy_sf <- st_as_sf(prettyboy,
coords = c("easting", "northing"),
crs = 26918) # NAD83 / UTM zone 18N
# Transform to latitude/longitude (WGS84)
prettyboy_wgs <- st_transform(prettyboy_sf, crs = 4326)
# Convert back to tibble with lat/lon columns
coords <- st_coordinates(prettyboy_wgs)
prettyboy_wgs <- cbind(prettyboy_wgs, coords) |>
rename(lon = X, lat = Y) |>
st_drop_geometry()
library(ggmap)
## Register Google Maps API
api <- readLines("data/google-maps.api")
register_google(key = api)
prettyboy_map <- get_map(c(mean(prettyboy_wgs$lon),
mean(prettyboy_wgs$lat)),
zoom = 14, color = "bw")
ggmap(prettyboy_map, extent = "device") +
geom_point(data = as_tibble(prettyboy_wgs),
aes(lon, lat, col = depth_m), size = 0.1) +
scale_colour_gradientn(colors = bathymetry_colours, name = "Depth [m]") +
labs(title = "Prettyboy Reservoir Bathymetry (1988)",
caption = "Source: Maryland Geological Survey") +
theme_void(base_size = 10)
Analyse Bathymetric Survey: Full Supply Volume
Two methods are available to estimate the total volume of a reservoir using bathymetric data. The analogue method uses contour lines to draw cross-sections on graph paper at regular intervals. Then use a planimeter or the shoelace formula to measure the surface area of each section and calculate the total volume by multiplying the surface area by the spacing between sections. I used this method more than thirty years ago when working on dredging projects. But technology has moved on.
The second method is more numeric, and there are two methods: interpolate a grid or build a Triangular Irregular Network.
The bathymetry provides data only for the lines on the map, as visualised above. To estimate the full supply analytically, we can overlay the reservoir with a grid and interpolate the depth for each cell. We can then estimate the Full Supply Volume by summing the product of each cell area and depth for each grid item. When we have a grid with n elements, and x and y indicate the element size and di the depth of an element, the volume is calculated with:
$$V = \sum_{i=1}^{n} {x \times y \times d_i}$$
The akima package provides functionality for interpolation of irregularly and regularly spaced data, making it ideal for our problem. First, we set the spacing to 10 meters. The smaller the interval, the more accurate the estimation. But small spacing comes at a cost, as the interpolation will take some time to finalise.
Once we have a grid with a depth for each location, we can estimate the Full Supply Volume by multiplying each location's depth by its grid area and summing across all grids.
This analysis provides a volume of 71.50 × 106 m3, which is close to the 19 billion US gallons (71.92 × 106 m3) reported by the Maryland Geological Survey.
# Extrapolate gridded version
library(akima)
# Grid size [m]
dx <- 10
dy <- 10
# Regular grid coordinates
x_seq <- seq(min(prettyboy$easting), max(prettyboy$easting), by = dx)
y_seq <- seq(min(prettyboy$northing), max(prettyboy$northing), by = dy)
# Interpolate scattered (x, y, depth) -> regular grid
# no extrapolation outside convex hull
interp_res <- with(prettyboy,
interp(x = easting,
y = northing,
z = depth_m,
xo = x_seq,
yo = y_seq,
duplicate = "strip",
linear = TRUE,
extrap = FALSE))
# Volume
prettyboy_grid <- interp2xyz(interp_res, data.frame = TRUE) |>
as_tibble() |>
rename(easting = x, northing = y, depth_m = z)
grid_volume_m3 <- sum(prettyboy_grid$depth_m * dx * dy, na.rm = TRUE)
grid_volume_m3 / 100^3
We can use the interpolated values to create a nice map of the reservoir with contour lines and cross-sections. The cross sections are taken at four random points in the grid.
# Visualise Cross sections
set.seed(1234)
northing_sections <- sample(prettyboy_grid$northing, 4)
prettyboy_sections <- filter(prettyboy_grid, northing %in% northing_sections)
max_depth <- round(max(prettyboy_sections$depth_m, na.rm = TRUE))
map <- ggplot(filter(prettyboy_grid)) +
aes(easting, northing) +
geom_raster(aes(fill = depth_m), interpolate = TRUE) +
geom_contour(aes(z = depth_m), colour = "white", alpha = 0.4) +
geom_hline(yintercept = northing_sections, col = "orange") +
coord_equal() +
scale_x_continuous(labels = scales::label_comma()) +
scale_y_continuous(labels = scales::label_comma()) +
scale_fill_gradientn(
colours = c(NA, bathymetry_colours[-1]),
na.value = "transparent",
guide = guide_colourbar(reverse = TRUE),
name = "Depth [m]") +
theme_minimal(base_size = 28) +
labs(title = "Prettyboy Reservoir",
caption = "Source: Maryland Geological Survey",
x = "Easting [m]", y = "Northing [m]") +
theme(legend.position = "bottom",
legend.key.width = unit(2, 'cm'))
sections <- ggplot(prettyboy_sections) +
aes(easting, (fsl - depth_m)) +
geom_area(fill = bathymetry_colours[1], alpha = 0.5) +
scale_x_continuous(labels = scales::label_comma()) +
facet_wrap(~rev(northing), ncol = 1) +
coord_cartesian(ylim = c(fsl - 1.1 * max(prettyboy_sections$depth_m, na.rm = TRUE), fsl)) +
theme_minimal(base_size = 28) +
labs(x = "Easting [m]", y = "Elevation [m]")
library(gridExtra)
png(filename = "prettyboy-sections.png", width = 1920, height = 940, units = "px")
grid.arrange(map, sections, ncol = 2)
dev.off()
The second method for calculating the total volume is to use a Triangulated Irregular Network (TIN). A TIN is a representation of a surface consisting of triangular facets. The main difference with the interpolated values is that this model is irregular, so each triangle has a different projected area.
The geometry package implements Delaunay triangulation, where each row contains the indices of three points forming a triangle. In the next step, we use the map_dfr() function to create a data frame with the index number of each triangle, the three coordinates and the associated depths.
To measure the volume, we can use the shoelace formula to calculate the area of each triangle:
$$A_i = \frac{1}{2}| x_{i_1} (y_{1_2} - y_{i_3}) + x_{i_2} (y_{1_3} - y_{i_1}) + x_{i_3} (y_{1_1} - y_{i_2})|$$
The volume of the element is the area multiplied by the average depth:
$$V = \sum{A_i \frac{\sum d_i}{3}}$$
This method provides the same volume but is a bit faster to process because we are not creating new data.
# Triangulated Irregular Network
library(geometry)
library(purrr)
# 2D coordinates
coords2d <- as.matrix(prettyboy[, c("easting", "northing")])
# Delaunay triangulation
tri <- delaunayn(coords2d)
# Convert the triangles into a long data frame for ggplot
tin_df <- map_dfr(seq_len(nrow(tri)), function(i) {
idx <- tri[i, ]
data.frame(
tri_id = i,
easting = prettyboy$easting[idx],
northing = prettyboy$northing[idx],
depth_m = prettyboy$depth_m[idx])})
triangle_area_shoelace <- function(x, y) {
abs(x[1] * (y[2] - y[3]) +
x[2] * (y[3] - y[1]) +
x[3] * (y[1] - y[2])) / 2}
tin_volume <- tin_df |>
dplyr::group_by(tri_id) |>
dplyr::summarise(
area = triangle_area_shoelace(easting, northing),
mean_depth = mean(depth_m),
volume = area * mean_depth,
.groups = "drop")
tin_volume_m3 <- sum(tin_volume$volume)
tin_volume_m3 / 1e6
Generate Capacity Charts
Last but not least, we need a chart that converts the measured reservoir level to a volume. The easiest method to develop this curve is to use the grid. The TIN becomes mathematically problematic when not all three triangle points are underwater.
While bathymetry measures depth from the top down, with higher numbers indicating greater depth, reservoir levels are expressed in elevation. I could not find the official elevation of the Full Supply Level. Google Maps provides an elevation of 145 meters for the dam, so let's use this.
The measured values closely matched a cubic regression, so we can calculate the volume of the reservoir using the level with:
$$V(\ell) =-3.463 \times 10^{9} + 8.883 \times 10^{7}\,\ell - 7.640 \times 10^{5}\,\ell^{2} + 2.203 \times 10^{3}\,\ell^{3}$$
# Capacity Curve
# levels to evaluate
levels <- seq(fsl + 15 - round(max(prettyboy$depth_m) / 10) * 10, fsl, by = 1)
grid_capacity <- tibble(level = levels) |>
mutate(volume_m3 = map_dbl(level, \(L) {
bed <- fsl - prettyboy_grid$depth_m
depth_L <- pmax(0, L - bed)
sum(depth_L * dx * dy, na.rm = TRUE)}))
# Cubic regression
cubic <- lm(volume_m3 ~ poly(level, 3, raw = TRUE), data = grid_capacity)
grid_capacity$cubic = predict(cubic)
ggplot(grid_capacity) +
aes(level, volume_m3 / 1E3) +
geom_point(col = "grey") +
geom_line(aes(level, cubic / 1E3)) +
geom_hline(yintercept = grid_volume_m3 / 1E3, linetype = 6, col = "red") +
scale_y_continuous(labels = scales::label_comma()) +
theme_minimal() +
labs(title = "Prettyboy Reservoir Capacity Curve",
x = "Water level", y = expression(Volume~km^3))
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.