Analysing soil moisture data in NetCDF format with the ncdf4 library

[This article was first published on The Devil is in the Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The netCDF format is popular in sciences that analyse sequential spatial data. It is a self-describing, machine-independent data format for creating, accessing and sharing array-oriented information. The netCDF format provides spatial time-series such as meteorological or environmental data. This article shows how to visualise and analyse this data format by reviewing soil moisture data published by the Australian Bureau of Statistics.

Soil Moisture data

The Australian Bureau of Meteorology publishes hydrological data in both a simple map grid and in the NetCDF format. The map grid consists of a flat text file that requires a bit of data jujitsu before it can be used. The NetCDF format is much easier to deploy as it provides a three-dimensional matrix of spatial data over time.

We are looking at the possible relationship between sewer main blockages and deep soil moisture levels. You will need to manually download this dataset from the Bureau of Meteorology website. I have not been able to scrape the website automatically. For this analysis, I use the actual deep soil moisture level, aggregated monthly in NetCDF 4 format.

Soil moisture data from the Australian Bureau of meteorology in netCDF format

Reading, Extracting and Transforming the netCDF format

The ncdf4 library, developed by David W. Pierce, provides the necessary functionality to manage this data. The first step is to load the data, extract the relevant information and transform the data for visualisation and analysis. When the data is read, it essentially forms a complex list that contains the metadata and the measurements.

The ncvar_get function extracts the data from the list. The lon, lat and dates variables are the dimensions of the moisture data. The time data is stored as the number of days since 1 January 1900. The spatial coordinates are stored in decimal degrees with 0.05-decimal degree intervals. The moisture data is a three-dimensional matrix with longitue, latitude and time as dimensions. Storing this data in this way will make it very easy to use.

library(ncdf4)
# Load data
bom <- nc_open("Hydroinformatics/SoilMoisture/sd_pct_Actual_month.nc")
print(bom) # Inspect the data

# Extract data
lon <- ncvar_get(bom, "longitude")
lat <- ncvar_get(bom, "latitude")
dates <- as.Date("1900-01-01") + ncvar_get(bom, "time")
moisture <- ncvar_get(bom, "sd_pct")
dimnames(moisture) <- list(lon, lat, dates)

Visualising the data

The first step is to check the overall data. This first code snippet extracts a matrix from the cube for 31 July 2017 and plots it. This code pipe extracts the date for the end of July 2017 and creates a data frame which is passed to ggplot for visualisation. Although I use the Tidyverse, I still need reshape2 because the gather function does not like matrices.

library(tidyverse)
library(RColorBrewer)
library(reshape2)

d <- "2017-07-31"
m <- moisture[, , which(dates == d)] %>%
       melt(varnames = c("lon", "lat")) %>%
       subset(!is.na(value))

ggplot(m, aes(x = lon, y = lat, fill = value)) + borders("world") + 
    geom_tile() + 
    scale_fill_gradientn(colors = brewer.pal(9, "Blues")) + 
    labs(title = "Total moisture in deep soil layer (100-500 cm)",
    subtitle = format(as.Date(d), "%d %B %Y")) + 
    xlim(range(lon)) + ylim(range(lat)) + coord_fixed()

Deep soil moisture: Source Bureau of Meteorology, Australia

With the ggmap package we can create a nice map of a local area.

library(ggmap)
loc <- round(geocode("Bendigo") / 0.05) * 0.05 
map_tile <- get_map(loc, zoom = 12, color = "bw") %>% 
    ggmap()

map_tile + 
    geom_tile(data = m, aes(x = lon, y = lat, fill = value), alpha = 0.8) + 
    scale_fill_gradientn(colors = brewer.pal(9, "Blues")) + 
    labs(title = "Total moisture in deep soil layer (100-500 cm)",
        subtitle = format(as.Date(d), "%d %B %Y"))

Analysing the data

For my analysis, I am interested in the time series of moisture data for a specific point on the map. The previous code slices the data horizontally over time. To create a time series we can pierce through the data for a specific coordinate. The purpose of this time series is to investigate the relationship between sewer main blockages and deep soil data, which can be a topic for a future post.

mt <- data.frame(date = dates, 
                 dp = moisture[as.character(loc$lon), as.character(loc$lat), ])
ggplot(mt, aes(x = date, y = dp)) + geom_line() + 
    labs(x = "Month",
         y = "Moisture",
         title = "Total moisture in deep soil layer (100-500 cm)",
         subtitle = paste(as.character(loc), collapse = ", "))

The latest version of this code is available on my GitHub repository.

Deep soil moisture time series.

The post Analysing soil moisture data in NetCDF format with the ncdf4 library appeared first on The Devil is in the Data.

To leave a comment for the author, please follow the link and comment on their blog: The Devil is in the Data.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)