Identifying and characterizing droughts with R

[This article was first published on ikan bilis » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently, I’ve been playing with patterns of drought in paleo records of streamflow. One of the earliest and most helpful tools I’ve developed, identifies and characterizes droughts in extremely long time series using R. I’m still hacking my way through it, but this is what has been cooking thus far…

For this example I’ll be using the Meko et al. 2007 tree-ring reconstruction of the Colorado River’s annual natural flow at Lees Ferry, AZ (extending from A.D. 762-2005). You can check out the data yourself. I’ll be loading it in an R example below.

The heart of this relies on FindDrought, a function that returns a simple binary mask to determine whether a particular year is a “drought year” or not. Here, I’m defining drought as years of below-mean flow, broken by two or more years of above average flow.

FindDrought <- function(x, averageflow=mean(x)) {
  # Determines whether a given time period is in a drought, given a record.
  #
  # Args:
  #   x: A vector of annual stream flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  #
  # Returns:
  #   A vector indicating whether drought present (TRUE) or not present (FALSE).
  foo <- rep(TRUE, length(x))
  foo[x >= averageflow] <- FALSE
  runs <- rle(foo)
  runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE
  return(inverse.rle(runs))
}

Having defined what years are and are not droughts, we can then use this information in the DefineDrought. This function loops through and collects basic characteristics about each drought event, returning each drought’s start year, stop year, duration, and “intensity” (the sum of anomalies).

DefineDrought <- function(x, y, averageflow=mean(y)) {
  # Determines basic characteristics of droughts in a given flow time series.
  #
  # Args:
  #   x: Year of time series observation, in a vector format.
  #   y: Time series or vector of river flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  #
  # Returns:
  #   A data frame with columns giving:
  #   1) The starting period of a drought - "start".
  #   2) The period a drought stops - "stop".
  #   3) The number of periods that the drought was present - "duration".
  #   4) The sum of anomalies for the drought period - "anom.sum".
  drought.present <- FindDrought(y, averageflow = averageflow)
  start <- c()
  stop <- c()
  duration <- c()
  anom.sum <- c()
  temp.start <- FALSE
  temp.duration <- 0
  temp.anom.sum <- 0
  # There is likely a more elegant way to handle this aside form loops.
  for (obs in 1:length(y)) {
    if (drought.present[obs] == FALSE) {
      temp.start <- FALSE
      temp.duration <- 0
      temp.anom.sum <- 0
    }
    else if (drought.present[obs] == TRUE) {
      if (!temp.start)
        temp.start <- x[obs]
      temp.duration <- temp.duration + 1
      temp.anom.sum <- temp.anom.sum + (y[obs] - averageflow)
      if ((drought.present[obs + 1] == FALSE) | (obs + 1 > length(y))) {
        start <- append(start, temp.start)
        stop <- append(stop, x[obs])
        duration <- append(duration, temp.duration)
        anom.sum <- append(anom.sum, temp.anom.sum)
      }
    }
  }
  return(data.frame(start, stop, duration, anom.sum))
}

If this tugs at your bobber, you can put this to use in a simple interactive session:

lees <- read.table('ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/reconstructions/northamerica/usa/upper-colorado-flow2007.txt',
  header = TRUE,
  skip = 158,  # To skip the meta data.
  na.strings = 'NULL')

# Drum-roll, please.
lees.droughts <- DefineDrought(x = lees$Year, y = lees$RecMAF)

Here, DefineDrought returns a simple dataframe with the structure:

'data.frame':	169 obs. of  4 variables:
 $ start   : int  762 770 774 779 786 790 797 805 812 822 ...
 $ stop    : int  767 771 774 781 786 794 799 809 819 830 ...
 $ duration: num  6 2 1 3 1 5 3 5 8 9 ...
 $ anom.sum: num  -2.51 -1.81 -2.33 1.14 -5.89 ...

You’ll note that this is very sensitive to the slightest decline in streamflow. You are able to tweak DefineDrought‘s sensitivity by manually specifying a mean flow value and passing this as an argument. However, especially in exploratory analysis, I’ve found it best to simply sort or filter from this default list so that we can see the entire range of values, even in single year “droughts”.

There you have it! Beats sifting through a time series and manually defining droughts by hand. My next post will (hopefully) be a continuation of this, showing how I’ve used this information to visualize drought onset and impact with ggplot2.

As always, I’m happy to hear questions, comments, corrections or concerns!

EDIT: Updated the FindDrought function based on Cameron Bracken‘s sagely wisdom. Many thanks for introducing me to rle()!


Tagged: analysis, climate, dendro, drought, paleo, R, streamflow

To leave a comment for the author, please follow the link and comment on their blog: ikan bilis » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)