**ikan bilis » R**, and kindly contributed to R-bloggers)

Recently, I’ve been playing with patterns of drought in paleo records of streamflow. One of the earliest and most helpful tools I’ve developed, identifies and characterizes droughts in extremely long time series using R. I’m still hacking my way through it, but this is what has been cooking thus far…

For this example I’ll be using the Meko et al. 2007 tree-ring reconstruction of the Colorado River’s annual natural flow at Lees Ferry, AZ (extending from A.D. 762-2005). You can check out the data yourself. I’ll be loading it in an R example below.

The heart of this relies on `FindDrought`, a function that returns a simple binary mask to determine whether a particular year is a “drought year” or not. Here, I’m defining drought as years of below-mean flow, broken by two or more years of above average flow.

FindDrought <- function(x, averageflow=mean(x)) { # Determines whether a given time period is in a drought, given a record. # # Args: # x: A vector of annual stream flow observations. # averageflow: If you want to specify a flow average for finding droughts. # # Returns: # A vector indicating whether drought present (TRUE) or not present (FALSE). foo <- rep(TRUE, length(x)) foo[x >= averageflow] <- FALSE runs <- rle(foo) runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE return(inverse.rle(runs)) }

Having defined what years are and are not droughts, we can then use this information in the `DefineDrought`. This function loops through and collects basic characteristics about each drought event, returning each drought’s start year, stop year, duration, and “intensity” (the sum of anomalies).

DefineDrought <- function(x, y, averageflow=mean(y)) { # Determines basic characteristics of droughts in a given flow time series. # # Args: # x: Year of time series observation, in a vector format. # y: Time series or vector of river flow observations. # averageflow: If you want to specify a flow average for finding droughts. # # Returns: # A data frame with columns giving: # 1) The starting period of a drought - "start". # 2) The period a drought stops - "stop". # 3) The number of periods that the drought was present - "duration". # 4) The sum of anomalies for the drought period - "anom.sum". drought.present <- FindDrought(y, averageflow = averageflow) start <- c() stop <- c() duration <- c() anom.sum <- c() temp.start <- FALSE temp.duration <- 0 temp.anom.sum <- 0 # There is likely a more elegant way to handle this aside form loops. for (obs in 1:length(y)) { if (drought.present[obs] == FALSE) { temp.start <- FALSE temp.duration <- 0 temp.anom.sum <- 0 } else if (drought.present[obs] == TRUE) { if (!temp.start) temp.start <- x[obs] temp.duration <- temp.duration + 1 temp.anom.sum <- temp.anom.sum + (y[obs] - averageflow) if ((drought.present[obs + 1] == FALSE) | (obs + 1 > length(y))) { start <- append(start, temp.start) stop <- append(stop, x[obs]) duration <- append(duration, temp.duration) anom.sum <- append(anom.sum, temp.anom.sum) } } } return(data.frame(start, stop, duration, anom.sum)) }

If this tugs at your bobber, you can put this to use in a simple interactive session:

lees <- read.table('ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/reconstructions/northamerica/usa/upper-colorado-flow2007.txt', header = TRUE, skip = 158, # To skip the meta data. na.strings = 'NULL') # Drum-roll, please. lees.droughts <- DefineDrought(x = lees$Year, y = lees$RecMAF)

Here, `DefineDrought` returns a simple dataframe with the structure:

'data.frame': 169 obs. of 4 variables: $ start : int 762 770 774 779 786 790 797 805 812 822 ... $ stop : int 767 771 774 781 786 794 799 809 819 830 ... $ duration: num 6 2 1 3 1 5 3 5 8 9 ... $ anom.sum: num -2.51 -1.81 -2.33 1.14 -5.89 ...

You’ll note that this is very sensitive to the slightest decline in streamflow. You are able to tweak `DefineDrought`‘s sensitivity by manually specifying a mean flow value and passing this as an argument. However, especially in exploratory analysis, I’ve found it best to simply sort or filter from this default list so that we can see the entire range of values, even in single year “droughts”.

There you have it! Beats sifting through a time series and manually defining droughts by hand. My next post will (hopefully) be a continuation of this, showing how I’ve used this information to visualize drought onset and impact with `ggplot2`.

As always, I’m happy to hear questions, comments, corrections or concerns!

**EDIT:** Updated the FindDrought function based on Cameron Bracken‘s sagely wisdom. Many thanks for introducing me to `rle()`!

Tagged: analysis, climate, dendro, drought, paleo, R, streamflow

**leave a comment**for the author, please follow the link and comment on their blog:

**ikan bilis » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...