Plot Multiple Time Series using the flow / inkblot / river / ribbon / volcano / hourglass / area / whatchamacallit plots ~ blue whale catch per country w/ ggplot2

June 27, 2010

(This article was first published on mind of a Markov chain » R, and kindly contributed to R-bloggers)

Ever since I first saw this NYT visualization by Amanda Cox, I’ve wanted to reproduce it in R. The plot stacks multiple time series on top of one another, with the width of each river/ribbon/hourglass representing the magnitude at each time. The NYT graphic used box office revenue as the width of the river. It is also an interactive web app, thanks to some help from graphic designers.

AFAIK, ggplot2 can stack area plots using geom_area, or create a flow plot for a single series using geom_ribbon, but it cannot do both at once. So I wrote a function that transforms the data into the vertex form that geom_polygon needs.
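
The reshaping that geom_polygon needs can be sketched roughly as follows. This is a minimal illustration and not the function from this post: `band_polygon` and the toy numbers are hypothetical. Each band is described by a lower and an upper boundary; the polygon walks along the lower edge from left to right and returns along the upper edge from right to left.

```r
# Hypothetical helper (not part of the original post): build the vertex
# table for one band bounded below by `lower` and above by `upper`.
band_polygon <- function(x, lower, upper, id) {
  stopifnot(length(x) == length(lower), length(x) == length(upper))
  data.frame(
    x  = c(x, rev(x)),           # out along the bottom, back along the top
    y  = c(lower, rev(upper)),
    id = id
  )
}

# Toy data (made up for illustration): two series stacked on each other.
season <- 1931:1933
a <- c(1, 3, 2)
b <- c(2, 1, 4)

polys <- rbind(
  band_polygon(season, rep(0, 3), a,     "a"),
  band_polygon(season, a,         a + b, "b")
)
print(polys)
```

With ggplot2 loaded, a table of this shape could be drawn with `ggplot(polys, aes(x, y, group = id, fill = id)) + geom_polygon()`.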

I used blue whale catch data from Masaaki Ishida to illustrate my function. The river is centered along the y-axis around the mean at each time point. The data are also smoothed with a spline so the plot looks nicer.
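
The centering and smoothing steps can be illustrated on a small made-up matrix. This is only a sketch under assumptions: the numbers are invented, and `catch`, `stacked`, `centered` and `smooth` are names chosen for the example. It uses base R's `cumsum` for stacking, subtracts each row's mean so the bands sit around zero, and uses `spline()` to interpolate a smoother curve.

```r
# Toy catch matrix (invented values): rows are seasons, columns are countries.
season <- 1931:1934
catch <- cbind(A = c(0, 2, 4, 1),
               B = c(3, 3, 1, 0),
               C = c(1, 0, 2, 5))

# Stack: running total across countries within each season.
stacked <- t(apply(catch, 1, cumsum))

# Center: subtract each season's mean boundary so the river sits around zero.
centered <- stacked - rowMeans(stacked)

# Smooth: interpolate each boundary with a cubic spline on a finer grid.
smooth <- apply(centered, 2, function(y) spline(x = season, y = y, n = 20)$y)

dim(smooth)
```

Each column of `smooth` is one band boundary evaluated on the finer grid; adjacent columns bound one country's band.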

Some links that may be helpful:

(messy) R Code:

library(ggplot2)

# data: Masaaki Ishida ([email protected])

head(blue, 2)
##      Season Norway U.K. Japan Panama Denmark Germany U.S.A. Netherlands
## [1,]   1931      0 6050     0      0       0       0      0           0
## [2,]   1932  10128 8496     0      0       0       0      0           0
##      U.S.S.R. South.Africa TOTAL
## [1,]        0            0  6050
## [2,]        0            0 18624

hourglass.plot <- function(df) {
  stack.df <- df[,-1]
  stack.df <- stack.df[,sort(colnames(stack.df))]
  # running total across countries within each season (rows stay seasons)
  stack.df <- t(apply(stack.df, 1, cumsum))
  stack.df <- t(apply(stack.df, 1, function(x) x - mean(x)))
  # use this for actual data
  ##  coords.df <- data.frame(x = rep(c(df[,1], rev(df[,1])), times = dim(stack.df)[2]), y = c(apply(stack.df, 1, min), as.numeric(apply(stack.df, 2, function(x) c(rev(x),x)))[1:(length(df[,1])*length(colnames(stack.df))*2-length(df[,1]))]), id = rep(colnames(stack.df), each = 2*length(df[,1])))

  ##  qplot(x = x, y = y, data = coords.df, geom = "polygon", color = I("white"), fill = id)

  # use this for smoothed data
  density.df <- apply(stack.df, 2, function(x) spline(x = df[,1], y = x))
  id.df <- sort(rep(colnames(stack.df), each = as.numeric(lapply(density.df, function(x) length(x$x)))))
  density.df <- do.call("rbind", lapply(density.df, as.data.frame))
  density.df <- data.frame(density.df, id = id.df)
  # build polygon vertices: x runs forward then backward for each id
  smooth.df <- data.frame(
    x = unlist(tapply(density.df$x, density.df$id, function(x) c(x, rev(x)))),
    y = c(apply(unstack(density.df[,2:3]), 1, min),
          unlist(tapply(density.df$y, density.df$id, function(x) c(rev(x), x)))[
            1:(table(density.df$id)[1] + 2*max(cumsum(table(density.df$id))[-dim(stack.df)[2]]))]),
    id = rep(names(table(density.df$id)), each = 2*table(density.df$id)))

  qplot(x = x, y = y, data = smooth.df, geom = "polygon", color = I("white"), fill = id)
}

hourglass.plot(blue[,-12]) + ggtitle("Blue Whale Catch")

