In preparation for using some of our streamgraphs for production (PDF/print) graphics, I ended up having to hand-edit labels onto one of the graphics in an Adobe product. This bumped up the priority on adding annotation functions to the streamgraph package (you really don’t want to have to hand-edit graphics if at all possible, trust me). To illustrate them, I’ll use unemployment data that I started gathering for a course I’m teaching in the Fall.

library(dplyr)
library(streamgraph)
library(pbapply)

url <- "http://www.bls.gov/lau/ststdsadata.txt"
dat <- readLines(url)

This data is not exactly in a happy format (hit the URL in your browser and you’ll see what I mean). It was definitely made for line printers/human consumption and I feel bad for any human who has to stare at it. The function I’m using to extract the data isn’t necessarily how I’d read in the whole file; it’s meant for teaching something else rather than for efficiency. It’ll do for our purposes here:

get_state_data <- function(state) {

  # match lines that start with the state name or contain a "Month YYYY" header
  section <- paste("^%s| (", paste(month.name, collapse="|"), ") +[[:digit:]]{4}", sep="")
  section <- sprintf(section, state)

  vals <- gsub("^ +| +$", "", grep(section, dat, value=TRUE))

  # even entries hold the state's figures; strip the state name and dot leaders
  state_vals <- gsub("^.* \\.+", "", vals[seq(from=2, to=length(vals), by=2)])

  cols <- read.table(text=state_vals)

  # odd entries hold the "Month YYYY" headers
  cols$month <- as.Date(sprintf("01 %s", vals[seq(from=1, to=length(vals), by=2)]),
                        format="%d %B %Y")
  cols$state <- state

  cols %>%
    select(8:9, 1:7) %>%
    mutate(V1=as.numeric(gsub(",", "", V1)),
           V2=as.numeric(gsub(",", "", V2)),
           V4=as.numeric(gsub(",", "", V4)),
           V6=as.numeric(gsub(",", "", V6)),
           V3=V3/100,
           V5=V5/100,
           V7=V7/100) %>%
    rename(civ_pop=V1,
           labor_force=V2,
           labor_force_pct=V3,
           employed=V4,
           employed_pct=V5,
           unemployed=V6,
           unemployed_pct=V7)

}

state_unemployment <- bind_rows(pblapply(state.name, get_state_data))
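
If you want to see what the regular expression and the odd/even indexing are doing, here is a small sketch that runs the same steps on a handful of made-up lines. The sample lines only mimic the layout the function appears to expect (a "Month YYYY" header followed by state rows with dot leaders); the numbers are invented, not BLS figures, and `sample_lines`, `hits`, `numbers` and `parsed` are names I made up for the illustration.

```r
# Made-up lines that mimic the layout of the BLS file (values are NOT real)
sample_lines <- c(
  "                          January 1976",
  "Alabama ..........     1,000,000      600,000   60.0      540,000   54.0       60,000   10.0",
  "Alaska ...........       200,000      140,000   70.0      126,000   63.0       14,000   10.0",
  "                          February 1976",
  "Alabama ..........     1,000,000      610,000   61.0      555,100   55.5       54,900    9.0",
  "Alaska ...........       200,000      142,000   71.0      127,800   63.9       14,200   10.0"
)

state <- "Alabama"

# Match either a line starting with the state name or a "Month YYYY" header
pattern <- sprintf("^%s| (%s) +[[:digit:]]{4}",
                   state, paste(month.name, collapse = "|"))

# Keep the matching lines and trim leading/trailing spaces
hits <- gsub("^ +| +$", "", grep(pattern, sample_lines, value = TRUE))

# Matches alternate: header, state row, header, state row, ...
month_lines <- hits[seq(from = 1, to = length(hits), by = 2)]
state_lines <- hits[seq(from = 2, to = length(hits), by = 2)]

# Drop the state name and dot leaders, leaving only the numeric columns
numbers <- gsub("^.* \\.+", "", state_lines)

parsed <- read.table(text = numbers, stringsAsFactors = FALSE)
parsed$month <- as.Date(sprintf("01 %s", month_lines), format = "%d %B %Y")

# Columns with thousands separators come in as text and need converting
parsed$V1 <- as.numeric(gsub(",", "", parsed$V1))
```

Note that the date parsing relies on `%B`, so it assumes a locale with English month names.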

This will give us a data frame of employment and unemployment figures for all the (US) states. I only wanted to focus on New England and a few others for the course example, so this bit filters the data down to them:

state_unemployment %>%
  filter(state %in% c("California", "Ohio", "Rhode Island", "Maine",
                      "Massachusetts", "Connecticut", "Vermont",
                      "New Hampshire", "Nebraska")) -> some

With that setup out of the way, let me introduce the two new functions: sg_add_marker and sg_annotate. sg_add_marker adds a vertical, dotted line that spans the height of the graph and is placed at the designated spot on the x axis. You can add an optional label for the marker by specifying the y position, label text, color, size, space away from the line and how it’s anchored: start (left), middle (center) or end (right). The anchor is primarily useful for placing the label on either side of the line.
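
As a minimal sketch of a single labelled marker with every label option spelled out, here is a toy example. The data frame is invented, and the argument names `y`, `color`, `size` and `space` are my reading of the description above rather than something confirmed here, so check `?sg_add_marker` in your installed version before relying on them.

```r
library(streamgraph)

# Toy data: two streams over three years (values are invented)
toy <- data.frame(
  date  = rep(as.Date(c("2000-01-01", "2001-01-01", "2002-01-01")), times = 2),
  key   = rep(c("a", "b"), each = 3),
  value = c(1, 3, 2, 2, 1, 4)
)

sg <- streamgraph(toy, "key", "value", "date")

# One marker at 2001; argument names other than x, label and anchor are assumed
sg <- sg_add_marker(sg,
                    x = as.Date("2001-01-01"),
                    label = "2001",
                    y = 0,
                    color = "#7f7f7f",
                    size = 12,
                    space = 5,
                    anchor = "start")
```

Printing `sg` at the console renders the widget. Switching `anchor` to "end" should move the label to the other side of the line.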

sg_annotate is for adding text anywhere on the streamgraph. The original use for it was to label streams, but you can use it any way you think would add meaning to your streamgraph. You can see them both in action below, where I plot the streamgraph for unemployment (%) for the selected states, then label the start of each recession since 1980 (with the peak national unemployment rate) with a marker and also label each stream:

streamgraph(some, "state", "unemployed_pct", "month") %>%
  sg_axis_x(tick_interval=10, tick_units = "year", tick_format="%Y") %>%
  sg_axis_y(0) %>%
  sg_add_marker(x=as.Date("1981-07-01"), "1981 (10.8%)", anchor="end") %>%
  sg_add_marker(x=as.Date("1990-07-01"), "1990 (7.8%)", anchor="start") %>%
  sg_add_marker(x=as.Date("2001-03-01"), "2001 (6.3%)", anchor="end") %>%
  sg_add_marker(x=as.Date("2007-12-01"), "2007 (10.1%)", anchor="end") %>%
  sg_annotate(label="Vermont", x=as.Date("1978-04-01"), y=0.6, color="#ffffff") %>%
  sg_annotate(label="Maine", x=as.Date("1978-03-01"), y=0.30, color="#ffffff") %>%
  sg_annotate(label="Nebraska", x=as.Date("1977-06-01"), y=0.41, color="#ffffff") %>%
  sg_annotate(label="Massachusetts", x=as.Date("1977-06-01"), y=0.36, color="#ffffff") %>%
  sg_annotate(label="New Hampshire", x=as.Date("1978-03-01"), y=0.435, color="#ffffff") %>%
  sg_annotate(label="California", x=as.Date("1978-02-01"), y=0.175, color="#ffffff") %>%
  sg_annotate(label="Rhode Island", x=as.Date("1977-11-01"), y=0.55, color="#ffffff") %>%
  sg_annotate(label="Ohio", x=as.Date("1978-06-01"), y=0.485, color="#ffffff") %>%
  sg_annotate(label="Connecticut", x=as.Date("1978-01-01"), y=0.235, color="#ffffff") %>%
  sg_fill_tableau() %>%
  sg_legend(show=TRUE)

Selected State Unemployment Figures Since 1976

I probably could have positioned the annotations a bit better, but this should be a good enough example to get the general idea. I may add an option to place the vertical marker lines behind the streamgraph and will be adding some toggle options to the interactive version (to hide/show markers and/or annotations).

As usual, the package is up on github and a contiguous copy of the above snippets is in this gist.

Three final notes. First, I suggest enabling the y axis when you’re trying to figure out where the y position for a label should be (since the y axis range is calculated by the summed span of the data). Second, the x axis works with both dates and continuous values, but the x values you give to markers and annotations need to match what you set up the streamgraph with. Finally, just a tip: I’ve found SVG Crowbar 2 to be super-helpful when I need to extract these streamgraphs out for non-interactive reproduction. Just yank the SVG out with it and hand it (or a converted form of it) to whoever is handling final production and they should be able to work with it.
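
On the first note: since the y range comes from the summed span of the data, totalling the plotted value at each x position gives a rough feel for where labels can go before you start guessing. A small sketch on invented data (the `toy` frame, its values and the `y_max` name are made up for illustration; treat the result as a rough bound, not the exact axis limits the widget will draw):

```r
# Toy data: three streams observed at two dates (values are invented)
toy <- data.frame(
  month = rep(as.Date(c("2000-01-01", "2000-02-01")), each = 3),
  state = rep(c("A", "B", "C"), times = 2),
  unemployed_pct = c(0.05, 0.04, 0.06, 0.07, 0.05, 0.08)
)

# Total height of the stacked streams at each date
stacked <- aggregate(unemployed_pct ~ month, data = toy, FUN = sum)

# The largest total is a rough upper bound for usable y label positions
y_max <- max(stacked$unemployed_pct)
```

Running the same `aggregate()` call on the `some` data frame from earlier would show why y values up to about 0.6 make sense for nine states in the plot above.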