# Improved net stacked distribution graphs via ggplot2 trickery

September 13, 2012

(This article was first published on Statisfactions: The Sounds of Data and Whimsy » R, and kindly contributed to R-bloggers)

Net stacked distribution graphs are a nice way of comparing data on a Likert scale (i.e. when respondents are asked whether they “Strongly Disagree”, “Disagree”, etc. with a statement). They strip out the neutral responses and center the remaining responses around the middle of the graph so you can quickly compare agreement and disagreement on different issues. Here we’ll learn how to do this in ggplot2; it takes a dose of deviousness.

Jason Becker provides some code for doing this; I’ve taken his basic idea and made it more readable and flexible, including being able to map multiple questions at the same time.

`net_stacked()`, code below, takes a single argument, `x`, which is a `data.frame` where each column is an ordered factor containing Likert-style responses. The factor levels must be ordered from the “most negative” possible response (e.g. “Strongly Disagree”) to “most positive” (e.g. “Strongly Agree”). If there is an odd number of possible responses/levels, such as in a 5 or 7 point Likert scale, `net_stacked` chops out the central level (assumed to be “Neutral”, “Neither Agree nor Disagree”, or similar).
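To make the level-splitting rule concrete, here is a minimal sketch of the same logic applied to a 5-point scale and to a made-up 4-point scale. The helper name `split_levels()` is hypothetical and exists only for this illustration; it is not part of `net_stacked()`.

```r
## split_levels() is a hypothetical helper mirroring the rule described above:
## lower half = negative, upper half = positive,
## middle level (odd number of levels only) = neutral.
split_levels <- function(all_levels) {
  n <- length(all_levels)
  neutral <- if (n %% 2 == 1) all_levels[ceiling(n / 2)] else NULL
  negatives <- all_levels[1:floor(n / 2)]
  positives <- setdiff(all_levels, c(negatives, neutral))
  list(negatives = negatives, neutral = neutral, positives = positives)
}

five_point <- c("Strongly Disagree", "Disagree", "Neither Agree or Disagree",
                "Agree", "Strongly Agree")
four_point <- c("Never", "Rarely", "Often", "Always")  # made-up even scale

split5 <- split_levels(five_point)
split4 <- split_levels(four_point)

split5$neutral   # "Neither Agree or Disagree" is the level that gets dropped
split4$neutral   # NULL: with an even number of levels nothing is dropped
```

With an even number of levels the scale is simply cut in half, so every response ends up on one side of zero.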

All the columns of the `data.frame` need to have the same levels. The function can actually accept a `list` where the factor elements have different lengths, as well. `NA`s are omitted from each column before plotting.
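As a small illustration of the `list` case, the sketch below summarizes two questions with different numbers of responses, one of which contains an `NA`. The 3-point scale and the responses are made up for the example.

```r
scale3 <- c("Disagree", "Neutral", "Agree")  # made-up 3-point scale

## A list (not a data.frame), so the two questions can have different lengths
responses <- list(
  Q1 = ordered(c("Agree", "Agree", NA, "Disagree", "Neutral"), levels = scale3),
  Q2 = ordered(c("Disagree", "Agree"), levels = scale3)
)

## Drop NAs, then turn counts into proportions for each question
props <- lapply(responses, function(col) prop.table(table(na.omit(col))))

props$Q1   # proportions over the 4 non-missing answers
props$Q2   # "Neutral" still appears, with proportion 0
```

Because each question is converted to proportions separately, questions with different numbers of respondents remain comparable on the same axis.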

How do we actually accomplish this effect in ggplot2? Here’s the full text of the function:

```r
net_stacked <- function(x) {

  ## x: a data.frame or list, where each column is an ordered factor with the same levels
  ## lower levels are presumed to be "negative" responses; middle value presumed to be neutral
  ## returns a ggplot2 object of a net stacked distribution plot

  ## Test that all elements of x have the same levels, are ordered, etc.
  all_levels <- levels(x[[1]])
  n <- length(all_levels)
  levelscheck <- all(sapply(x, function(y)
    is.ordered(y) && identical(levels(y), all_levels)
  ))
  if(!levelscheck)
    stop("All elements of x must be ordered factors with the same levels")

  ## Reverse order of columns (to make ggplot2 output look right after coord_flip)
  x <- x[length(x):1]

  ## Identify middle and "negative" levels
  if(n %% 2 == 1)
    neutral <- all_levels[ceiling(n/2)]
  else
    neutral <- NULL

  negatives <- all_levels[1:floor(n/2)]
  positives <- setdiff(all_levels, c(negatives, neutral))

  ## remove neutral, summarize as proportion
  listall <- lapply(names(x), function(y) {
    column <- (na.omit(x[[y]]))
    out <- data.frame(Question = y, prop.table(table(column)))
    names(out) <- c("Question", "Response", "Freq")

    if(!is.null(neutral))
      out <- out[out$Response != neutral,]

    out
  })

  dfall <- do.call(rbind, listall)

  ## split by positive/negative
  pos <- dfall[dfall$Response %in% positives,]
  neg <- dfall[dfall$Response %in% negatives,]

  ## Negate the frequencies of negative responses, reverse order
  neg$Freq <- -neg$Freq
  neg$Response <- ordered(neg$Response, levels = rev(levels(neg$Response)))

  stackedchart <- ggplot() +
    aes(Question, Freq, fill = Response, order = Response) +
    geom_bar(data = neg, stat = "identity") +
    geom_bar(data = pos, stat = "identity") +
    geom_hline(yintercept = 0) +
    scale_y_continuous(name = "",
                       labels = paste0(seq(-100, 100, 20), "%"),
                       limits = c(-1, 1),
                       breaks = seq(-1, 1, .2)) +
    scale_fill_discrete(limits = c(negatives, positives)) +
    coord_flip()

  stackedchart
}
```

Once we have the function, here’s the code for the image above:

```r
library(ggplot2)

## generate fake likert data
set.seed(200)
response_scale <- c("Strongly Disagree", "Disagree", "Neither Agree or Disagree",
                    "Agree", "Strongly Agree")
x <- replicate(5, ordered(sample(response_scale, 20, replace = TRUE),
                          levels = response_scale),
               simplify = FALSE)
x <- as.data.frame(x)
names(x) <- paste0("Q", 1:5)

## plot it as net stacked distribution
net_stacked(x)
```

This gives a warning, since `ggplot2` really isn’t sure why we’re stacking negative numbers. But that is, in fact, what we’re intending to do here: embrace the devious!

Jason Becker’s post provides some colors to heuristically represent the intensity of feelings. We can add these and any other customizations onto the `ggplot` object returned by our function in the usual ways.
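As a rough sketch of what such customization might look like, the example below adds a manual fill scale, a title, and a theme to a small stand-in plot. The toy data and the palette are hypothetical (they are not the colors from Jason Becker’s post), and the stand-in plot only imitates the kind of object `net_stacked()` returns.

```r
library(ggplot2)

## Stand-in for the plot returned by net_stacked(x); the numbers are made up
toy <- data.frame(
  Question = c("Q1", "Q1"),
  Response = factor(c("Disagree", "Agree"), levels = c("Disagree", "Agree")),
  Freq     = c(-0.4, 0.5)
)
p <- ggplot(toy, aes(Question, Freq, fill = Response)) +
  geom_bar(stat = "identity") +
  coord_flip()

## Hypothetical palette: red for disagreement, blue for agreement
likert_cols <- c("Disagree" = "#CA0020", "Agree" = "#0571B0")

## Customizations are added to the returned object with `+`, as usual
p_custom <- p +
  scale_fill_manual(values = likert_cols, limits = names(likert_cols)) +
  labs(title = "Responses by question", x = NULL) +
  theme_minimal()

p_custom
```

Since `net_stacked()` already sets a fill scale, adding a new one replaces it (ggplot2 prints a message to that effect), so it is probably wise to pass `limits` again to keep the legend in scale order.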

Most of the function is simply preparing and summarizing the data in the form of proportions for each level of the ordered factor, applied to each column of the `data.frame`; but notice that we separate the “positive” (more-agreeing) and “negative” (more-disagreeing) levels into two separate objects:

```r
## split by positive/negative
pos <- dfall[dfall$Response %in% positives,]
neg <- dfall[dfall$Response %in% negatives,]
```

And then we make the frequencies negative because we want them to actually show up on the negative side of 0 in our plot:

 `neg$Freq <- -neg$Freq`

And we reorder the levels in reverse, because we want them oriented so that the “most neutral” responses are stacked first on top of zero in the negative direction and then progressively “more negative” responses:

 `neg$Response <- ordered(neg$Response, levels = rev(levels(neg$Response)))`
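The three steps above (split, negate, reverse) can be tried on a tiny hand-made summary table. This is only a sketch: the proportions are invented, and the table merely imitates the shape of `dfall` inside the function.

```r
scale4 <- c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")

## Toy summary table in the shape of dfall; the proportions are made up
dfall <- data.frame(
  Question = "Q1",
  Response = ordered(scale4, levels = scale4),
  Freq     = c(0.10, 0.30, 0.35, 0.15)
)
negatives <- scale4[1:2]
positives <- scale4[3:4]

## 1. split by positive/negative
pos <- dfall[dfall$Response %in% positives, ]
neg <- dfall[dfall$Response %in% negatives, ]

## 2. push the negative responses to the other side of zero
neg$Freq <- -neg$Freq

## 3. reverse the level order so "Disagree" now comes before "Strongly Disagree"
neg$Response <- ordered(neg$Response, levels = rev(levels(neg$Response)))

neg$Freq               # both values are now negative
levels(neg$Response)   # level order is reversed; the values themselves are unchanged
```

Only the level order changes in step 3: each row still carries the same response label, which is what lets the fill colors and legend stay consistent with the positive half.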

And here’s where we bring that home: in the plot command, we actually have two different layers. One represents the positive half and one the negative half, each drawing on its own dataset. We need to separate them; otherwise ggplot2 will get confused stacking positives and negatives together.

```r
geom_bar(data = neg, stat = "identity") +
geom_bar(data = pos, stat = "identity") +
```

This is the clever trick that Jason Becker does that makes this whole thing possible!
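Here is a stripped-down sketch of the two-layer idea on its own, using made-up proportions for two questions and a single response on each side. As far as I know, more recent ggplot2 releases stack negative values separately from positive ones, so a single layer may behave sensibly there; the version below sticks to the two-layer form described in this post.

```r
library(ggplot2)

## Made-up proportions; the negative side has already been negated
pos <- data.frame(Question = c("Q1", "Q2"), Response = "Agree",
                  Freq = c(0.5, 0.3))
neg <- data.frame(Question = c("Q1", "Q2"), Response = "Disagree",
                  Freq = c(-0.4, -0.6))

two_layer <- ggplot() +
  aes(Question, Freq, fill = Response) +       # mapping shared by both layers
  geom_bar(data = neg, stat = "identity") +    # bars on the negative side of zero
  geom_bar(data = pos, stat = "identity") +    # bars on the positive side of zero
  geom_hline(yintercept = 0) +
  coord_flip()

two_layer
```

Because the mapping is set once on the empty `ggplot()` call, both `geom_bar()` layers share it and differ only in the data they draw.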

Also, notice that in specifying the mapping, we explicitly tell `ggplot2` to order the levels by `Response` (a column containing the text of each Likert-type response in an ordered factor):

 `aes(Question, Freq, fill = Response, order = Response)`

This matters because the negative side won’t stack in the right order unless we set this explicitly, and only AFTER reversing the negative levels so that they fan out away from zero.

Then we flip to make the whole thing horizontal with `coord_flip()`. `coord_flip()` makes later columns in the data appear on top, which isn’t what we want here, so earlier in the function I simply reverse the order of the elements in the input data:

 `x <- x[length(x):1]`
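A quick check of what that indexing does, on a made-up data frame and a made-up list; the same expression works for both because `length()` of a `data.frame` is its number of columns.

```r
## Made-up inputs: a data.frame and a list with the same element names
x_df   <- data.frame(Q1 = 1:2, Q2 = 3:4, Q3 = 5:6)
x_list <- list(Q1 = "a", Q2 = "b", Q3 = "c")

rev_df   <- x_df[length(x_df):1]       # columns in reverse order
rev_list <- x_list[length(x_list):1]   # list elements in reverse order

names(rev_df)     # "Q3" "Q2" "Q1"
names(rev_list)   # "Q3" "Q2" "Q1"
```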

Happy net stacked distribution visualizing!
