Make Refreshing Segmented Column Charts with {ggchicklet}

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The first U.S. Democratic debates of the 2020 election season were held over two nights this past week due to the daft number of candidates running for POTUS. The spiffy @NYTgraphics folks took the tallies of time spent blathering by each speaker/topic and made rounded rectangle segmented bar charts ordered by the time the blathering was performed (these aren’t really debates, they’re loosely-prepared for performances) which I have dubbed “chicklet” charts due to a vague resemblance to the semi-popular gum/candy.

You can see each day’s live, javascript-created NYTimes charts here:

and this is a PNG snapshot of one of them:

nytimes chicklet chart for debate day 1

I liked the chicklet aesthetic enough to make a new {ggplot2} geom_chicklet() to help folks make them. To save some blog bytes, you can read how to install the package over at https://cinc.rud.is/web/packages/ggchicklet/.

Making Chicklet Charts

Since the @NYTimes chose to use javascript to make their chart they also kinda made the data available (view the source of both of the aforelinked URLs) which I’ve wrangled a bit and put into the {ggchicklet} package. We’ll use it to progress from producing basic bars to crunching out chicklets and compare all the candidates across both days.

While the @Nytimes chart(s) provide a great deal of information, most media outlets focused on how much blather time each candidate managed to get. We do not need anything fancier than a bar chart or table to do that:

library(hrbrthemes)
library(ggchicklet)
library(tidyverse)

data("debates2019")

count(debates2019, speaker, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  mutate(speaker = fct_inorder(speaker) %>% fct_rev()) %>%
  ggplot(aes(speaker,n)) +
  geom_col() +
  scale_y_comma(position = "right") +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X")

ordered blather time chart for each spaker

If we want to see the same basic view but include how much time each speaker spent on each topic, we can also do that without much effort:

count(debates2019, speaker, topic, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, n , fill = topic)) +
  geom_col() +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

time spent per topic per speaker

By default geom_col() is going to use the fill aesthetic to group the bars and use the default sort order to stack them together.

We can also get a broken out view by not doing the count() and just letting the segments group together and use a white bar outline to keep them distinct:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, fill = topic)) +
  geom_col(color = "white") +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

grouped distinct topics

While I liked the rounded rectangle aesthetic, I also really liked how the @nytimes ordered the segments by when the topics occurred during the debate. For other types of chicklet charts you don’t need to grouping variable to be a time-y whime-y column, just try to use something that has a sane ordering characteristic to it:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_col(color = "white", position = position_stack(reverse = TRUE)) +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

distinct topics ordered by when blathered abt during each debate

That last chart is about as far as you could go to reproduce the @nytimes look-and-feel without jumping through some serious gg-hoops.

I had made a rounded rectangle hidden geom to make rounded-corder tiles for the {statebins} package so making a version of ggplot2::geom_col() (which I also added to {ggplot2}) was pretty straightforward. There are some key differences in the defaults of geom_chicklet():

  • a “white” stroke for the chicklet/segment (geom_col() has NA for the stroke)
  • automatic reversing of the group order (geom_col() uses the standard sort order)
  • radius setting of unit(3, "px") (change this as you need)
  • chicklet legend geom (b/c they aren’t bars or points)

You likely just want to see it in action, so here it is without further adieu:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "bottom")

chiclet chart with all the topics

Yes, I upped the ggplot2 tweaking a bit to get closer to the @nytimes (FWIW I like the Y gridlines, YMMV) but didn’t have to do much else to replace geom_col() with geom_chicket(). You’ll need to play with the segment width value depending on the size of your own, different plots to get the best look (just like you do with any other geom).

Astute, intrepid readers will note that the above chart has all the topics whereas the @nytimes just has a few. We can do the grouping of non-salient topics into an “Other” category with forcats::fct_other() and make a manual fill scale from the values stolen fromused in homage from the @nytimes:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  mutate(topic = fct_other(
    topic,
    c("Immigration", "Economy", "Climate Change", "Gun Control", "Healthcare", "Foreign Policy"))
  ) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  scale_fill_manual(
    name = NULL,
    values = c(
      "Immigration" = "#ae4544",
      "Economy" = "#d8cb98",
      "Climate Change" = "#a4ad6f",
      "Gun Control" = "#cc7c3a",
      "Healthcare" = "#436f82",
      "Foreign Policy" = "#7c5981",
      "Other" = "#cccccc"
    ),
    breaks = setdiff(unique(debates2019$topic), "Other")
  ) +
  guides(
    fill = guide_legend(nrow = 1)
  ) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "top")

final chicklet chart

FIN

Remember, you can find out how to install {ggchicklet} and also where you can file issues or PRs over at https://cinc.rud.is/web/packages/ggchicklet/. The package has full documentation, including a vignette, but if any usage help is lacking, definitely file an issue.

If you use the package, don’t hesitate to share your creations in a comment or on Twitter so other folks can see how to use the package in different contexts.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)