Recreating (more) data visualizations from the book “Knowledge is Beautiful”: Part III

July 5, 2018

(This article was first published on Stories by Matt.0 on Medium, and kindly contributed to R-bloggers)

Welcome to the third installment of the series where I recreate data visualizations, in R, from the book Knowledge is Beautiful by David McCandless.

Here are the links for part I and part II of the series if you haven’t checked them out yet.

The frustrations in data science are many, for example:

Consider point 4 above. Even when the data is made available, it can still be distributed in frustrating formats. In part I of the series I showed how to access a specific Excel sheet with the readxl package. In part II I showed how to parse PDF tables in R with the tabulizer package. It might also interest some readers that Luis D. Verde recently wrote a great post on how to deal with frustrating cell formatting in Excel from R.

Although McCandless made all the data public, it took a bit of cleaning on my part before I could produce the visualizations. Since this series is focused on data visualization, I decided to leave the data munging code out for the rest of the series and instead provide .csv files of tidy data ready for ggplot.

Live Long

The live long visualization is a diverging bar chart depicting how certain actions affect your life span.

library(dplyr)
library(ggplot2)
# load the data
livelong <- read.csv("livelong.csv")
# Order by action
livelong$action <- factor(livelong$action, levels = c("Sleep too much", "Be optimistic", "Get promoted", "Live in a city", "Live in the country", "Eat less food", "Hang out with women - a lot!", "Drink a little alcohol", "Be conscientious", "Have more orgasms", "And a little red wine", "With close friends", "Be polygamous, maybe", "Go to church regularly", "Sit down", "More pets", "Eat red meat", "Avoid cancer", "Avoid heart disease", "Be alcoholic", "Get health checks", "Get married!", "Be rich", "Be a woman", "Suffer severe mental illness", "Become obese", "Keep smoking", "Live healthily", "Exercise more", "Live at high altitude"))
# Set legend title
legend_title <- "Strength of science"
# Make plot
p <- ggplot(livelong, aes(x = action, y = years, fill = strength)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(legend_title, values = c("#8BC7AC", "#D99E50", "#CDAD35")) +
  labs(title = "Live Long...", subtitle = "What will really extend your life?", caption = "Source: bit.ly/KIB_LiveLong") +
  scale_y_continuous(position = "bottom") +
  # Reverse the factor levels so the first action ends up at the top after coord_flip()
  scale_x_discrete(limits = rev(levels(livelong$action))) +
  coord_flip() +
  theme(legend.position = "top",
        panel.background = element_blank(),
        plot.title = element_text(size = 13, family = "Georgia", face = "bold", lineheight = 1.2),
        plot.subtitle = element_text(size = 10, family = "Georgia"),
        plot.caption = element_text(size = 5, hjust = 0.99, family = "Georgia"),
        axis.text = element_text(family = "Georgia"),
        # Get rid of the y- and x-axis titles
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        # Get rid of the y-axis text and ticks
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.text = element_text(size = 8, family = "Georgia"),
        legend.title = element_text(size = 8, family = "Georgia"),
        legend.key.size = unit(1, "line"))

Okay, here is the first attempt at annotating with geom_text():

p + geom_text(aes(label = action), size = 3, family = "Georgia")

One instantly notices that the text annotation is misaligned. Furthermore, since the bar chart diverges at the center (zero), a single value for the hjust parameter won’t solve the problem on its own: whatever justification suits the positive bars is wrong for the negative ones.

One possible workaround is the ggfittext package, whose geom_fit_text() function constrains text to a defined area and works more or less like ggplot2::geom_text().

# currently only supported by the dev version
devtools::install_github("wilkox/ggfittext")
library(ggfittext)
p + geom_fit_text(aes(label = action), position = "stack", family = "Georgia")

We see that small bars were not annotated because the character strings are simply too big to be displayed in the bars.

The original visualization never constrains the text to the bars, so the best approach is to add a variable to the table that lets you left-justify some labels and right-justify others.

# Set positive numbers as "Up" and negative numbers as "Down"
livelong$direction <- ifelse(livelong$years > 0, "Up", "Down")
livelong$just <- ifelse(livelong$direction=="Down",0,1)
p + geom_text(aes(label = action), size = 3, family = "Georgia", hjust=livelong$just)

This justifies the labels so that actions which decrease your lifespan are left-justified and those which extend your life are right-justified. The only problem is that in the original visualization the text lines up on the center of the chart. I couldn’t figure out how to do that, so bonus points if you can figure it out and post it to the comments!
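
One possible way to get the centered look, offered as a sketch rather than the method behind the original graphic: pin every label to zero by mapping y = 0 inside geom_text(), and let the justification decide which side of the center line the text runs to. The data frame toy and its values are made up purely for illustration, and p_centered is a hypothetical name; with the real data you would use livelong and p instead.

```r
library(ggplot2)

# Made-up values for illustration only; not the book's data
toy <- data.frame(
  action = c("Action A", "Action B", "Action C", "Action D"),
  years  = c(4, 1.5, -2, -6)
)

# Positive bars: label starts at zero and runs right (hjust = 0)
# Negative bars: label ends at zero and runs left (hjust = 1)
toy$just <- ifelse(toy$years > 0, 0, 1)

p_centered <- ggplot(toy, aes(x = action, y = years)) +
  geom_bar(stat = "identity", fill = "#8BC7AC") +
  # y = 0 pins every label to the center line instead of the bar end
  geom_text(aes(y = 0, label = action, hjust = just), size = 3) +
  coord_flip()

p_centered
```

Mapping hjust inside aes() keeps each justification value attached to its own row. Swapping the 0 and 1 in the ifelse() call puts each label on the opposite side of the center line from its bar.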

A visually appealing alternative is to have the names outside of the bars.

livelong$just <- ifelse(livelong$direction=="Up",0,1)
p + geom_text(aes(label = action), size = 3, family = "Georgia", hjust=livelong$just)
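
Labels placed outside the bars can touch the bar ends or get clipped at the edge of the panel. A minimal sketch of one way to deal with both, assuming made-up data in toy and a hypothetical 50% widening of the value axis: hjust values slightly outside the 0 to 1 range nudge the text away from the bar, and expand_limits() makes room for it.

```r
library(ggplot2)

# Made-up values for illustration only; not the book's data
toy <- data.frame(
  action = c("Action A", "Action B", "Action C", "Action D"),
  years  = c(4, 1.5, -2, -6)
)

# hjust slightly outside [0, 1] pushes the text a little past the bar end
toy$just <- ifelse(toy$years > 0, -0.05, 1.05)

# Widen the value axis (factor of 1.5 is an arbitrary choice) so the
# outside labels have room
pad <- 1.5 * range(toy$years)

p_outside <- ggplot(toy, aes(x = action, y = years)) +
  geom_bar(stat = "identity", fill = "#D99E50") +
  geom_text(aes(label = action, hjust = just), size = 3) +
  expand_limits(y = pad) +
  coord_flip()

p_outside
```

How much padding is needed depends on the label lengths and the font size, so the factor will likely need tuning for the real data.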

Counting the Cause UK

The Counting the Cause UK dataset shows which charities UK citizens donate to the most.

We can create a comparable visualization using the treemap package. A treemap displays hierarchical data as nested rectangles: each branch gets a rectangle, which is then tiled with smaller rectangles representing its sub-branches.

library(treemap)
my_data <- read.csv("treemap.csv")
tm <- treemap(my_data, index = c("main","second", "third"), vSize = "percent", vColor = "percent", type = "value", title = "Counting the Cause UK")
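
To make the role of each argument easier to see, here is a small self-contained sketch with a two-level hierarchy. The data frame toy and its numbers are made up for illustration; the assumption is that the columns of treemap.csv play the same roles, with index listing the hierarchy from broadest to narrowest.

```r
library(treemap)

# Made-up hierarchy for illustration only; not the book's data
toy <- data.frame(
  main    = c("Group 1", "Group 1", "Group 2", "Group 2"),
  second  = c("A", "B", "C", "D"),
  percent = c(40, 10, 30, 20)
)

tm_toy <- treemap(toy,
                  index  = c("main", "second"),  # outer level first, then inner
                  vSize  = "percent",            # column that sets rectangle area
                  vColor = "percent",            # column that sets rectangle color
                  type   = "value",              # color by the numeric vColor column
                  title  = "Toy treemap")

# The invisibly returned list includes a data frame describing the rectangles
head(tm_toy$tm)
```

Inspecting tm_toy$tm is a quick way to check that the hierarchy was read as intended before tweaking the appearance.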


Recreating (more) data visualizations from the book “Knowledge is Beautiful”: Part III was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
