Spinning Cycles in Box #4 To Take the Pies out of Pi Day

March 14, 2016
By

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

I caught this tweet today:

The WSJ folks usually do a great job, but this was either rushed or not completely thought through. There’s no way you’re going to be able to do any real comparisons between the segments across pies and direct pie % labels kinda mean they should have just made a table if they were going to phone it in.

Despite the fact that today is Pi[e] Day, these pies need to go.

If the intent was to primarily allow comparison of hours in-task, leaving some ability to compare the same time category across tasks, then bars are probably the way to go (you could do a parallel coordinates plot, but those looks like tangled guitar strings to me, so I’ll stick with bars). Here’s one possible alternative using R & ggplot2. Since I provide the data, please link to your own creations as I’d love to see how others would represent the data.

NOTE: I left direct bar labels off deliberately. My view is that (a) this is designed to be a relative comparison vs precise comparison & (b) it’s survey data and if we’re going to add #’s I’d feel compelled to communicate margin of error, etc. I don’t think that’s necessary.

library(ggplot2)
library(grid)
library(scales)
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
library(tidyr)
 
dat <- read.table(text=
"Task|less_than_one_hour_per_week|one_to_four_hours_per_week|one_to_three_hours_a_day|four_or_more_hours_a_day
Basic exploratory data analysis|11|32|46|12
Data cleaning|19|42|31|7
Machine learning, statistics|34|29|27|10
Creating visualizations|23|41|29|7
Presenting analysis|27|47|20|6
Extract, transform, load|43|32|20|5", sep="|", header=TRUE, stringsAsFactors=FALSE)
 
amount_trans <- c("one_to_four_hours_per_week"="1-4 hrs/nwk", 
                  "less_than_one_hour_per_week"="<1 hr/nwk", 
                  "one_to_three_hours_a_day"="1-3 hrs/nday", 
                  "four_or_more_hours_a_day"="4+ hrs/nday")
 
dat <- gather(dat, amount, value, -Task)
dat$value <- dat$value / 100
dat$amount <- factor(amount_trans[dat$amount], levels=amount_trans)
 
title_trans <- c("Basic exploratory data analysis"="Basic exploratoryndata analysis", 
                 "Data cleaning"="Datancleaning", 
                 "Machine learning, statistics"="Machine learning,nstatistics", 
                 "Creating visualizations"="Creatingnvisualizations", 
                 "Presenting analysis"="Presentingnanalysis", 
                 "Extract, transform, load"="Extract,ntransform, load")
 
dat$Task <-factor(title_trans[dat$Task], levels=title_trans)
 
gg <- ggplot(dat, aes(x=amount, y=value, fill=amount))
gg <- gg + geom_bar(stat="identity", width=0.75, color="#2b2b2b", size=0.05)
gg <- gg + scale_y_continuous(expand=c(0,0), labels=percent, limits=c(0, 0.5))
gg <- gg + scale_x_discrete(expand=c(0,1))
gg <- gg + scale_fill_manual(name="", values=c("#a6cdd9", "#d2e4ee", "#b7b079", "#efc750"))
gg <- gg + facet_wrap(~Task)
gg <- gg + labs(x=NULL, y=NULL, title="Where Does the Time Go?")
gg <- gg + theme_hrbrmstr(grid="Y", axis="x", plot_title_margin=9)
gg <- gg + theme(panel.background=element_rect(fill="#efefef", color=NA))
gg <- gg + theme(strip.background=element_rect(fill="#858585", color=NA))
gg <- gg + theme(strip.text=element_text(family="OpenSans-CondensedBold", size=12, color="white", hjust=0.5))
gg <- gg + theme(panel.margin.x=unit(1, "cm"))
gg <- gg + theme(panel.margin.y=unit(0.5, "cm"))
gg <- gg + theme(legend.position="none")
gg <- gg + theme(panel.grid.major.y=element_line(color="#b2b2b2"))
gg <- gg + theme(axis.text.x=element_text(margin=margin(t=-10)))
gg <- gg + theme(axis.text.y=element_text(margin=margin(r=-10)))
 
ggplot_with_subtitle(gg, 
                     "The amount of time spent on various tasks by surveyed non-managers in data-science positions.",
                     fontfamily="OpenSans-CondensedLight", fontsize=12, bottom_margin=16)

RStudioScreenSnapz017

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training

datasociety

http://www.eoda.de





ODSC

ODSC

CRC R books series





Six Sigma Online Training









Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)