Plotting “time of day” data using ggplot2
William asks:
How can I make a graph that looks like this, “tweet density” style, showing time intervals?
He then helpfully describes his input data: a CSV file with headers “time started, time finished, date”.
Here’s a simple CSV file, tasks.csv:
task,date,start,end
task1,2010-03-05,09:00:00,13:00:00
task2,2010-03-06,10:00:00,15:00:00
task3,2010-03-06,11:00:00,18:00:00
task4,2010-03-07,08:00:00,11:00:00
task5,2010-03-08,14:00:00,17:00:00
task6,2010-03-09,12:00:00,16:00:00
task7,2010-03-10,14:00:00,19:00:00
task8,2010-03-11,09:30:00,13:30:00
Read the file into R, calculate the weekday for each date and order the weekday factor levels from Sunday to Saturday:
tasks <- read.csv("tasks.csv", header = TRUE)
# day of week
tasks$day <- weekdays(strptime(tasks$date, "%Y-%m-%d"))
week <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
tasks$day <- factor(tasks$day, levels = week)
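One thing to watch: weekdays() returns day names in the current locale, so the English levels above only match on an English-language setup. A minimal sketch of a locale-independent alternative, assuming you are happy to index the English names by the numeric weekday that as.POSIXlt() provides (0 = Sunday); the dates vector here is just illustrative:

```r
# Illustrative dates taken from tasks.csv
dates <- as.Date(c("2010-03-05", "2010-03-07"))

week <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

# $wday runs from 0 (Sunday) to 6 (Saturday), so add 1 to index into week
day <- factor(week[as.POSIXlt(dates)$wday + 1], levels = week)
print(day)
```

The same expression should work on tasks$date after wrapping it in as.Date().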
Convert the start and end times to decimal hours. I’m not very familiar with the as.POSIX… functions, so I’m sure that there’s a more elegant way to do this:
# convert time to decimal hours
tasks$start.ct <- as.POSIXct(paste(tasks$date, tasks$start, sep = " "))
tasks$end.ct <- as.POSIXct(paste(tasks$date, tasks$end, sep = " "))
tasks$start.hour <- as.POSIXlt(tasks$start.ct)$hour +
  as.POSIXlt(tasks$start.ct)$min/60 +
  as.POSIXlt(tasks$start.ct)$sec/3600
tasks$end.hour <- as.POSIXlt(tasks$end.ct)$hour +
  as.POSIXlt(tasks$end.ct)$min/60 +
  as.POSIXlt(tasks$end.ct)$sec/3600
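One possibly tidier way to do the same conversion is to parse each time string once and wrap the arithmetic in a small helper. This is only a sketch: to_decimal_hours is a made-up name, and it assumes the times are always in "HH:MM:SS" form, as in tasks.csv.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to decimal hours.
# Parsing in UTC avoids any daylight-saving surprises.
to_decimal_hours <- function(x) {
  lt <- strptime(as.character(x), format = "%H:%M:%S", tz = "UTC")
  lt$hour + lt$min / 60 + lt$sec / 3600
}

hours <- to_decimal_hours(c("09:30:00", "13:00:00"))
print(hours)  # 9.5 13.0
```

With this in place, tasks$start.hour could be computed as to_decimal_hours(tasks$start), without going through the combined date-time columns.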
We’re going to plot each task’s duration as a horizontal rectangle. If there is more than one task on a given day, we need to offset the rectangles vertically so that they don’t overlap.
# offset tasks if > 1 per day
tasks$ymin <- c(rep(0, nrow(tasks)))
t <- table(tasks$day)
for(day in rownames(t)) {
  if(t[[day]] > 1) {
    ss <- tasks[tasks$day == day,]
    y <- 0
    for(i in as.numeric(rownames(ss))) {
      tasks[i,]$ymin <- y
      y <- y + 1.2
    }
  }
}
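The nested loops can probably be replaced by a single vectorised line. A sketch using base R's ave() to number the tasks within each day, shown on a small made-up data frame so that it runs on its own; the 1.2 spacing is the same as in the loop above:

```r
# Small illustrative data frame: two tasks share Saturday
demo <- data.frame(
  task = c("task1", "task2", "task3", "task4"),
  day  = c("Friday", "Saturday", "Saturday", "Sunday")
)

# ave() applies seq_along within each day, giving 1, 2, ... per group;
# subtracting 1 makes the first task of every day start at ymin = 0
idx <- ave(seq_len(nrow(demo)), demo$day, FUN = seq_along)
demo$ymin <- 1.2 * (idx - 1)
print(demo)
```

Unlike the loop, this does not depend on the row names of the data frame being the default 1, 2, 3 and so on.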
Finally, call ggplot with the rectangle geom plus a bunch of options to colour the rectangles (by task), facet the plot (by day) and clean up, rescale and label the axes:
# plot
library(ggplot2)
png(filename = "tasks.png", width = 640, height = 480)
ggplot(tasks, aes(xmin = start.hour, xmax = end.hour, ymin = ymin, ymax = ymin + 1, fill = factor(task))) +
  geom_rect() +
  facet_grid(day ~ .) +
  # hide the y-axis labels and the tick marks, which carry no meaning here
  theme(axis.text.y = element_blank(), axis.ticks = element_blank()) +
  xlim(0, 23) +
  xlab("time of day")
# close the graphics device so that tasks.png is written to disk
dev.off()
