GG Periodic Highlight

This article was first published on koaning.io, and kindly contributed to R-bloggers.

Sometimes you’ll want to confirm whether a timeseries pattern is influenced by the day of the week. Weekends are a prime example, since online behavior is usually different then. This document explains a method of communicating this visually.

library(ggplot2)
library(dplyr)

Let’s first generate some data which has a small negative bias towards the weekend.

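set.seed(1)  # fix the seed so the simulated series is reproducible
n <- 750
# hourly timestamps starting from 2015-01-01
df <- data.frame(datetime = as.POSIXct('2015-01-01') + (1:n)*3600, 
                 value = rnorm(n))

# note: strftime('%A') returns day names in the current locale;
# the comparison below assumes an English locale.
# '%W' is the week of the year, with weeks starting on Monday.
df <- df %>% 
  mutate(day_of_week = datetime %>% strftime('%A'),
         week_nr = datetime %>% strftime('%W')) %>% 
  mutate(value = ifelse(day_of_week %in% c('Saturday', 'Sunday'), 
                        value - runif(n) * 3, value))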

ggplot() + 
  geom_line(data=df, aes(datetime, value)) + 
  ggtitle('timeseries without weekends highlighted')

The bias is big enough to suggest some form of seasonality, though it may not be immediately obvious that weekends cause it. We could look up the dates and confirm that the time between the dips is 7 days, but preferably we want to communicate this visually.
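The date lookup mentioned above can also be done numerically. The following is a minimal sketch, not part of the original analysis: it regenerates a similar series (the seed, the UTC timezone, and the `is_weekend` column are assumptions made for illustration) and compares the mean value on weekends with the mean on weekdays. It uses `as.POSIXlt(...)$wday`, which codes Sunday as 0 and Saturday as 6 and does not depend on the locale.

```r
library(dplyr)

set.seed(42)  # assumed seed, only for reproducibility
n <- 750
df <- data.frame(datetime = as.POSIXct('2015-01-01', tz = 'UTC') + (1:n) * 3600,
                 value = rnorm(n))

# flag weekend hours (wday: 0 = Sunday, 6 = Saturday) and add the negative bias
df <- df %>%
  mutate(is_weekend = as.POSIXlt(datetime)$wday %in% c(0, 6),
         value = ifelse(is_weekend, value - runif(n) * 3, value))

# mean value and number of observations for weekdays and weekends
weekend_summary <- df %>%
  group_by(is_weekend) %>%
  summarise(mean_value = mean(value), n_obs = n())

print(weekend_summary)
```

A clearly lower mean in the `is_weekend = TRUE` row supports the weekend explanation, but a table like this is harder to communicate than a plot.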

Let’s instead create a dataframe that holds the start and end of each weekend, so that we can highlight the correct dates.

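# vertical extent of the highlight, padded beyond the data range
y_min <- min(df$value) - 1
y_max <- max(df$value) + 1

# one row per weekend: first and last weekend timestamp within each week
# (week_nr alone is only unique within a single calendar year)
df_highlight <- df %>% 
  filter(day_of_week %in% c('Saturday', 'Sunday')) %>% 
  group_by(week_nr) %>% 
  summarise(xmin = min(datetime), xmax = max(datetime))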

ggplot() + 
  geom_rect(data=df_highlight, 
            aes(xmin = xmin, xmax = xmax, ymin = y_min, ymax = y_max), 
            alpha = 0.15) + 
  geom_line(data=df, aes(datetime, value)) + 
  ggtitle('timeseries with weekends highlighted')
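As a variation on the plot above, the rectangles can span the full height of the panel by setting `ymin = -Inf` and `ymax = Inf`, which removes the need to compute `y_min` and `y_max` by hand. This is a sketch under a few assumptions that are not in the original: a fixed seed, a UTC timezone, and `format()` plus `as.POSIXlt(...)$wday` in place of `strftime()` so that the result does not depend on the locale or the local timezone.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumed seed, only for reproducibility
n <- 750
df <- data.frame(datetime = as.POSIXct('2015-01-01', tz = 'UTC') + (1:n) * 3600,
                 value = rnorm(n))

df <- df %>%
  mutate(wday = as.POSIXlt(datetime)$wday,   # 0 = Sunday, 6 = Saturday
         week_nr = format(datetime, '%W'),   # weeks start on Monday
         value = ifelse(wday %in% c(0, 6), value - runif(n) * 3, value))

# one row per weekend with its first and last timestamp
df_highlight <- df %>%
  filter(wday %in% c(0, 6)) %>%
  group_by(week_nr) %>%
  summarise(xmin = min(datetime), xmax = max(datetime))

# -Inf / Inf stretch each rectangle to the panel edges
p <- ggplot() +
  geom_rect(data = df_highlight,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            alpha = 0.15) +
  geom_line(data = df, aes(datetime, value)) +
  ggtitle('timeseries with weekends highlighted')

print(p)
```

The trade-off is that the highlight always fills the panel, so you lose control over its vertical padding.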

Hours per day

Another pattern to consider is one tied to certain hours of the day. The code will be similar. We’ll again separate the concern of highlighting the correct hours into another dataframe and another layer of the plot.

set.seed(1)
n <- 150
df <- data.frame(datetime = as.POSIXct('2015-01-01') + (1:n)*3600, 
                 value = rnorm(n))

# extract the hour and calendar date, then lower the values for hours 1 to 6
df <- df %>% 
  mutate(hour = datetime %>% strftime('%H') %>% as.numeric,
         date = datetime %>% as.Date,
         value = ifelse(hour %in% 1:6, value - 1 - runif(n), value))

ggplot() + 
  geom_line(data=df, aes(datetime, value)) + 
  ggtitle('timeseries without early hours highlighted')

Note that the pattern is not visually obvious from this series alone.

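# vertical extent of the highlight, padded slightly beyond the data range
y_min <- min(df$value) - 0.1
y_max <- max(df$value) + 0.1

# one row per day: first and last timestamp of the early hours on that date
df_highlight <- df %>% 
  filter(hour %in% 1:6) %>% 
  group_by(date) %>% 
  summarise(xmin = min(datetime), xmax = max(datetime))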

ggplot() + 
  geom_rect(data=df_highlight, 
            aes(xmin = xmin, xmax = xmax, ymin = y_min, ymax = y_max), 
            alpha = 0.15) + 
  geom_line(data=df, aes(datetime, value)) + 
  ggtitle('timeseries with early hours highlighted')

The pattern does become obvious when you apply the highlight.
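Grouping by date works here because hours 1 to 6 fall within a single calendar day. For a window that crosses midnight (say 22:00 to 03:00), grouping by date would split each night into two rectangles. One possible workaround, sketched below, is to build one interval per contiguous run of flagged rows. `highlight_runs` is a hypothetical helper written for this illustration; it assumes the timestamps are sorted and the flag contains no missing values.

```r
# Hypothetical helper: one (xmin, xmax) row per contiguous run of TRUE in `flag`.
# Assumes `datetime` is sorted and `flag` has no NA values.
highlight_runs <- function(datetime, flag) {
  runs   <- rle(flag)                    # run-length encoding of the flag
  ends   <- cumsum(runs$lengths)         # index of the last row of each run
  starts <- ends - runs$lengths + 1      # index of the first row of each run
  data.frame(xmin = datetime[starts[runs$values]],
             xmax = datetime[ends[runs$values]])
}

# usage: two days of hourly data with a window from 22:00 to 03:00
datetime <- as.POSIXct('2015-01-01', tz = 'UTC') + (0:47) * 3600
hour     <- as.POSIXlt(datetime)$hour
night    <- highlight_runs(datetime, hour %in% c(22, 23, 0, 1, 2, 3))
print(night)
```

The resulting dataframe has the same `xmin` and `xmax` columns as `df_highlight`, so it can be passed to `geom_rect` in the same way.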

Conclusion

This ‘obviousness’ should also alert you to a potential danger: visual bias. Even though this highlighting technique can be effective at pointing out a pattern when there is one, it may also suggest a pattern when there isn’t. It remains a useful technique because, from a domain perspective, it is very sensible to visually confirm the effect of certain periods.
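One way to guard against this visual bias is to pair the plot with a simple numeric check. The sketch below is an addition for illustration, not the author's method: it simulates a series similar to the hourly example (the seed, the UTC timezone, and the number of permutations are assumptions) and runs a permutation test that asks how often randomly shuffled "early hour" labels produce a difference in means at least as negative as the observed one.

```r
set.seed(1)  # assumed seed, only for reproducibility
n <- 150
datetime <- as.POSIXct('2015-01-01', tz = 'UTC') + (1:n) * 3600
early    <- as.POSIXlt(datetime)$hour %in% 1:6

# simulated values with a negative bias in the early hours
value <- rnorm(n)
value[early] <- value[early] - 1 - runif(sum(early))

# observed difference in means: highlighted hours minus the rest
observed <- mean(value[early]) - mean(value[!early])

# shuffle the labels to see what differences arise when there is no pattern
perm <- replicate(2000, {
  shuffled <- sample(early)
  mean(value[shuffled]) - mean(value[!shuffled])
})

# share of shuffles with a difference at least as negative as the observed one
p_value <- mean(perm <= observed)
print(c(observed = observed, p_value = p_value))
```

If the shuffled labels often produce a difference as large as the observed one, the highlighted pattern is probably not real, however convincing the plot looks.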
