How to combine point plots and box plots in timeline charts with ggplot2 facets

In a recent project, I wanted to plot data from different variables along the same time axis. The difficulty was that I wanted some of these variables as point plots and others as box plots.

Because I work with the tidyverse, I wanted to produce these plots with ggplot2. Faceting was the obvious first step, but it took me quite a while to figure out how best to combine facets containing point plots (where I have one value per time point) with facets containing box plots (where I have multiple values per time point).

The reason why this isn’t trivial is that box plots require groups or factors on the x-axis, while points can be plotted over a continuous range of x-values. If your alarm bells are ringing right now, you are absolutely right: before you try to combine plots with different x-axis properties, you should think long and hard about whether this is an accurate representation of the data and whether it’s a good idea to do so! Here, I had multiple values per time point for one variable and I wanted to make the median and variation explicit, while also showing the continuous changes of other variables over the same time range.
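To make the grouping issue concrete, here is a minimal sketch on a made-up toy data frame (`toy`, `p_pooled`, and `p_grouped` are names I invented for this illustration). It assumes a reasonably recent ggplot2 and uses `layer_data()` to count how many boxes each version draws. Without a `group` aesthetic, a continuous x-axis pools every value into a single box, and ggplot2 warns about it; with `group = time`, each time point gets its own box.

```r
library(ggplot2)

# Hypothetical toy data: three measurements at each of four time points
toy <- data.frame(
  time  = rep(as.POSIXct("2017-10-01 07:00", tz = "UTC") + c(0, 180, 360, 540),
              each = 3),
  value = c(0.2, 0.5, 0.9, 0.1, 0.4, 0.6, 0.3, 0.7, 0.8, 0.2, 0.3, 0.9)
)

# No group aesthetic: the continuous x values are pooled into one box
p_pooled <- ggplot(toy, aes(x = time, y = value)) +
  geom_boxplot()

# group = time: one box per distinct time point
p_grouped <- ggplot(toy, aes(x = time, y = value, group = time)) +
  geom_boxplot()

# layer_data() returns one row per drawn box
n_boxes_pooled  <- nrow(layer_data(p_pooled))
n_boxes_grouped <- nrow(layer_data(p_grouped))
```

Comparing `n_boxes_pooled` with `n_boxes_grouped` shows why the `group` aesthetic matters once the x-axis is a continuous time scale.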

So, I am writing this short tutorial in the hope that it saves the next person trying to do something similar from spending an entire morning on Stack Overflow. 😉

For this demonstration, I am creating some fake data:

library(dplyr)   # %>%, sample_frac(), mutate()
library(tidyr)   # gather()
library(ggplot2)

dates <- seq(as.POSIXct("2017-10-01 07:00"), as.POSIXct("2017-10-01 10:30"), by = 180) # 180 seconds == 3 minutes
fake_data <- data.frame(time = dates,
                        var1_1 = runif(length(dates)),
                        var1_2 = runif(length(dates)),
                        var1_3 = runif(length(dates)),
                        var2 = runif(length(dates))) %>%
  sample_frac(size = 0.33) # keep a random third of the time points

head(fake_data)
##                   time    var1_1    var1_2    var1_3       var2
## 8  2017-10-01 07:21:00 0.2359625 0.6121708 0.4114921 0.03327728
## 27 2017-10-01 08:18:00 0.5592436 0.3834683 0.8025474 0.44557932
## 29 2017-10-01 08:24:00 0.7667775 0.4636693 0.7642972 0.97718507
## 18 2017-10-01 07:51:00 0.2819686 0.3995273 0.9127757 0.42115579
## 1  2017-10-01 07:00:00 0.5940754 0.1599054 0.7287677 0.91953437
## 71 2017-10-01 10:30:00 0.2159290 0.2853349 0.7817291 0.57598897

Here, variable 1 (var1) has three measurements per time point, while variable 2 (var2) has one.

First, for plotting with ggplot2 we want our data in a tidy long format. I also add another column for faceting that groups the variables from var1 together.

fake_data_long <- fake_data %>%
  gather(x, y, var1_1:var2) %>% # wide to long: variable name in x, value in y
  mutate(facet = ifelse(x %in% c("var1_1", "var1_2", "var1_3"), "var1", x))

head(fake_data_long)
##                  time      x         y facet
## 1 2017-10-01 07:21:00 var1_1 0.2359625  var1
## 2 2017-10-01 08:18:00 var1_1 0.5592436  var1
## 3 2017-10-01 08:24:00 var1_1 0.7667775  var1
## 4 2017-10-01 07:51:00 var1_1 0.2819686  var1
## 5 2017-10-01 07:00:00 var1_1 0.5940754  var1
## 6 2017-10-01 10:30:00 var1_1 0.2159290  var1
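In current tidyr releases (1.0.0 and later), `gather()` is marked as superseded in favour of `pivot_longer()`. As a sketch of the same reshaping step with the newer function, here is a version on a tiny made-up stand-in for `fake_data` (`wide` and `long` are names I chose for this illustration; the values are invented). Using `startsWith()` for the facet column is my own shortcut and assumes that all grouped columns share the `var1_` prefix.

```r
library(dplyr)
library(tidyr)

# Small stand-in for fake_data (values are made up for illustration)
wide <- data.frame(
  time   = as.POSIXct("2017-10-01 07:00", tz = "UTC") + c(0, 180),
  var1_1 = c(0.1, 0.2),
  var1_2 = c(0.3, 0.4),
  var1_3 = c(0.5, 0.6),
  var2   = c(0.7, 0.8)
)

long <- wide %>%
  # Same reshaping as gather(x, y, var1_1:var2)
  pivot_longer(cols = var1_1:var2, names_to = "x", values_to = "y") %>%
  # Collapse the three var1 replicates into one facet label
  mutate(facet = ifelse(startsWith(x, "var1_"), "var1", x))

# How many rows end up in each facet
count(long, facet)
```

The row order differs from `gather()` (`pivot_longer()` works through the data row by row rather than column by column), but that makes no difference for plotting.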

Now we can plot this as follows, giving each geom only the subset of the data that belongs in its facet:

# var2 has one value per time point: draw it as points joined by a line.
# var1 has three values per time point: draw one box per time point (group = time).
fake_data_long %>%
  ggplot() +
    facet_grid(facet ~ ., scales = "free") +
    geom_point(data = subset(fake_data_long, facet == "var2"),
               aes(x = time, y = y),
               size = 1) +
    geom_line(data = subset(fake_data_long, facet == "var2"),
              aes(x = time, y = y)) +
    geom_boxplot(data = subset(fake_data_long, facet == "var1"),
                 aes(x = time, y = y, group = time))
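
As an optional variation, the sketch below stores the two subsets in variables, sets the shared `x` and `y` aesthetics once, and gives the boxes an explicit width. It runs on a small made-up stand-in for `fake_data_long` (`demo_long`, `box_data`, and `point_data` are names I invented for this illustration). The `width = 120` value is an assumption that rests on the POSIXct axis being measured in seconds, so it should be adjusted to the spacing of your own time points.

```r
library(ggplot2)

set.seed(42)  # only so this sketch is reproducible
times <- as.POSIXct("2017-10-01 07:00", tz = "UTC") + seq(0, 1800, by = 360)

# Stand-in for fake_data_long: three var1 replicates and one var2 value per time point
demo_long <- data.frame(
  time = rep(times, times = 4),
  x    = rep(c("var1_1", "var1_2", "var1_3", "var2"), each = length(times)),
  y    = runif(4 * length(times))
)
demo_long$facet <- ifelse(demo_long$x == "var2", "var2", "var1")

# Subset once and reuse in the layers below
box_data   <- subset(demo_long, facet == "var1")
point_data <- subset(demo_long, facet == "var2")

p <- ggplot(demo_long, aes(x = time, y = y)) +
  facet_grid(facet ~ ., scales = "free_y") +
  # One box per time point; width is given in seconds on a POSIXct axis
  geom_boxplot(data = box_data, aes(group = time), width = 120) +
  geom_line(data = point_data) +
  geom_point(data = point_data, size = 1) +
  labs(x = "Time", y = "Value")

p
```

Because the facets are stacked in rows, the time axis is shared between the panels, so only the y-axis needs to be free here.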

