How to combine point and boxplots in timeline charts with ggplot2 facets

November 17, 2017
By

(This article was first published on Shirin's playgRound, and kindly contributed to R-bloggers)

In a recent project, I was looking to plot data from different variables along the same time axis. The difficulty was, that some of these variables I wanted to have as point plots, while others I wanted as box-plots.

Because I work with the tidyverse, I wanted to produce these plots with ggplot2. Faceting was the obvious first step but it took me quite a while to figure out how to best combine facets with point plots (where I have one value per time point) with and box-plots (where I have multiple values per time point).

The reason why this isn’t trivial is that box plots require groups or factors on the x-axis, while points can be plotted over a continuous range of x-values. If your alarm bells are ringing right now, you are absolutely right: before you try to combine plots with different x-axis properties, you should think long and hard whether this is an accurate representation of the data and if its a good idea to do so! Here, I had multiple values per time point for one variable and I wanted to make the median + variation explicitly clear, while also showing the continuous changes of other variables over the same range of time.

So, I am writing this short tutorial here in hopes that it saves the next person trying to do something similar from spending an entire morning on stackoverflow. 😉

For this demonstration, I am creating some fake data:

library(tidyverse)
dates <- seq(as.POSIXct("2017-10-01 07:00"), as.POSIXct("2017-10-01 10:30"), by = 180) # 180 seconds == 3 minutes
fake_data <- data.frame(time = dates,
                        var1_1 = runif(length(dates)),
                        var1_2 = runif(length(dates)),
                        var1_3 = runif(length(dates)),
                        var2 = runif(length(dates))) %>%
  sample_frac(size = 0.33)
head(fake_data)
##                   time    var1_1    var1_2    var1_3       var2
## 8  2017-10-01 07:21:00 0.2359625 0.6121708 0.4114921 0.03327728
## 27 2017-10-01 08:18:00 0.5592436 0.3834683 0.8025474 0.44557932
## 29 2017-10-01 08:24:00 0.7667775 0.4636693 0.7642972 0.97718507
## 18 2017-10-01 07:51:00 0.2819686 0.3995273 0.9127757 0.42115579
## 1  2017-10-01 07:00:00 0.5940754 0.1599054 0.7287677 0.91953437
## 71 2017-10-01 10:30:00 0.2159290 0.2853349 0.7817291 0.57598897

Here, variable 1 (var1) has three measurements per time point, while variable 2 (var2) has one.

First, for plotting with ggplot2 we want our data in a tidy long format. I also add another column for faceting that groups the variables from var1 together.

fake_data_long <- fake_data %>%
  gather(x, y, var1_1:var2) %>%
  mutate(facet = ifelse(x %in% c("var1_1", "var1_2", "var1_3"), "var1", x))
head(fake_data_long)
##                  time      x         y facet
## 1 2017-10-01 07:21:00 var1_1 0.2359625  var1
## 2 2017-10-01 08:18:00 var1_1 0.5592436  var1
## 3 2017-10-01 08:24:00 var1_1 0.7667775  var1
## 4 2017-10-01 07:51:00 var1_1 0.2819686  var1
## 5 2017-10-01 07:00:00 var1_1 0.5940754  var1
## 6 2017-10-01 10:30:00 var1_1 0.2159290  var1

Now, we can plot this the following way:

  • facet by variable
  • subset data to facets for point plots and give aesthetics in geom_point()
  • subset data to facets for box plots and give aesthetics in geom_boxplot(). Here we also need to set the group aesthetic; if we don’t specifically give that, we will get a plot with one big box, instead of a box for every time point.
fake_data_long %>%
  ggplot() +
    facet_grid(facet ~ ., scales = "free") +
    geom_point(data = subset(fake_data_long, facet == "var2"), 
               aes(x = time, y = y),
               size = 1) +
    geom_line(data = subset(fake_data_long, facet == "var2"), 
               aes(x = time, y = y)) +
    geom_boxplot(data = subset(fake_data_long, facet == "var1"), 
               aes(x = time, y = y, group = time))

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] bindrcpp_0.2    forcats_0.2.0   stringr_1.2.0   dplyr_0.7.4    
##  [5] purrr_0.2.4     readr_1.1.1     tidyr_0.7.2     tibble_1.3.4   
##  [9] ggplot2_2.2.1   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.3 reshape2_1.4.2   haven_1.1.0      lattice_0.20-35 
##  [5] colorspace_1.3-2 htmltools_0.3.6  yaml_2.1.14      rlang_0.1.4     
##  [9] foreign_0.8-69   glue_1.2.0       modelr_0.1.1     readxl_1.0.0    
## [13] bindr_0.1        plyr_1.8.4       munsell_0.4.3    blogdown_0.3    
## [17] gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      psych_1.7.8     
## [21] evaluate_0.10.1  labeling_0.3     knitr_1.17       parallel_3.4.2  
## [25] broom_0.4.2      Rcpp_0.12.13     scales_0.5.0     backports_1.1.1 
## [29] jsonlite_1.5     mnormt_1.5-5     hms_0.3          digest_0.6.12   
## [33] stringi_1.1.5    bookdown_0.5     grid_3.4.2       rprojroot_1.2   
## [37] cli_1.0.0        tools_3.4.2      magrittr_1.5     lazyeval_0.2.1  
## [41] crayon_1.3.4     pkgconfig_2.0.1  xml2_1.1.1       lubridate_1.7.1 
## [45] assertthat_0.2.0 rmarkdown_1.7    httr_1.3.1       rstudioapi_0.7  
## [49] R6_2.2.2         nlme_3.1-131     compiler_3.4.2

To leave a comment for the author, please follow the link and comment on their blog: Shirin's playgRound.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)