Plotly for R – Multi-Layer Plots

This article was first published on Rstats on pi: predict/infer, and kindly contributed to R-bloggers.




If you are new to plotly, consider first reading our introductory post:
Introduction to Interactive Graphics in R with plotly
 

Analyzing data often calls for a complex plot built from multiple graphical layers. In plotly, a multi-layer plot can be specified as a pipeline of data manipulations and visual mappings. This is possible because dplyr verbs (and only dplyr verbs) can be applied to a plotly object to modify its underlying data.

In programming, mutability refers to the ability of an object to be modified after its creation. The mutability of plotly objects allows for a pipeline where you add a graphical layer based on one version of the data, modify the data with dplyr, and then add a second layer based on the modified data. This design gives great flexibility when developing complex plots, while the resulting code stays easy to read and fits naturally into a tidyverse workflow.
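The pipeline described above can be sketched in a few lines. This is only an illustration, assuming the mpg dataset that ships with ggplot2; the object name two_layers and the trace names are made up for the example.

```r
library(dplyr)
library(plotly)

# mpg ships with ggplot2; it is referenced by namespace here
two_layers <- ggplot2::mpg %>%
  plot_ly(x = ~cty, y = ~hwy) %>%
  # Layer 1: every car, drawn faintly as background context
  add_markers(name = "All classes", alpha = 0.3) %>%
  # Modify the data attached to the plot: keep only pickups
  filter(class == "pickup") %>%
  # Layer 2: drawn from the filtered data only
  add_markers(name = "Pickups")

two_layers
```

The first layer is drawn from the full data, while the second layer sees only the rows that survive the filter() step.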

Mutability

To demonstrate how the data underlying a plotly object can be manipulated, we’ll work through a simple example with the mpg dataset from ggplot2.

library(tidyverse)
library(plotly)

mpg_plotly <- mpg %>%
  plot_ly()

plot_ly() turns the R objects we pass into it into a plotly object: an R htmlwidget that is rendered in the browser by the plotly.js JavaScript library.

In a simple case we can then pass the plotly object into an add_*() function to specify how we’d like the data to be mapped to a graphical layer.

mpg_plotly %>%
  add_markers(x = ~cty, y = ~hwy)

Unlike plot objects from other systems (base graphics, ggplot2, etc.), plotly objects are mutable: the data underlying the object can be manipulated using dplyr verbs. A useful function for inspecting the object’s current data is plotly_data().

mpg

## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4      2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4      2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows

mpg_plotly %>%
  plotly_data()

## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4      1.8  1999     4 auto~ f        18    29 p     comp~
##  2 audi         a4      1.8  1999     4 manu~ f        21    29 p     comp~
##  3 audi         a4      2    2008     4 manu~ f        20    31 p     comp~
##  4 audi         a4      2    2008     4 auto~ f        21    30 p     comp~
##  5 audi         a4      2.8  1999     6 auto~ f        16    26 p     comp~
##  6 audi         a4      2.8  1999     6 manu~ f        18    26 p     comp~
##  7 audi         a4      3.1  2008     6 auto~ f        18    27 p     comp~
##  8 audi         a4 q~   1.8  1999     4 manu~ 4        18    26 p     comp~
##  9 audi         a4 q~   1.8  1999     4 auto~ 4        16    25 p     comp~
## 10 audi         a4 q~   2    2008     4 manu~ 4        20    28 p     comp~
## # ... with 224 more rows

Since we haven’t manipulated the object in any way, plotly_data() returns the data that we passed in.
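The returned data changes as soon as a dplyr verb is applied to the object. As a minimal sketch, mutate() can attach a new column that later layers could map to; the column name hwy_cty_ratio and the object name mpg_ratio are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(plotly)

mpg_ratio <- ggplot2::mpg %>%
  plot_ly() %>%
  # Hypothetical derived column: highway mileage relative to city mileage
  mutate(hwy_cty_ratio = hwy / cty)

# The new column is now part of the data carried by the plotly object
names(plotly_data(mpg_ratio))
```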

Let’s say that we only want to plot the miles-per-gallon data for pickup trucks.

pickup_plotly <- mpg_plotly %>%
  filter(class == "pickup") %>%
  add_markers(x = ~cty, y = ~hwy)

pickup_plotly


pickup_plotly %>%
  plotly_data()

## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows

An equivalent plotly object can also be obtained by filtering the data before passing it into plot_ly(), as the comparison below shows. However, the ability to modify the object after it has been created will prove useful when building more complex multi-layer plots.

plotly_pickup_1 <- mpg %>%
  filter(class == "pickup") %>%
  plot_ly()

plotly_pickup_2 <- mpg %>%
  plot_ly() %>%
  filter(class == "pickup")

plotly_pickup_1 %>%
  plotly_data()

## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows

plotly_pickup_2 %>%
  plotly_data()

## # A tibble: 33 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 dodge        dako~   3.7  2008     6 manu~ 4        15    19 r     pick~
##  2 dodge        dako~   3.7  2008     6 auto~ 4        14    18 r     pick~
##  3 dodge        dako~   3.9  1999     6 auto~ 4        13    17 r     pick~
##  4 dodge        dako~   3.9  1999     6 manu~ 4        14    17 r     pick~
##  5 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  6 dodge        dako~   4.7  2008     8 auto~ 4        14    19 r     pick~
##  7 dodge        dako~   4.7  2008     8 auto~ 4         9    12 e     pick~
##  8 dodge        dako~   5.2  1999     8 manu~ 4        11    17 r     pick~
##  9 dodge        dako~   5.2  1999     8 auto~ 4        11    15 r     pick~
## 10 dodge        ram ~   4.7  2008     8 manu~ 4        12    16 r     pick~
## # ... with 23 more rows
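Rather than comparing the two printouts by eye, the equivalence can be checked programmatically. The following is a small sketch; the object names filtered_first, filtered_after and same_data are chosen for this example only.

```r
library(dplyr)
library(plotly)

# Filter the data frame first, then create the plotly object
filtered_first <- ggplot2::mpg %>%
  filter(class == "pickup") %>%
  plot_ly()

# Create the plotly object first, then filter its underlying data
filtered_after <- ggplot2::mpg %>%
  plot_ly() %>%
  filter(class == "pickup")

# Compare as plain data frames so that only the values matter
same_data <- isTRUE(all.equal(
  as.data.frame(plotly_data(filtered_first)),
  as.data.frame(plotly_data(filtered_after)),
  check.attributes = FALSE
))

same_data
```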

Multi-layer Example

Now that we’ve set the foundation, we can look at a more complicated example.

We’ll be using the txhousing dataset from ggplot2, which tracks housing prices for cities in Texas over time. Let’s start by plotting the time trend for each city.

txhousing

## # A tibble: 8,602 x 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000 
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ... with 8,592 more rows

all_cities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(
    name = "Texan Cities", 
    line = list(width = 1.33), 
    alpha = 0.2, 
    hoverinfo = "none"
  ) %>%
  ungroup()

all_cities
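With the base layer in place, the pipeline pattern from the introduction suggests one way to add a second layer: filter the plot’s data down to a single city and draw another line on top. The sketch below rebuilds all_cities so that it runs on its own; highlighting Houston and the object name houston_layer are assumptions made for illustration.

```r
library(dplyr)
library(plotly)

# Base layer: one faint line per city (same code as above)
all_cities <- ggplot2::txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(
    name = "Texan Cities",
    line = list(width = 1.33),
    alpha = 0.2,
    hoverinfo = "none"
  ) %>%
  ungroup()

# Second layer: restrict the underlying data to one city and draw it on top.
# x and y are inherited from the plot_ly() call.
houston_layer <- all_cities %>%
  filter(city == "Houston") %>%
  add_lines(name = "Houston")

houston_layer
```

Because filter() only affects the data seen by later layers, the faint lines for all cities remain in the plot while the highlighted line uses the filtered rows.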
