The plotly package. A godsend for interactive documents, dashboards and presentations. For such documents there is no doubt that anyone would prefer a plot created in plotly rather than ggplot2. Why? Using plotly gives you neat and, crucially, interactive options at the top, whereas ggplot2 objects are static. In an app we have been developing here at Jumping Rivers, we found ourselves asking the question: would it be quicker to use plot_ly() or to wrap a ggplot2 object in ggplotly()? I found the results staggering.
Throughout we will be using the packages dplyr, tidyr, ggplot2, plotly and microbenchmark. The data in use is the Birthdays data set in the mosaicData package. This data set contains the daily birth count in each state of the USA from 1969 to 1988. The packages can be installed in the usual way (remember you can install packages in parallel).
install.packages(c("mosaicData", "dplyr", "tidyr", "ggplot2", "plotly", "microbenchmark"))
library("mosaicData")
library("dplyr")
library("tidyr")
library("ggplot2")
library("plotly")
library("microbenchmark")
Let’s load and take a look at the data.
data("Birthdays", package = "mosaicData")
head(Birthdays)
##   state year month day       date wday births
## 1    AK 1969     1   1 1969-01-01  Wed     14
## 2    AL 1969     1   1 1969-01-01  Wed    174
## 3    AR 1969     1   1 1969-01-01  Wed     78
## 4    AZ 1969     1   1 1969-01-01  Wed     84
## 5    CA 1969     1   1 1969-01-01  Wed    824
## 6    CO 1969     1   1 1969-01-01  Wed    100
First, we’ll create a very simple scatter graph of the mean births in every year.
meanb = Birthdays %>% group_by(year) %>% summarise(mean = mean(births))
Wrapping this as a ggplot object inside ggplotly() we obtain this…
ggplotly(ggplot(meanb) + geom_point(aes(y = mean, x = year, colour = year)))
plot_ly() gives us this…
plot_ly(data = meanb, y = ~mean, x = ~year, color = ~year, type = "scatter")
Both graphs are identical, bar styling, yes?
Now let’s use microbenchmark to see how their timings compare (for an overview on timing R functions, see our previous blog post).
time = microbenchmark::microbenchmark(
  ggplotly = ggplotly(ggplot(meanb) + geom_point(aes(y = mean, x = year, colour = year))),
  plotly = plot_ly(data = meanb, y = ~mean, x = ~year, color = ~year, type = "scatter"),
  times = 100, unit = "s")
time
## Unit: seconds
##      expr      min       lq     mean   median       uq     max neval cld
##  ggplotly 0.050139 0.052229 0.070750 0.054760 0.056785 1.56652   100   b
##    plotly 0.002475 0.002527 0.003017 0.002571 0.002674 0.03061   100   a
Now I thought nesting a ggplot object within ggplotly() would be slower than using plot_ly(), but I didn’t think it would be this slow. On average ggplotly() is approximately 23 times slower than plot_ly().
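The "23 times" figure can be checked by hand from the benchmark table. The sketch below simply copies the mean and median timings from the output above into base R vectors (the variable names are illustrative, not part of the original analysis) and takes their ratios.

```r
# Timings in seconds, copied from the microbenchmark output above
mean_secs = c(ggplotly = 0.070750, plotly = 0.003017)
median_secs = c(ggplotly = 0.054760, plotly = 0.002571)

# How many times slower ggplotly() is than plot_ly()
mean_ratio = mean_secs[["ggplotly"]] / mean_secs[["plotly"]]
median_ratio = median_secs[["ggplotly"]] / median_secs[["plotly"]]

round(c(mean = mean_ratio, median = median_ratio), 1)
```

The ratio of the means comes out at roughly 23, and the ratio of the medians at roughly 21. The median is less affected by the single slow ggplotly() run visible in the max column, so the gap does not appear to be driven by one outlier.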
Let’s take it up a notch. There we were plotting only 20 points. What if we plot over 20,000? Here we will plot the min, mean and max births on each day.
date = Birthdays %>%
  group_by(date) %>%
  summarise(mean = mean(births), min = min(births), max = max(births)) %>%
  gather(birth_stat, value, -date)
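As a rough sanity check on the "20 points" and "over 20,000" figures, the sketch below counts the years and days in the period using base R only. It assumes that every calendar day from 1969-01-01 to 1988-12-31 appears in the data, which is an assumption made here and not something checked against the Birthdays data set itself.

```r
# Assumption: the data cover every calendar day in 1969 to 1988 inclusive
days = seq(as.Date("1969-01-01"), as.Date("1988-12-31"), by = "day")

n_years = length(unique(format(days, "%Y")))  # one point per year in the first plot
n_days = length(days)                         # one date per day in the second plot
n_stats = 3                                   # min, mean and max

n_years
n_days * n_stats
```

Under that assumption there are 20 yearly points in the first plot and 7,305 days times 3 statistics, so just under 22,000 points, in the second.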
Wrapping the reshaped data as a ggplot2 object inside ggplotly() we obtain this graph…
ggplotly(ggplot(date) + geom_point(aes(y = value, x = date, colour = birth_stat)))