Introducing FlyVis

December 15, 2013

(This article was first published on Frank Portman, and kindly contributed to R-bloggers)

After a couple of all-nighters, we're finally done with our undergraduate statistics thesis. The abstract gives a brief overview of what we were trying to accomplish:

We explore the possibility of improving data analysis through the use of interactive visualization. Exploration of data and models is an iterative process. We hypothesize that dynamic, interactive visualizations greatly reduce the cost of new iterations and thus facilitate agile investigation and rapid prototyping. Our web-application framework, FlyVis, offers evidence for such a hypothesis for a dataset of airline on-time flight performance from 2006 to 2008. Utilizing our framework, we are able to study the feasibility of modeling subsets of flight delays from temporal data, an approach that fails on the full dataset.

Technically, this was a very fun project. Shiny is an extremely powerful package which provides the interactive framework necessary to build such applications. We also made use of the JavaScript library leaflet.js for the interactive map. All in all, I learned quite a bit about writing efficient R code, as the dataset we were using had over 18 million observations.

To learn more about the app, check out the projects page or the application website.

FlyVis lets you dynamically explore the airline on-time dataset, which yields some pretty interesting graphs. For example, if we look at the intraday distribution of flights and delays for Memphis, we see a pretty interesting pattern. It turns out that FedEx shipments dominate the flights out of Memphis, which gives the distribution this unique shape.
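An intraday view like the Memphis one boils down to grouping an airport's flights by hour of day, then counting flights and averaging delays in each hour. Below is a minimal base-R sketch of that step, not the FlyVis code. The column names (Origin, CRSDepTime as an hhmm integer, DepDelay in minutes), the function name intraday_profile, and the toy rows are all assumptions made for illustration.

```r
# Hypothetical sketch: hourly flight counts and mean delays for one airport.
# Assumed columns: Origin (airport code), CRSDepTime (scheduled departure,
# hhmm integer such as 1435), DepDelay (departure delay in minutes).
intraday_profile <- function(flights, airport) {
  apt <- flights[flights$Origin == airport, ]
  # Convert hhmm to hour of day; keep all 24 hours as factor levels
  hour <- factor(apt$CRSDepTime %/% 100, levels = 0:23)
  data.frame(
    hour = 0:23,
    # Number of scheduled departures in each hour
    n_flights = as.vector(table(hour)),
    # Mean departure delay per hour (NA for hours with no flights)
    mean_delay = as.vector(tapply(apt$DepDelay, hour, mean, na.rm = TRUE))
  )
}

# Toy data (made up, not from the real dataset)
flights <- data.frame(
  Origin = c("MEM", "MEM", "MEM", "ATL"),
  CRSDepTime = c(2300, 2330, 715, 900),
  DepDelay = c(10, 20, 0, 5)
)
prof <- intraday_profile(flights, "MEM")
prof[prof$n_flights > 0, ]
```

Keeping all 24 hours as factor levels means empty hours still appear as rows, so the result can be plotted directly as a full-day profile.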

The beauty of an interactive application is that you (the reader) can discover something that I haven't even considered. I merely provide the tools; you do the exploring. If you find interesting patterns at particular airports, I'd love to hear about them by e-mail or in the comments.

Once we polish the application a bit more, we will release the source code on GitHub. Disclaimer: the site initially takes a minute or two to load because our server has to load the massive dataset into memory. The plots also take 5-6 seconds to generate. Again, this is due to the size of our data, and it is something we are currently trying to optimize.

Finally, what would this post be without some R code? Here’s what we used for the Calendar Heatmap plot:

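myCalPlot <- function(dates, values) {
  # ggplot2 draws the plot, plyr supplies ddply() and .(),
  # and projectDate() comes from the TimeProjection package
  if (!require(ggplot2) || !require(plyr) || !require(TimeProjection))
    stop("The packages ggplot2, plyr and TimeProjection are required to use myCalPlot")
  # Split each date into components (year, month, weekday, ...)
  tp <- projectDate(dates, drop = FALSE)
  tp$values <- values
  # Week of the year, with Monday as the first day of the week
  tp$week <- as.numeric(format(dates, "%W"))
  # Order months 1..12 (the original line was cut off after this argument)
  tp$month <- factor(tp$month, levels = as.character(1:12))
  # Week number within each month, starting at 1
  tp <- ddply(tp, .(year, month), transform, monthweek = 1 + week - min(week))
  # One tile per day: week-of-month on x, weekday on y, colored by value
  ggplot(tp, aes(monthweek, weekday, fill = values)) +
    geom_tile(colour = "white") +
    facet_grid(. ~ month) +
    scale_fill_gradientn(colours = rev(c("#D61818", "#FFAE63", "#FFFFBD", "#B5E384"))) +
    theme_bw()
}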

This doesn't include the pre-processing, but that will be out in early 2014.
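The fiddly part of myCalPlot() is the layout: each date needs a weekday (the tile's row) and a week-within-month (the tile's column). For readers without TimeProjection or plyr, here is a base-R sketch of that layout step only. The function name calendar_layout and the Monday = 1 weekday encoding are my own choices, not from FlyVis; `values` would be something like a daily mean delay, which is an assumption since the pre-processing isn't published.

```r
# Hypothetical sketch: compute calendar-heatmap tile positions in base R.
calendar_layout <- function(dates, values) {
  dates <- as.Date(dates)
  df <- data.frame(
    date = dates,
    values = values,
    year = as.integer(format(dates, "%Y")),
    month = as.integer(format(dates, "%m")),
    # Weekday as 1 = Monday ... 7 = Sunday (%u)
    weekday = as.integer(format(dates, "%u")),
    # Week of the year with Monday as first day (%W), as in myCalPlot()
    week = as.integer(format(dates, "%W"))
  )
  # Week within the month: 1 + week - first week of that (year, month)
  df$monthweek <- 1 + df$week - ave(df$week, df$year, df$month, FUN = min)
  df
}

# Usage with made-up values for March 2008
dates <- seq(as.Date("2008-03-01"), as.Date("2008-03-31"), by = "day")
cal <- calendar_layout(dates, values = seq_along(dates))
head(cal[, c("date", "weekday", "week", "monthweek")])
```

Here ave() plays the role of ddply(..., transform, ...): it computes the per-month minimum week and broadcasts it back to every row of that month, so the data frame keeps one row per date.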
