Creating visualizations of large data sets is a tough problem: with a limited number of pixels available on the screen (or just with the limited visual acuity of the human eye), massive numbers of symbols on the page can easily result in an uninterpretable mess. On Friday we shared one way of tackling the problem using Revolution R Enterprise: hexagonal binning charts. Relatedly, RStudio's chief scientist Hadley Wickham is taking a comprehensive approach to big-data visualization with his new open-source bigvis package, currently available on GitHub. (Disclaimer: early development of this package was funded by Revolution Analytics.)
The basic idea of the package is to apply aggregation and smoothing techniques to big data sets before plotting. The resulting visualizations give meaningful insights into the data, can be computed quickly, and render efficiently using R's standard graphics engine. Despite the large data sets involved, the visualization functions in the package are fast, because the “bin-summarise-smooth” cycle is performed in C++, directly on the R object stored in memory. The system is described in detail in this Infovis preprint, and includes this example chart using the famous airline data:
The bigvis package is available for installation now on GitHub (use the devtools package to make the install easier). You can find installation instructions and links to the documentation in the README file linked below.
GitHub (Hadley Wickham): bigvis package
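For readers who want a feel for the “bin-summarise-smooth” cycle mentioned above, here is a minimal one-dimensional sketch in plain Python. It is not the bigvis API and not its C++ implementation: the function names `bin_summarise` and `smooth`, the fixed-width binning, the per-bin mean as the summary, and the count-weighted Gaussian kernel are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_summarise(xs, ys, width, origin=0.0):
    """Assign each x to a fixed-width bin, then summarise y per bin.

    Returns {bin_index: (count, mean_of_y)}, ordered by bin index.
    (Hypothetical helper for illustration, not part of bigvis.)
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for x, y in zip(xs, ys):
        b = math.floor((x - origin) / width)  # index of the bin containing x
        counts[b] += 1
        sums[b] += y
    return {b: (counts[b], sums[b] / counts[b]) for b in sorted(counts)}


def smooth(summary, width, bandwidth, origin=0.0):
    """Smooth the per-bin means with a Gaussian kernel over bin centres.

    Each bin's contribution is weighted by its count, so sparsely
    populated bins influence the result less. Returns {bin_centre: value}.
    (Hypothetical helper for illustration, not part of bigvis.)
    """
    centres = {b: origin + (b + 0.5) * width for b in summary}
    smoothed = {}
    for b, c in centres.items():
        num = 0.0
        den = 0.0
        for b2, (n, mean) in summary.items():
            w = n * math.exp(-0.5 * ((centres[b2] - c) / bandwidth) ** 2)
            num += w * mean
            den += w
        smoothed[c] = num / den
    return smoothed


# Tiny made-up data set, purely to show the call sequence.
xs = [0.1, 0.4, 1.2, 1.8, 2.5]
ys = [1, 3, 10, 20, 7]
summary = bin_summarise(xs, ys, width=1.0)
print(summary)
print(smooth(summary, width=1.0, bandwidth=1.0))
```

The point of the design is that the plot only ever sees one value per bin, so rendering cost depends on the number of bins rather than the number of rows; only the binning pass touches every observation.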