Visualize large data sets with the bigvis package

April 8, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Creating visualizations of large data sets is a tough problem: with a limited number of pixels available on the screen (or just with the limited visual acuity of the human eye), plotting massive numbers of symbols can easily result in an uninterpretable mess. On Friday we shared one way of tackling the problem using Revolution R Enterprise: hexagonal binning charts. Relatedly, RStudio's chief scientist Hadley Wickham is taking a comprehensive approach to big-data visualization with his new open-source bigvis package, currently available on GitHub. (Disclaimer: early development of this package was funded by Revolution Analytics.)

The basic idea of the package is to apply aggregation and smoothing techniques to big data sets before plotting. The resulting visualizations give meaningful insights into the data, and can be computed quickly and rendered efficiently using R's standard graphics engine. Despite the large data sets involved, the visualization functions in the package are fast, because the "bin-summarise-smooth" cycle is performed in C++, directly on the R object stored in memory. The system is described in detail in this InfoVis preprint, which includes an example chart using the famous airline data.
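To make the "bin-summarise-smooth" idea concrete, here is a minimal sketch in base R. It does not use the bigvis API and is not how the package is implemented (bigvis does this work in C++). The simulated data, the bin width, the bandwidth `h`, and the count-weighted Gaussian kernel average are all assumptions chosen purely for illustration.

```r
# Illustrative bin-summarise-smooth in base R (NOT the bigvis API).
set.seed(42)
n <- 1e5
x <- rnorm(n)              # hypothetical variable to bin on
y <- 2 * x + rnorm(n)      # hypothetical variable to summarise

# 1. Bin: assign each x to a fixed-width bin (width is an arbitrary choice).
width   <- 0.1
origin  <- min(x)
bin_id  <- as.integer(floor((x - origin) / width)) + 1L
n_bins  <- max(bin_id)
centers <- origin + (seq_len(n_bins) - 0.5) * width

# 2. Summarise: count and mean of y within each bin.
counts <- tabulate(bin_id, nbins = n_bins)
sums   <- numeric(n_bins)
s      <- rowsum(y, bin_id)                  # one row per non-empty bin
sums[as.integer(rownames(s))] <- s[, 1]
means  <- ifelse(counts > 0, sums / counts, NA_real_)

# 3. Smooth: count-weighted Gaussian kernel average over the bin centres
#    (a simple stand-in for the smoothing step; h is an assumed bandwidth).
h  <- 0.3
ok <- counts > 0
smoothed <- vapply(centers, function(c0) {
  w <- counts[ok] * dnorm((centers[ok] - c0) / h)
  sum(w * means[ok]) / sum(w)
}, numeric(1))

condensed <- data.frame(center = centers, count = counts,
                        mean = means, smooth = smoothed)

# Plot the condensed summary: one point per bin, not one per observation.
plot(condensed$center, condensed$smooth, type = "l",
     xlab = "x (bin centre)", ylab = "smoothed mean of y")
```

The point of the exercise is that the plot is drawn from `nrow(condensed)` rows rather than from all `n` observations, which is why rendering with R's standard graphics stays cheap however large the raw data is.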

The bigvis package is available for installation now on GitHub (use the devtools package to make the install easier). You can find installation instructions and links to the documentation in the README file linked below.
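As a rough sketch, an install via devtools typically looks like the following. The repository path `hadley/bigvis` is an assumption based on the GitHub link, so defer to the README if its instructions differ.

```r
# Assumes the package lives at github.com/hadley/bigvis; check the README first.
install.packages("devtools")               # skip if devtools is already installed
devtools::install_github("hadley/bigvis")  # build and install from GitHub
library(bigvis)
```

Because the package contains C++ code, installing from source will likely need a working compiler toolchain (for example Rtools on Windows).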

GitHub (Hadley Wickham): bigvis package

