Using data.table for binning

(This article was first published on Omnia sunt Communia! » R-english, and kindly contributed to R-bloggers)

I discovered the impressive data.table package more than a year ago. To learn how to use it, I try to solve questions I read on mailing lists or on Stack Overflow. My latest experiment was inspired by the bigvis package and its associated paper. bigvis is a proposal for exploratory data analysis of large datasets that follows a workflow of binning, summarizing, and displaying.
Please note that I am neither trying to mimic the behavior of bigvis nor comparing the two packages. It is only an excuse to learn more about data.table, and this post shows the code I have used.
The first part of my experiment deals with one-dimensional data:

dt1d
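
As a rough illustration of the bin-then-summarize idea in one dimension, here is a minimal sketch. It is not the code from the original experiment: the synthetic data, the column name x, and the bin width of 0.5 are all assumptions chosen for the example.

```r
library(data.table)

set.seed(1)
# Synthetic data standing in for a large one-dimensional sample (assumption)
dt <- data.table(x = rnorm(1e5))

# Bin width: an arbitrary choice for this sketch
width <- 0.5

# Bin: label each value with the left edge of its fixed-width bin.
# := adds the column by reference, so dt is not copied.
dt[, bin := floor(x / width) * width]

# Summarize: number of rows and mean value per bin, sorted by bin
binned <- dt[, .(n = .N, mean_x = mean(x)), keyby = bin]

print(head(binned))
```

The resulting table has one row per non-empty bin, so it is small enough to display directly, for example with plot(n ~ bin, data = binned, type = "h").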
The second part is more sophisticated. It uses the movie dataset to show how to carry out 2D binning.

dt2d
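
The same idea extends to two dimensions by grouping on two bin columns at once. The sketch below uses synthetic columns x and y instead of the movie dataset, and the bin widths wx and wy are arbitrary assumptions, so treat it as an outline of the technique and not as the original code.

```r
library(data.table)

set.seed(2)
# Synthetic stand-in for two numeric variables (assumption)
n <- 5e4
dt <- data.table(x = runif(n, 0, 10), y = rnorm(n))

# Bin widths for each dimension: arbitrary choices for this sketch
wx <- 1
wy <- 0.5

# Bin both variables in a single by-reference assignment
dt[, `:=`(xbin = floor(x / wx) * wx,
          ybin = floor(y / wy) * wy)]

# Summarize: count the rows falling in each (xbin, ybin) cell
grid <- dt[, .(n = .N), keyby = .(xbin, ybin)]

print(head(grid))
```

Only non-empty cells appear in grid. Each row holds the lower-left corner of a cell and its count, which is the form a tile or level plot expects.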

Some key points I have learned about data.table:

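  • := adds, removes, or modifies columns by reference, which avoids memory overhead because no additional copies are made
  • the .N and .SD symbols for grouped operations: .N is the number of rows in each group and .SD is the subset of data for each group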
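
The two points above can be tried out on a toy table. The table and its column names (g, v1, v2, v3) are made up for this sketch.

```r
library(data.table)

# Toy table (hypothetical columns)
dt <- data.table(g  = c("a", "a", "b", "b", "b"),
                 v1 = 1:5,
                 v2 = c(10, 20, 30, 40, 50))

# := works by reference: no assignment with <- is needed
dt[, v3 := v1 * 2]    # add a column
dt[, v1 := v1 + 1L]   # modify a column
dt[, v2 := NULL]      # remove a column

# .N is the number of rows in each group
counts <- dt[, .N, by = g]

# .SD is the subset of data for each group; .SDcols picks its columns
means <- dt[, lapply(.SD, mean), by = g, .SDcols = c("v1", "v3")]

print(counts)
print(means)
```

After these calls dt holds the columns g, v1 and v3, counts has one row per group, and means has the per-group mean of each selected column.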

Still learning!

