Packages for By-Group Processing in R

February 24, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Analyst and BI expert Steve Miller takes a look at the facilities in R for doing "by-group" processing of data. The task consisted of:

... read several text files, merge the results, reshape the intermediate data, calculate some new variables, take care of missing values, attend to meta data, execute a few predictive models and graph the results.

Then repeat the models and graphs for groups or sub-populations marked by distinct values of one or more dimension variables of interest.

The latter step is commonly referred to as “by-group processing.” SAS programmers will recognize by-group processing as syntax that invokes a procedure on a sorted data set, something like:

proc reg data = dblahblah; by vblahblah;
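For comparison, here is a minimal base R sketch of the same idea: fit one regression per group. The built-in mtcars data and the mpg ~ wt model are illustrative assumptions, not taken from Steve's post.

```r
# Split the data into one data frame per value of the grouping variable,
# then fit the same model to each piece.
groups <- split(mtcars, mtcars$cyl)
fits   <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# Collect the coefficients: one row per group, one column per term.
coefs <- t(sapply(fits, coef))
print(coefs)
```

Unlike the SAS version, the data does not need to be sorted first; split() handles the grouping.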

Check out Steve's post to see how he addressed this in R using the high-performance data.table package by Matthew Dowle (and, as Steve suggests, a good place to get started is the example vignettes).
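As a rough illustration of the data.table approach (a sketch on the built-in mtcars data, not Steve's actual code), the by argument evaluates an expression once per group:

```r
library(data.table)

dt <- as.data.table(mtcars)

# The j expression runs once for each value of cyl; .SD is the subset of
# rows for the current group and .N is the number of rows in it.
by_cyl <- dt[, {
  fit <- lm(mpg ~ wt, data = .SD)
  list(n = .N, intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
}, by = cyl]
print(by_cyl)
```

The result is itself a data.table with one row per group, so it can feed straight into further by-group steps or plots.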

I'd also recommend the plyr package, which offers tools to split data sets by various criteria and then apply by-group processing to each piece. Here, the "plyr: divide and conquer" guide is a good place to start. As an added bonus, you can also divide and conquer the computations themselves, running them on multiple nodes in parallel by registering a parallel backend for the foreach function. (Note for Windows users: the doSMP backend from Revolution R is also available now on R-Forge and will be on CRAN soon, too.)
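A minimal plyr sketch of the same split-apply-combine pattern follows. The mtcars data and the model are again illustrative assumptions.

```r
library(plyr)

# ddply splits a data frame by cyl, applies the function to each piece,
# and binds the per-group results back into a single data frame.
by_cyl <- ddply(mtcars, .(cyl), function(d) {
  fit <- lm(mpg ~ wt, data = d)
  data.frame(n = nrow(d), intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
})
print(by_cyl)
```

Once a foreach backend has been registered, passing .parallel = TRUE to ddply should hand the per-group work to that backend; which backend to register depends on your platform.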

Information Management: By-Group Processing, the R data.table and the Power of Open Source

