plyr and reshape: better, faster, more productive

September 10, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Hadley Wickham has just released updates to plyr and reshape (now called reshape2), his data-manipulation packages for R, which are much faster and more memory-efficient than their previous incarnations. The reshape2 package lets you flexibly restructure and aggregate data using just three functions (melt, acast and dcast), whereas the plyr package is kind of like a supercharged SQL "GROUP BY" statement for R data frames.
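To make that concrete, here is a minimal sketch of the three reshape2 functions alongside a plyr call that plays the role of a GROUP BY. The scores data frame and its column names are made up for illustration; they are not from Hadley's documentation.

```r
library(reshape2)
library(plyr)

# Hypothetical toy data: two measurements per subject and treatment
scores <- data.frame(
  subject   = c("a", "a", "b", "b"),
  treatment = c("x", "y", "x", "y"),
  height    = c(1, 2, 3, 4),
  weight    = c(10, 20, 30, 40)
)

# melt: wide to long, one row per (subject, treatment, variable)
long <- melt(scores, id.vars = c("subject", "treatment"))

# dcast: long back to a data frame, averaging over treatments
wide <- dcast(long, subject ~ variable, fun.aggregate = mean)

# acast: the same reshaping, but the result is a matrix
mat <- acast(long, subject ~ variable, fun.aggregate = mean)

# ddply: roughly "SELECT ... FROM scores GROUP BY subject"
by_subject <- ddply(scores, "subject", summarise,
                    mean_height  = mean(height),
                    total_weight = sum(weight))
```

Here dcast returns a data frame and acast a matrix, so the choice between them mostly comes down to what the next step in your analysis expects.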

One of the most interesting aspects of this update is that plyr can now parallelize its operations and make use of multiple processors simultaneously to speed up really big data-munging jobs. It makes use of Revolution's contributed foreach package, so whatever platform you're on (Windows, Linux, or Mac) you can specify a suitable parallel backend and take advantage of significant speedups on multiprocessor machines.

For example, on a 2-core Windows box you can use the doSMP package from Revolution R to speed up a plyr call as follows:

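library(plyr)
library(doSMP)

workers <- startWorkers(2)  # one worker per core; my computer has 2 cores
registerDoSMP(workers)      # register the workers as the foreach backend

# my_data and aggr_function stand in for your own list and function
llply(my_data, aggr_function, .parallel = TRUE)

stopWorkers(workers)        # shut the workers down when finished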

On Unix-like platforms (including Linux and Mac) you can use the doMC package for similar ends. Find more information about plyr at Hadley's website, below.

Had.co.nz: plyr 
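For Unix-like platforms, the doMC route mentioned above might look something like the sketch below. The my_data list and aggr_function are hypothetical stand-ins for your own data and function, and the core count of 2 is an assumption to adjust for your machine.

```r
library(plyr)
library(doMC)

# Register a multicore backend for foreach (assumes a 2-core machine)
registerDoMC(cores = 2)

# Hypothetical stand-ins for your own data and aggregation function
my_data <- list(a = 1:10, b = 11:20, c = 21:30)
aggr_function <- function(x) sum(x)

# Each list element is handed to a worker process
result <- llply(my_data, aggr_function, .parallel = TRUE)
```

With pieces this small the cost of forking workers will outweigh any gain, so expect a speedup only when each call to the function does a substantial amount of work.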
