plyr and reshape: better, faster, more productive

September 10, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Hadley Wickham has just released updates to his data-manipulation packages for R, plyr and reshape (the latter now called reshape2), which are much faster and more memory-efficient than their previous incarnations. The reshape2 package lets you flexibly restructure and aggregate data using just three functions (melt, acast and dcast), whereas the plyr package is kind of like a supercharged SQL "GROUP BY" statement for R data frames.
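To make the two ideas concrete, here is a minimal sketch of a melt/cast round trip and a GROUP BY-style summary. The `scores` data frame and its column names are made up for illustration; only `melt`, `acast`, `dcast`, `ddply` and `summarise` come from the packages themselves.

```r
library(reshape2)
library(plyr)

# Hypothetical toy data, invented for this example
scores <- data.frame(
  student = c("ann", "ann", "bob", "bob"),
  term    = c("t1", "t2", "t1", "t2"),
  math    = c(90, 85, 70, 75),
  art     = c(80, 95, 60, 65)
)

# melt: wide -> long, one row per (student, term, variable)
long <- melt(scores, id.vars = c("student", "term"))

# dcast: long -> data frame, averaging each subject over terms
by_student <- dcast(long, student ~ variable,
                    fun.aggregate = mean, value.var = "value")

# acast: same reshaping, but the result is a matrix
by_student_mat <- acast(long, student ~ variable,
                        fun.aggregate = mean, value.var = "value")

# ddply: split by student, summarise each piece, recombine
# (roughly SELECT student, AVG(value) ... GROUP BY student)
totals <- ddply(long, "student", summarise, mean_value = mean(value))
```

The only difference between `dcast` and `acast` here is the return type: a data frame versus a matrix (or array, for more than two dimensions).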

One of the most interesting aspects of this update is that plyr can now parallelize its operations and make use of multiple processors simultaneously to speed up really big data-munging jobs. It makes use of Revolution's contributed foreach package, so whatever platform you're on (Windows, Linux, or Mac) you can specify a suitable parallel backend and take advantage of significant speedups on multiprocessor machines.

For example, on a 2-core Windows box you can use the doSMP package from Revolution R to speed up a plyr call as follows:

library(doSMP)
workers <- startWorkers(2) # My computer has 2 cores
registerDoSMP(workers) # make the workers the foreach backend

llply(my_data, aggr_function, .parallel=TRUE)

On Unix-like platforms (including Linux and Mac) you can use the doMC package for similar ends. Find more information about plyr at Hadley's website.
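As a rough sketch of the doMC route, the snippet below registers a multicore backend and runs the same kind of `llply` call. The core count is an assumption about your machine, and `my_data` and `aggr_function` are hypothetical stand-ins defined here only so the example runs on its own.

```r
library(plyr)
library(doMC)

# Register the multicore backend; 2 cores is an assumption
registerDoMC(cores = 2)

# Hypothetical stand-ins for the data and the aggregation function
my_data <- list(a = 1:10, b = 11:20, c = 21:30)
aggr_function <- function(x) sum(x)

# Each list element is handed to a worker via foreach
result <- llply(my_data, aggr_function, .parallel = TRUE)
```

For a job this small the overhead of forking workers will outweigh any gain; the parallel option pays off when each piece takes a noticeable amount of time to process.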
