Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ?

Posted on March 17, 2011 by altuna in Uncategorized | 0 Comments

[This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Which one of the sqldf, plyr, doBy and aggregate functions/packages would be faster for applying functions on groups of rows? I was wondering about this earlier in this post. It seems sqldf would be the fastest according to a post in manipulatr mail list.

Well, here is the ranking from the fastest to the slowest:

(check the link for the post above for details on comparison)

The downside of sqldf is that you will be able to use only sql functions on groups, but in the rest of the other options you can apply R functions on groups.

EDIT: I repeated the experiment on my own computer, this time I included data.table package as well. The new experiment shows that sqldf is still the fastest but data.table ranks second now. But I have been warned that the mean() function has large overhead and in fact if I use .Internal(mean(x)) instead of mean(x) for the group operations, data.table is the fastest!!! Scroll down to see another comparison of the grouping functions, this time using sum() function which doesn’t seem to have a large overhead.

The code is below. I mostly copy-pasted from the post at manipulatr, except the data.table part and the plot.

EDIT: More unbiased way to measure this is to use sum() function on groups. When I do that, data.table comes first!

To leave a comment for the author, please follow the link and comment on their blog: Recipes, scripts and genomics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ?

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)