Which functions in plyr do people use?

November 2, 2012

This is the question that Hadley Wickham recently set out to answer by asking frequent R and plyr users, in an online survey, how they use the package.

Once a decent number of people had responded, Hadley quickly produced a short analysis of the plyr usage survey and published it on RPubs.  With his permission, I am re-posting his analysis here:

 

Plyr usage survey results

Thanks to everyone who took part! I received 124 responses in about 24 hours, which is super awesome. This document gives a quick writeup of the results.

Function usage

Overall, function usage was much as I expected: ddply is by far the most commonly used function, followed by ldply and dlply, then llply. This is reassuring because for the next iteration of plyr, I’m planning to focus on ddply, ldply and dlply.
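For readers less familiar with these four functions, here is a minimal sketch of how they relate. In plyr's naming scheme the first letter gives the input type and the second the output type (d for data frame, l for list). The `scores` data frame and every variable name below are invented for illustration and are not part of the survey.

```r
library(plyr)

# Toy data, made up for this example
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# ddply: data frame in, data frame out (one summary row per group)
group_means <- ddply(scores, "group", summarise, mean_value = mean(value))

# dlply: data frame in, list out (one element per group)
group_values <- dlply(scores, "group", function(piece) piece$value)

# ldply: list in, data frame out (results are row-bound together)
group_ranges <- ldply(group_values, function(v) {
  data.frame(min = min(v), max = max(v))
})

# llply: list in, list out
group_lengths <- llply(group_values, length)
```

Each call splits its input, applies the function to every piece, and combines the results into the output type named by the second letter.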


Other functions

I didn’t perform a formal analysis of the free text “other functions”, but common themes were:

  • parallelisation
  • progress bars
  • join
  • mutate, summarise, arrange
  • colwise
  • count
  • rbind.fill
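As a rough illustration of several of the helpers named above, the sketch below calls count, join, mutate, arrange, colwise and rbind.fill on small made-up data frames. The `sales` and `regions` tables and all column names are assumptions for the example. Parallelisation and progress bars are left out.

```r
library(plyr)

# Toy data, made up for this example
sales <- data.frame(
  region = c("north", "north", "south"),
  units  = c(10, 5, 7)
)
regions <- data.frame(
  region  = c("north", "south"),
  manager = c("Ann", "Bob")
)

# count: tabulate unique values; the result gains a `freq` column
region_counts <- count(sales, "region")

# join: merge two data frames on a key, keeping the row order of the first
sales_with_manager <- join(sales, regions, by = "region", type = "left")

# mutate adds a column; arrange sorts rows (desc() reverses the order)
sales_ranked <- arrange(mutate(sales, double_units = units * 2), desc(units))

# colwise: turn a vector function into one that works column by column
numeric_means <- colwise(mean)(sales["units"])

# rbind.fill: row-bind data frames, filling missing columns with NA
combined <- rbind.fill(sales, data.frame(region = "east"))
```

These helpers operate on whole data frames, so they combine naturally with ddply when the same operation is needed per group.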

Comments

Again, no formal analysis, but the common themes were:

  • You like plyr – thanks!
  • Make plyr faster – this is a big motivation for the next iteration, and initial explorations are promising: I should be able to get a 10-100x speedup for many cases.
  • Documentation and examples could be better – I know, but good documentation is hard!

A few things that you complained about that are fixed in the current dev version:

  • summarise now works sequentially (i.e. you can refer to columns you just created)
  • there’s a new progress bar (thanks to Mike Lawrence) that estimates the amount of time remaining
  • a new here function makes it easier to use ddply + summarise/mutate/subset inside a function
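The first and third fixes can be sketched as follows, assuming a plyr release that includes them (to my knowledge, 1.8 or later). The `measurements` data, the `scaled_totals` wrapper and its `multiplier` argument are hypothetical names used only for illustration.

```r
library(plyr)

# Toy data, made up for this example
measurements <- data.frame(
  site    = c("x", "x", "y", "y"),
  reading = c(2, 4, 10, 20)
)

# Sequential summarise: `fraction` refers to `total`,
# which was created earlier in the same call
site_totals <- ddply(measurements, "site", summarise,
  total    = sum(reading),
  fraction = total / 100
)

# here(): wrapping summarise lets it see `multiplier`, a variable
# local to the enclosing function
scaled_totals <- function(data, multiplier) {
  ddply(data, "site", here(summarise), scaled = sum(reading) * multiplier)
}

result <- scaled_totals(measurements, multiplier = 10)
```

Without `here()`, a plain `summarise` passed to ddply inside a function would typically not find `multiplier`, because the expression is evaluated outside the function's own scope.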

 


