Sometimes, you just need to use a plyr

Posted on July 10, 2009 by jebyrnes in R bloggers | 0 Comments

[This article was first published on i'm a chordata! urochordata! » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I haven’t posted anything about R-nerdery in quite some time. But I have to pause for a moment, and sing the praises of a relatively new package that has made my life exponentially easier. The plyr package.

R has the capability to apply a single function to a vector or list using apply or mapply, or their various derivatives. This returns another vector or list.

This is great in principal, but in practice, with indexing, odd return objects, and difficulties in using multiple arguments, it can get out of hand for complex functions. Hence, one often resorts to a for loop.

Let me give you an example. Let’s say I have some data from a simple experiment where I wanted to look at the effect of adding urchins, lobsters, or both on a diverse community of sessile invertebrates – mussels, squirts, etc. Heck, let’s say, I had one gazillion species whose responses I was interested in. Now let’s say I want a simple summary table of the change in the density each species – and my data has before and after values. So, my data would look like this.

Urchins	Lobsters	Before.After	Replicate	Species	Density
Yes	No	Before	1	1	30
Yes	No	After	1	1	10
Yes	No	Before	1	2	45
Yes	No	After	1	2	23
. . . . .
Yes	Yes	Before	15	Gazillion	10
Yes	Yes	After	15	Gazillion	9

So, each row represents the measurement for one species in one replicate either before or after the experiment. Now, previously, to get an output table with the mean change for each species for each treatment, I would have had to do something like this:

#first, create blank list of vectors waiting to be filled
#as well as blank vectors to remember what treatment 
#combinations we're looking at
mean.list<-list()
urchin.vec<-vector()
lobster.vec<-vector()
for(i in 1:gazillion) mean.list[[paste("sp",i,sep=".")]]==vector()
 
#then, a for loop for each combination
for (i in levels(my.data$Urchins)){
  for (j in levels(my.data$Lobsters)){
    urchin.vec<-c(urchin.vec,i)
    lobster.vec<-c(lobster.vec,j)
    #get the right subset of the data
    sub.data<-subset(my.data, my.data$Urchins==i & my.data$Lobster==j)
 
    #now loop over all species
    for (k in 1:gazillion){ 
    sub.data<-subset(sub.data, sub.data$Species==k)
    before.data<-subset(sub.data, sub.data$Before.After=="Before")
    after.data<-subset(sub.data, sub.data$Before.After=="After")
    mean.list[[paste("sp",i-3,sep=".")]]<-
	c(mean.list[[paste("sp",i-3,sep=".")]], mean(after.data[k]-before.data[k])
 
    }
  }
}
 
#then turn it into a data frame to match back up to the right values
mean.data<-as.data.frame(mean.list)
mean.data$urchins<-as.factor(urchin.vec)
mean.data$lobsters<-as.factor(lobster.vec)

Ok, did that exhaust you as much it did me? Now, here’s how to get the same result using ddply (more on why it’s called ddply in a moment)

#a function where, if you give it a data frame with species 1 - gazillion
#will give you the mean for each species.  Note, the which statement
#lets me look at before and after data separately
#Also note the Change= in the return value.  That gives a new column name
#for multiple return columns, just return a vector, with each 
#new column getting it's own name
 
tab.means<-function(df){ return(Change = 
			df$Density[which($Before.After=="After")]
			- df$Density[which($Before.After=="After")]) }
 
#now use ddply
mean.data<-ddply(df, c("Urchins", "Lobsters", "Species"), tab.means))
 
#ok, in fairness, each species doesn't have it's own column.  But, we can 
#just use reshape
mean.data<-reshape(mean.data, timevar="Species", idvar=c("Urchins", "Lobsters", direction="wide)

Um. Whoah. Notice a small difference there. Now, perhaps the earlier code could have been cleaned up a bit with reshape, but still, 6 lines of code versus 18 – and most of that 18 was bookkeeping code, not something doing real useful computation.

Soooo…. Check out plyr! It will make your life a happier, more relaxed place. Enjoy!

(p.s. I’m noticing wordpress is ignoring all of my code indenting – anyone have any quick fixes for that?)

To leave a comment for the author, please follow the link and comment on their blog: i'm a chordata! urochordata! » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Sometimes, you just need to use a plyr

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)