(This article was first published on **fishR » R**, and kindly contributed to R-bloggers)

Note that this is largely a repeat of a previous post (except that I have added a few plots at the bottom), as I am experimenting with writing posts here directly from R using the `knit2wp()` function (in the **knitr** package) and the R Markdown language. If successful, this will allow my posts with R code to show the results produced, which will make the posts more readable. [Currently, I don’t like that the results look like the source code, but I have not figured out how to reliably fix that yet.] I apologize for cluttering your mailboxes. Let me know if you have any comments or suggestions.

I came across a “problem” today where I needed to create catch data for individual nets from length measurements made on individual fish in those nets. In other words, I had data that showed three individual length measurements for Brook Trout, two measurements for Lake Trout, and two measurements for Rainbow Trout in net #1 and I needed a data frame that showed these catch amounts (i.e., the three, two, and two). Of course, the real problem had more fish and more nets.

The `ddply()` function from the **plyr** package works very well for this type of problem, as illustrated below. Basically, this function breaks the original data frame into smaller groups (in this case, each species within each net), applies some function to each group (in this case, computing the length of the fish length variable, which corresponds to the number of fish caught), and then combines the results from each group into a resultant data frame. Hadley Wickham, the author of **plyr**, calls this the Split-Apply-Combine strategy.
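The three steps can also be spelled out by hand in base R, which may help show what `ddply()` automates. This is only a sketch on a small made-up data frame; `toy`, `pieces`, `counts`, and `res` are hypothetical names and are not part of the example that follows.

```r
# Hypothetical mini data set: one row per measured fish
toy <- data.frame(net=c(1,1,1,2,2),
                  tl=c(101,95,110,99,104))

# Split: one vector of lengths per net
pieces <- split(toy$tl,toy$net)

# Apply: count the lengths (i.e., the fish) in each piece
counts <- sapply(pieces,length)

# Combine: put the per-net results back into one data frame
res <- data.frame(net=as.numeric(names(counts)),
                  catch=as.vector(counts))
res
```

With more than one grouping variable, doing this by hand gets tedious, which is where `ddply()` earns its keep.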

First, let’s make some toy data for the example:

    # tl is drawn at random, so your lengths will differ from those shown
    lens <- data.frame(net=rep(c(1,2,3),c(7,5,6)),
                       eff=rep(c(1,2,2),c(7,5,6)),
                       temp=rep(c(17,15.5,16.5),c(7,5,6)),
                       species=c(rep(c("BKT","LKT","RBT"),c(3,2,2)),
                                 rep(c("BKT","LKT"),c(2,3)),
                                 rep(c("BKT","RBT"),c(4,2))),
                       tl=round(rnorm(18,mean=100,sd=10),0)
                       )
    lens

    ##    net eff temp species  tl
    ## 1    1   1 17.0     BKT 111
    ## 2    1   1 17.0     BKT 108
    ## 3    1   1 17.0     BKT 106
    ## 4    1   1 17.0     LKT 107
    ## 5    1   1 17.0     LKT 103
    ## 6    1   1 17.0     RBT 105
    ## 7    1   1 17.0     RBT  96
    ## 8    2   2 15.5     BKT  96
    ## 9    2   2 15.5     BKT  94
    ## 10   2   2 15.5     LKT  94
    ## 11   2   2 15.5     LKT 119
    ## 12   2   2 15.5     LKT  97
    ## 13   3   2 16.5     BKT 108
    ## 14   3   2 16.5     BKT  91
    ## 15   3   2 16.5     BKT 111
    ## 16   3   2 16.5     BKT 109
    ## 17   3   2 16.5     RBT  94
    ## 18   3   2 16.5     RBT 102

Then let’s use `ddply()` to turn this into catch data. Here, `ddply()` takes the original data frame as the first argument, a formula that consists of the variables used to make the groupings (more about this below) as the second argument, the `summarize()` function (without the parentheses) as the third argument, and then the name of a new variable set equal to a function that computes a summary (`length()` of the fish length variable in this case). The original data frame will be split into groups based on unique combinations of the *net* and *species* variables. The *eff*(ort) and *temp*(erature) variables take only one value within each net, so including them in the formula does not create additional groups; they are simply carried along with *net* into the final data frame.

    library(plyr)
    catch1 <- ddply(lens,~net+eff+temp+species,
                    summarize,catch=length(tl))
    catch1

    ##   net eff temp species catch
    ## 1   1   1 17.0     BKT     3
    ## 2   1   1 17.0     LKT     2
    ## 3   1   1 17.0     RBT     2
    ## 4   2   2 15.5     BKT     2
    ## 5   2   2 15.5     LKT     3
    ## 6   3   2 16.5     BKT     4
    ## 7   3   2 16.5     RBT     2
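As a cross-check that does not depend on **plyr**, the same counts can be obtained with `aggregate()` from base R. This is just a sketch, not part of the original workflow: the lengths are copied from the output printed above so that the block runs on its own, and `catch1b` is a name made up for this illustration.

```r
# Same structure as the toy data, with tl copied from the printed output
lens <- data.frame(net=rep(c(1,2,3),c(7,5,6)),
                   eff=rep(c(1,2,2),c(7,5,6)),
                   temp=rep(c(17,15.5,16.5),c(7,5,6)),
                   species=c(rep(c("BKT","LKT","RBT"),c(3,2,2)),
                             rep(c("BKT","LKT"),c(2,3)),
                             rep(c("BKT","RBT"),c(4,2))),
                   tl=c(111,108,106,107,103,105,96,96,94,
                        94,119,97,108,91,111,109,94,102))

# Count the lengths within each net/eff/temp/species combination
catch1b <- aggregate(tl~net+eff+temp+species,data=lens,FUN=length)

# aggregate() keeps the name tl for the summary, so rename it
names(catch1b)[names(catch1b)=="tl"] <- "catch"

# Sort by net and species to match the ddply() result
catch1b <- catch1b[order(catch1b$net,catch1b$species),]
catch1b
```

The formula here plays the same role as the one given to `ddply()`: the variables on the right side define the groups.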

A common problem with this type of data is that mean catch per net will not be computed properly because some species were not captured in some nets, but no zero for those species is entered for those nets. The `addZeroCatch()` function in the **FSA** package can be used to automatically (though not quickly) enter these zeroes. This function requires the data frame with catches as the first argument, the name of the variable that identifies the net as the second argument, the name of the variable that identifies the species as the third argument, and a vector of names of variables that should be set to zero in the `zerovar=` argument. This process is illustrated below.

    library(FSA)
    catch2 <- addZeroCatch(catch1,"net","species",
                           zerovar="catch")
    catch2[order(catch2$net,catch2$species),]

    ##    net eff temp species catch
    ## 1    1   1 17.0     BKT     3
    ## 2    1   1 17.0     LKT     2
    ## 3    1   1 17.0     RBT     2
    ## 4    2   2 15.5     BKT     2
    ## 5    2   2 15.5     LKT     3
    ## 41   2   2 15.5     RBT     0
    ## 6    3   2 16.5     BKT     4
    ## 61   3   2 16.5     LKT     0
    ## 7    3   2 16.5     RBT     2
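To see what adding the zeroes amounts to, the same result can be sketched in base R. This is not how `addZeroCatch()` is implemented (I make no claim about its internals); it is just one way to get the same rows, and `combos`, `nets`, and `filled` are names made up for this illustration. The catch data are typed in from the output shown earlier so that the block runs on its own.

```r
# Catch data as printed earlier (only net/species combinations caught)
catch1 <- data.frame(net=c(1,1,1,2,2,3,3),
                     eff=c(1,1,1,2,2,2,2),
                     temp=c(17,17,17,15.5,15.5,16.5,16.5),
                     species=c("BKT","LKT","RBT","BKT","LKT","BKT","RBT"),
                     catch=c(3,2,2,2,3,4,2))

# Every net/species combination that could have occurred
combos <- expand.grid(net=unique(catch1$net),
                      species=unique(catch1$species),
                      stringsAsFactors=FALSE)

# Net-level variables (one row per net) to carry along with new rows
nets <- unique(catch1[,c("net","eff","temp")])

# Attach the net-level variables, then the observed catches;
# all.x=TRUE keeps combinations with no catch (catch is NA there)
filled <- merge(merge(combos,nets,by="net"),
                catch1[,c("net","species","catch")],
                by=c("net","species"),all.x=TRUE)

# A missing catch means that species was not caught in that net
filled$catch[is.na(filled$catch)] <- 0
filled <- filled[order(filled$net,filled$species),
                 c("net","eff","temp","species","catch")]
filled
```

The two new rows (Rainbow Trout in net 2 and Lake Trout in net 3) are the same ones that `addZeroCatch()` added above.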

Now, for example, the mean and SD of catch-per-unit-effort (CPE) per species can be computed.

    catch2$cpe <- catch2$catch/catch2$eff
    ( cpesum <- ddply(catch2,~species,
                      summarize,mean.cpe=mean(cpe),sd.cpe=sd(cpe)) )

    ##   species mean.cpe sd.cpe
    ## 1     BKT    2.000  1.000
    ## 2     LKT    1.167  1.041
    ## 3     RBT    1.000  1.000
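The reason for adding the zeroes shows up clearly if the Lake Trout mean is recomputed both ways. The small sketch below uses the Lake Trout catches and efforts from the output above; `lkt` is a name made up for this illustration.

```r
# Lake Trout catch and effort by net, copied from the printed catch2 rows
lkt <- data.frame(net=c(1,2,3),
                  eff=c(1,2,2),
                  catch=c(2,3,0))
lkt$cpe <- lkt$catch/lkt$eff

# Mean CPE over all three nets, with the zero for net 3 included
mean(lkt$cpe)

# Mean CPE over only the nets where Lake Trout were caught
mean(lkt$cpe[lkt$catch>0])
```

The first mean matches the 1.167 in the table, whereas dropping the net with no Lake Trout gives 1.75, which overstates how many Lake Trout a typical net caught.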

As an example, you can make a histogram of the lengths of Brook Trout in the original data frame

    with(subset(lens,species=="BKT"),
         hist(tl,xlab="Total Length",main="",col="gray90"))

or a barplot of the mean CPE by species

    with(cpesum,
         barplot(mean.cpe,names.arg=species,
                 ylab="Mean CPE",xlab="Species"))

Obviously, this is a toy example, but it can be scaled up to larger projects.
