Creating Catch Data from Individual Length Measurements

June 6, 2013

(This article was first published on fishR » R, and kindly contributed to R-bloggers)

This example has been updated in this post.

I came across a “problem” today where I needed to create catch data for individual nets from length measurements made on individual fish in those nets.  In other words, I had data that showed three individual length measurements for Brook Trout, two measurements for Lake Trout, and two measurements for Rainbow Trout in net #1 and I needed a data frame that showed these catch amounts (i.e., the three, two, and two).  Of course, the real problem had more fish and more nets.

The ddply() function from the plyr package works very well for this type of problem, as illustrated below.  Basically, this function breaks the original data frame into smaller groups (in this case, each species within each net), applies some function to each group (in this case, computing the length of the fish length variable, which corresponds to the number of fish caught), and then combines the results from the groups into a single data frame.  Hadley Wickham, the author of plyr, calls this the Split-Apply-Combine strategy.

Here, ddply() takes the original data frame as the first argument, a formula that lists the variables used to make the groupings (more about this below) as the second argument, the summarize() function (without the parentheses) as the third argument, and then the name of a new variable set equal to a function that computes a summary (length() of the fish length variable in this case).  The original data frame is split into groups based on unique combinations of the net and species variables.  The eff(ort) and temp(erature) variables take a single value within each net, so they do not create additional groups; their values are simply carried along with net into the final data frame.

# make some toy data (listing truncated in the source; only the eff= line survives)
lens <- data.frame(...,
                   eff=rep(c(1,2,2),c(7,5,6)),
                   ...)

# now turn it into catch data
library(plyr)
catch1 <- ddply(lens,~net+eff+temp+species,
                summarize,catch=length(tl))
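Because the toy data listing above is incomplete, here is a minimal sketch with stand-in data so the counting step can be tried out. The species codes for nets 2 and 3, the temperatures, and all of the lengths are made up for illustration; only the net 1 counts (three Brook Trout, two Lake Trout, two Rainbow Trout) and the eff values come from the text. The sketch uses base R's aggregate() rather than ddply(), so it should be read as an equivalent cross-check and not as the original code.

```r
# Hypothetical stand-in for the toy data: 18 fish measured in 3 nets.
lens <- data.frame(
  net     = rep(c(1, 2, 3), c(7, 5, 6)),
  eff     = rep(c(1, 2, 2), c(7, 5, 6)),     # effort, constant within a net
  temp    = rep(c(12, 15, 14), c(7, 5, 6)),  # assumed temperatures
  species = c("BKT", "BKT", "BKT", "LKT", "LKT", "RBT", "RBT",
              "BKT", "LKT", "LKT", "LKT", "LKT",
              "BKT", "BKT", "BKT", "BKT", "RBT", "RBT"),
  tl      = c(210, 225, 198, 410, 388, 305, 290,   # made-up total lengths
              240, 455, 430, 402, 397,
              205, 215, 230, 222, 310, 298)
)

# Split by net/eff/temp/species, apply length() to tl, combine the results.
catch1 <- aggregate(tl ~ net + eff + temp + species, data = lens, FUN = length)

# The count comes back in a column named tl; rename it and sort for reading.
names(catch1)[names(catch1) == "tl"] <- "catch"
catch1 <- catch1[order(catch1$net, catch1$species), ]
catch1
```

Note that the result has only seven rows: combinations that were never observed (e.g., Rainbow Trout in net 2 in this stand-in data) simply do not appear.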

A common problem with this type of data is that mean catch per net will not be computed properly because some species were not captured in some nets, but no zero for those species is entered for those nets.  The mean for such a species is then taken only over the nets where it was caught, so it is biased high.  The addZeroCatch() function in the FSA package can be used to automatically (though not quickly) enter these zeroes.  This function requires the data frame with catches as the first argument, the name of the variable that identifies the net as the second argument, the name of the variable that identifies the species as the third argument, and a vector of names of variables that should be set to zero in the zerovar= argument.  This process is illustrated below.

# now add zeroes where needed
library(FSA)
catch2 <- addZeroCatch(catch1,"net","species",
                       zerovar="catch")
# check it out -- sorted by net then species
catch2[order(catch2$net,catch2$species),]
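To make it clearer what "adding zeroes" means, the following base R sketch builds every net-by-species combination and fills in a zero wherever no catch was recorded. It is not how addZeroCatch() is implemented, just one way to get the same kind of result, and the catch values are hypothetical.

```r
# Hypothetical catch summary: no RBT in net 2 and no LKT in net 3.
catch1 <- data.frame(
  net     = c(1, 1, 1, 2, 2, 3, 3),
  eff     = c(1, 1, 1, 2, 2, 2, 2),
  species = c("BKT", "LKT", "RBT", "BKT", "LKT", "BKT", "RBT"),
  catch   = c(3, 2, 2, 1, 4, 4, 2)
)

# One row per net (with its effort) and one row per species.
nets <- unique(catch1[, c("net", "eff")])
spp  <- data.frame(species = unique(catch1$species))

# by = NULL asks merge() for the Cartesian product: every net x species pair.
full <- merge(nets, spp, by = NULL)

# Left-join the observed catches; unobserved pairs get NA, which becomes 0.
catch2 <- merge(full, catch1, by = c("net", "eff", "species"), all.x = TRUE)
catch2$catch[is.na(catch2$catch)] <- 0

# Sort by net then species for reading.
catch2 <- catch2[order(catch2$net, catch2$species), ]
catch2
```

Carrying eff along with net in the first step matters: the added zero rows need a valid effort value if catch-per-unit-effort is computed later.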

Now, for example, the mean and SD of catch-per-unit-effort (CPE) per species can be computed.

# illustrate compute mean/sd CPE
catch2$cpe <- catch2$catch/catch2$eff
ddply(catch2,~species,
      summarize,meanCPE=mean(cpe),sdCPE=sd(cpe))
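As a rough illustration of why the zeroes matter for this summary, the sketch below computes the mean and SD of CPE per species in base R with tapply(), and then shows what the Rainbow Trout mean would be if the zero-catch net were left out. The catch values are hypothetical, and the object names (cpe_summary, rbt_no_zero) are mine.

```r
# Hypothetical zero-filled catch data: 3 nets x 3 species.
catch2 <- data.frame(
  net     = rep(c(1, 2, 3), each = 3),
  eff     = rep(c(1, 2, 2), each = 3),
  species = rep(c("BKT", "LKT", "RBT"), times = 3),
  catch   = c(3, 2, 2,   1, 4, 0,   4, 0, 2)
)

# Catch-per-unit-effort for each net x species row.
catch2$cpe <- catch2$catch / catch2$eff

# Mean and SD of CPE by species, using all nets (zeroes included).
cpe_mean <- tapply(catch2$cpe, catch2$species, mean)
cpe_sd   <- tapply(catch2$cpe, catch2$species, sd)
cpe_summary <- data.frame(species = names(cpe_mean),
                          meanCPE = as.vector(cpe_mean),
                          sdCPE   = as.vector(cpe_sd))
cpe_summary

# For comparison: the RBT mean if the zero-catch net were ignored.
rbt_no_zero <- mean(catch2$cpe[catch2$species == "RBT" & catch2$catch > 0])
rbt_no_zero
```

With these made-up numbers, dropping the zero raises the Rainbow Trout mean CPE from 1 to 1.5.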

Obviously, this is a toy example, but it can be scaled up to larger projects.

