Modified lencat() — Increased Flexibility with dplyr
One of the first functions in the FSA package was lencat(), which has served me well over the years. However, I have long been bothered that it required a formula and data= to identify a single column to be “transformed,” and that it could not automatically determine startcat=. Additionally, lencat() did not work well with dplyr, a package I recently discovered (see my introduction). Thus, I have reworked lencat() in the latest version of FSA to address these issues while maintaining the original functionality.
The modified lencat() behaves slightly differently depending on how the user supplies the fish lengths. If the user provides a formula and data=, then lencat() returns a data.frame with the new variable appended, which is exactly the behavior of the original lencat(). However, if the user supplies a vector as the first argument, then lencat() now returns a single vector of the length-category values. In both uses, the user can leave startcat= blank and a reasonable starting value (i.e., a value at or just below the minimum observed length that “makes sense” given w=) will be used.
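To make the two calling styles concrete, here is a minimal base-R sketch of a function that returns a vector for vector input and an augmented data.frame for formula plus data= input. lencat_sketch() is a hypothetical name, not FSA's code, and its automatic start (the largest multiple of w at or below the minimum length) is my assumption about what “makes sense” means.

```r
# Hypothetical helper, NOT FSA's lencat(): sketches the two interfaces
lencat_sketch <- function(x, data = NULL, w = 1, startcat = NULL,
                          vname = "LCat") {
  if (inherits(x, "formula")) {
    # Formula interface: pull the single named column out of data,
    # categorize it, and append the result as a new column
    vals <- data[[all.vars(x)[1]]]
    data[[vname]] <- lencat_sketch(vals, w = w, startcat = startcat)
    return(data)
  }
  # Vector interface. Assumed automatic start: largest multiple of w
  # at or below the minimum observed length
  if (is.null(startcat)) startcat <- floor(min(x, na.rm = TRUE) / w) * w
  # Each length becomes the lower bound of its width-w category
  startcat + w * floor((x - startcat) / w)
}

len <- c(71, 64, 57, 68, 72, 80, 55, 75, 75, 71)
lencat_sketch(len, w = 10)                  # returns a vector
df <- data.frame(lencap = len)
res <- lencat_sketch(~lencap, data = df, w = 10)  # returns a data.frame
head(res)
```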
The new functionality of lencat() is demonstrated below. First, I loaded the FSA and dplyr packages.
library(FSA)
library(dplyr)
Smallmouth Bass length data from a lake in Minnesota are used. For simplicity, all variables related to measurements on the scales of the fish (i.e., all variables containing “anu” and “radcap”) and the species and lake variables (which were constant at “SMB” and “WB”) were removed.
data(SMBassWB)
# %>% is dplyr's pipe; early versions of dplyr used %.% for chaining
smb1 <- SMBassWB %>% select(-contains("anu"),-radcap,-species,-lake)
smb3 <- smb2 <- smb1  # copies for later use
str(smb1)
## 'data.frame': 445 obs. of 5 variables:
## $ gear : Factor w/ 2 levels "E","T": 1 1 1 1 1 1 1 1 1 1 ...
## $ yearcap: int 1988 1988 1988 1988 1988 1988 1989 1990 1990 1990 ...
## $ fish : int 5 3 2 4 6 7 50 482 768 428 ...
## $ agecap : int 1 1 1 1 1 1 1 1 1 1 ...
## $ lencap : int 71 64 57 68 72 80 55 75 75 71 ...
Note that the length measurements are in the lencap variable.
Introductory Example of New Functionality
As a foundational example, lencat() is used below to create a new vector of 10-mm length categories from the lengths. Only the first 12 length categories are shown (using head()) to save space.
tmp <- lencat(smb1$lencap,w=10)
head(tmp,n=12)
## [1] 70 60 50 60 70 80 50 70 70 70 100 50
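Each value above is the lower bound of the 10-mm interval that contains the length. Assuming left-closed intervals starting at 50 (so 57 falls in [50, 60)), the first ten values can be reproduced in base R with cut(); this illustrates the idea and is not how lencat() is implemented.

```r
len <- c(71, 64, 57, 68, 72, 80, 55, 75, 75, 71)  # first ten lencap values
w <- 10
startcat <- 50
# Breaks 50, 60, ..., extending past the largest length
brks <- seq(startcat, max(len) + w, by = w)
# right = FALSE builds [a, b) intervals; labels = FALSE returns the
# interval index, which is then mapped back to the interval's lower bound
idx <- cut(len, breaks = brks, right = FALSE, labels = FALSE)
lcat <- brks[idx]
lcat
```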
These length categories can be added to the data frame as follows.
smb1$LCat10 <- lencat(smb1$lencap,w=10)
head(smb1)
## gear yearcap fish agecap lencap LCat10
## 1 E 1988 5 1 71 70
## 2 E 1988 3 1 64 60
## 3 E 1988 2 1 57 50
## 4 E 1988 4 1 68 60
## 5 E 1988 6 1 72 70
## 6 E 1988 7 1 80 80
The same variable can be added using mutate() from dplyr as follows.
smb1 <- mutate(smb1,LCat10=lencat(lencap,w=10))
head(smb1)
## gear yearcap fish agecap lencap LCat10
## 1 E 1988 5 1 71 70
## 2 E 1988 3 1 64 60
## 3 E 1988 2 1 57 50
## 4 E 1988 4 1 68 60
## 5 E 1988 6 1 72 70
## 6 E 1988 7 1 80 80
The advantage of using dplyr in this way is that multiple data manipulations can be strung together. For example, one could create the variable as above and then sort the rows of the data.frame in ascending order of length category as follows.
smb1 <- smb1 %>%
  mutate(LCat10=lencat(lencap,w=10)) %>%
  arrange(LCat10)
head(smb1)
## gear yearcap fish agecap lencap LCat10
## 1 E 1988 2 1 57 50
## 2 E 1989 50 1 55 50
## 3 T 1988 2 1 57 50
## 4 E 1988 3 1 64 60
## 5 E 1988 4 1 68 60
## 6 T 1988 3 1 64 60
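For comparison, roughly the same create-then-sort step can be written in base R with transform() and order(). This is an alternative to the dplyr chain, not what this post uses; it works on a small data.frame built from the first seven rows shown earlier and assumes a floor-based 10-mm categorization starting at 50.

```r
# First seven rows of fish and lencap from the str() output above
smb <- data.frame(fish = c(5, 3, 2, 4, 6, 7, 50),
                  lencap = c(71, 64, 57, 68, 72, 80, 55))
# Base-R analogue of mutate(): assumed 10-mm categories starting at 50
smb <- transform(smb, LCat10 = 50 + 10 * floor((lencap - 50) / 10))
# Base-R analogue of arrange(): order() leaves ties in their original order
smb <- smb[order(smb$LCat10), ]
head(smb)
```

The dplyr version reads top to bottom as a pipeline, which is the main reason I prefer it once several steps are chained.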
Extended Example of New Functionality
In the examples above, the 10-mm length categories were created without using startcat=. The lencat() function found the first “even” 10-mm length category (50) below the minimum observed length (55) and created the length categories from that value. One can still set the starting category with startcat= as follows.
smb1 <- smb1 %>%
  mutate(LCat10=lencat(lencap,w=10,startcat=55)) %>%
  arrange(LCat10)
head(smb1)
## gear yearcap fish agecap lencap LCat10
## 1 E 1988 2 1 57 55
## 2 E 1989 50 1 55 55
## 3 T 1988 2 1 57 55
## 4 E 1988 3 1 64 55
## 5 T 1988 3 1 64 55
## 6 E 1988 4 1 68 65
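Assuming lencat() assigns each length to the lower bound of the width-w interval counted from startcat=, a one-line base-R function reproduces both the automatic start of 50 and the user-supplied start of 55 for the six lengths in the table above. cat_from() is a hypothetical helper for illustration only.

```r
# Hypothetical helper (assumed rule, not FSA's code): lower bound of the
# width-w interval containing x, counting up from startcat
cat_from <- function(x, w, startcat) startcat + w * floor((x - startcat) / w)

len <- c(57, 55, 57, 64, 64, 68)  # lencap values from the table above
start50 <- cat_from(len, w = 10, startcat = 50)  # automatic-style start
start55 <- cat_from(len, w = 10, startcat = 55)  # user-supplied start
start50
start55
```

Shifting the start by 5 mm moves every interval boundary, which is why 64 lands in the 55 category here but in the 60 category with the automatic start.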
However, the automatic startcat= appears to work well for a wide variety of w= values, as demonstrated below.
smb1 <- smb1 %>%
  mutate(LCat5=lencat(lencap,w=5)) %>%
  mutate(LCat10=lencat(lencap,w=10)) %>%
  mutate(LCat25=lencat(lencap,w=25)) %>%
  arrange(lencap)
head(smb1,n=10)
## gear yearcap fish agecap lencap LCat10 LCat5 LCat25
## 1 E 1989 50 1 55 50 55 50
## 2 E 1988 2 1 57 50 55 50
## 3 T 1988 2 1 57 50 55 50
## 4 E 1988 3 1 64 60 60 50
## 5 T 1988 3 1 64 60 60 50
## 6 E 1988 4 1 68 60 65 50
## 7 T 1988 4 1 68 60 65 50
## 8 E 1988 5 1 71 70 70 50
## 9 E 1990 428 1 71 70 70 50
## 10 T 1988 5 1 71 70 70 50
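The same three widths can be handled in base R by looping over w with lapply(). auto_cat() is hypothetical and again assumes the automatic start is the largest multiple of w at or below the minimum length; with the ten lengths in the table above it gives the same LCat5, LCat10, and LCat25 values.

```r
len <- c(55, 57, 57, 64, 64, 68, 68, 71, 71, 71)  # lencap from the table

# Hypothetical helper: assumed automatic start, then width-w categories
auto_cat <- function(x, w) {
  start <- floor(min(x) / w) * w
  start + w * floor((x - start) / w)
}

widths <- c(LCat5 = 5, LCat10 = 10, LCat25 = 25)
# One column per width, analogous to the chained mutate() calls above
cats <- as.data.frame(lapply(widths, function(w) auto_cat(len, w)))
out <- cbind(lencap = len, cats)
out
```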
The default type returned by lencat() is numeric. This can result in “missing categories” in length frequency distributions, because categories that contain no fish do not appear. For example, the length frequency distribution for 25-mm length categories shown below is missing the 375- and 400-mm categories.
xtabs(~LCat25,data=smb1)
## LCat25
## 50 75 100 125 150 175 200 225 250 275 300 325 350 425
## 12 14 52 58 60 37 51 45 48 50 9 6 2 1
The problem with missing length categories can be corrected by returning the values as a factor rather than a numeric vector. The return values are forced to be a factor by including as.fact=TRUE in the call to lencat(), as shown below.
smb1 <- smb1 %>% mutate(LCat25f=lencat(lencap,w=25,as.fact=TRUE))
xtabs(~LCat25f,data=smb1)
## LCat25f
## 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425
## 12 14 52 58 60 37 51 45 48 50 9 6 2 0 0 1
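The factor fixes the table because table() (and xtabs()) report every factor level, including unused ones. A minimal base-R illustration, assuming the levels run from the first to the last category in steps of w (I have not shown here how lencat() builds its levels):

```r
lcat <- c(50, 75, 75, 100, 425)  # toy 25-mm categories with gaps
# Numeric input: table() lists only the values that occur
tab_num <- table(lcat)
# Factor with every level from 50 to 425 by 25, so empty categories
# are reported with a count of zero
lcat_f <- factor(lcat, levels = seq(min(lcat), max(lcat), by = 25))
tab_fac <- table(lcat_f)
tab_num
tab_fac
```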
Finally, one can still use breaks= to set specific, and potentially unequally spaced, values for the length categories. The example below finds the Gabelhouse five-cell length categories for Smallmouth Bass and then creates two new variables from these values: one that shows the length values and one that shows the category names. To further demonstrate dplyr, I also used filter() to remove all fish that were less than “stock” size (i.e., those in the zero category).
( brks <- psdVal("Smallmouth Bass",units="mm") )
## zero stock quality preferred memorable trophy
## 0 180 280 350 430 510
smb2 <- smb2 %>%
  mutate(LCatPSD1=lencat(lencap,breaks=brks)) %>%
  mutate(LCatPSD2=lencat(lencap,breaks=brks,use.names=TRUE)) %>%
  arrange(lencap) %>%
  filter(LCatPSD2 != "zero")
head(smb2,n=10)
## gear yearcap fish agecap lencap LCatPSD1 LCatPSD2
## 1 E 1990 415 3 180 180 stock
## 2 E 1988 28 5 180 180 stock
## 3 E 1988 29 5 180 180 stock
## 4 T 1988 29 5 180 180 stock
## 5 T 1988 28 5 180 180 stock
## 6 E 1990 700 3 182 180 stock
## 7 T 1989 40 3 183 180 stock
## 8 T 1989 98 2 187 180 stock
## 9 E 1990 760 3 187 180 stock
## 10 E 1990 399 4 187 180 stock
xtabs(~LCatPSD1,data=smb2)
## LCatPSD1
## 180 280 350 430
## 188 54 2 1
xtabs(~LCatPSD2,data=smb2)
## LCatPSD2
## zero stock quality preferred memorable trophy
## 0 188 54 2 1 0
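With unequally spaced breaks, each length maps to the largest break at or below it. A base-R stand-in using findInterval() and the breaks printed above is sketched here; it assumes left-closed intervals and is not FSA's implementation.

```r
# Gabelhouse breaks for Smallmouth Bass (mm), as printed by psdVal() above
brks <- c(zero = 0, stock = 180, quality = 280, preferred = 350,
          memorable = 430, trophy = 510)
len <- c(150, 180, 187, 279, 280, 351, 430)
# findInterval() returns i such that brks[i] <= len < brks[i + 1]
idx <- findInterval(len, brks)
lcat_num <- unname(brks[idx])  # lower bound in mm, like LCatPSD1
# Category names as a factor with all levels, like LCatPSD2
lcat_name <- factor(names(brks)[idx], levels = names(brks))
data.frame(len, lcat_num, lcat_name)
```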
Note that the categories without any fish are still shown in the last table. These empty categories can be removed with droplevels() as follows.
smb2 <- droplevels(smb2)
xtabs(~LCatPSD2,data=smb2)
## LCatPSD2
## stock quality preferred memorable
## 188 54 2 1
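droplevels() works on a whole data.frame, as above, or on a single factor. Because it removes exactly the empty levels that a factor keeps, it is best applied only when empty categories should be hidden. A minimal illustration with a toy factor:

```r
f <- factor(c("stock", "stock", "quality"),
            levels = c("zero", "stock", "quality", "preferred"))
tab_all <- table(f)               # unused levels appear with zero counts
tab_used <- table(droplevels(f))  # only levels that occur remain
tab_all
tab_used
```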
The Old Functionality Is Still There
The “old” functionality of lencat() still exists so that your old code with lencat() is not broken (with the minor exception that use.catnames= is now use.names=).
smb3 <- lencat(~lencap,data=smb3,w=10)
smb3 <- lencat(~lencap,data=smb3,w=25,vname="LenCat25")
smb3 <- lencat(~lencap,data=smb3,breaks=psdVal("Smallmouth Bass"),
               vname="LenPsd")
smb3 <- lencat(~lencap,data=smb3,breaks=psdVal("Smallmouth Bass"),
               vname="LenPsd2",use.names=TRUE,drop.levels=TRUE)
head(smb3,n=10)
## gear yearcap fish agecap lencap LCat LenCat25 LenPsd LenPsd2
## 1 E 1988 5 1 71 70 50 0 zero
## 2 E 1988 3 1 64 60 50 0 zero
## 3 E 1988 2 1 57 50 50 0 zero
## 4 E 1988 4 1 68 60 50 0 zero
## 5 E 1988 6 1 72 70 50 0 zero
## 6 E 1988 7 1 80 80 75 0 zero
## 7 E 1989 50 1 55 50 50 0 zero
## 8 E 1990 482 1 75 70 75 0 zero
## 9 E 1990 768 1 75 70 75 0 zero
## 10 E 1990 428 1 71 70 50 0 zero
This form is particularly useful when you want a new data.frame that contains the original data with the length-category variable appended.