Tips & Tricks 1: Averaging Data By Group
[This article was first published on geomorph, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I thought it would be helpful to new users of Geomorph and R to discuss some tips and tricks for manipulating morphometric datasets in R. So I am starting a series called “Tips & Tricks”. These will be published as I think of them, and assume you have already inputted your data into R using geomorph functions, such as readland.nts, readmulti.nts, readland.tps, read.morphologika, or simple read.csv. Remember that the read… functions in Geomorph turn the morphometric data into a 3D array (like a stack of filing cards; each card representing an individual and having on it a p landmarks by k dimensions matrix of coordinates). Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exercise 1 – How to average a dataset of morphometric data (shape data) by group.
Lets start with the input objects. We have a 3D array of 30 specimens shape data (p by k coordinate data) called data, and a grouping variable group, which is of class factor, with m levels. We want to average the morphometric data by these m levels.
[Here you should familiarise yourself with the factor functions, such as as.factor() and levels(), see here for more details].
The first thing to do is to change the 3D array data into a 2D array (n by p x k) with geomorph function two.d.array()
x <- two.d.array(data)
I will explain a long-winded way of doing things, which is intuitive but cumbersome. Then I will show a simple one line version that utilises two useful base functions in R, which make life much simpler!
1) Set up the output matrix, which will be filled by a for loop:
p <- dim(data)[1] # the number of landmarks
k <- dim(data)[2] # the dimensions of the coordinates
Y <- array(NA, dim=c(p,k,length(levels(group))) #new empty array to fill
dimnames(Y)[[3]] <- levels(group)# set group levels as new names
Now write a for loop that subsets the 2D array by group and calculates the mean shape using geomorph function mshape():
for (i in 1: length(levels(group))){
grp <- x[which(group==levels(group)[i]),]# makes a subset 2D array with only specimens of the ith group by using which().
foo <- arrayspecs(grp ,p,k, byLand=F) # turn back into an 3D array
Y[,,i] <- mshape(foo) # place into the new 3D array
}
Since mshape() requires a 3D array, the switching between 2D and #D is necessary, but cumbersome. While this procedure is very long-winded, it does allow you to get used to manipulating arrays, and learn how to do procedures by levels of a grouping factor.
But if you look under the hood, you see that mshape is simply taking the means of the rows and columns of the matrix. This leads us to:
2) This for loop can in fact be replaced with one line:
means <- rowsum(x, group)/as.vector(table(group))# rowsum() is a simple base function to get the sum of the rows, while table() is a great function for getting group sizes (group n).
Y <- arrayspecs(means ,dim(data)[1],dim(data)[2], byLand=F)# then all you have to do is put the averaged data back into an 3D array.
Both ways produce the same result. As is often the case with R, ‘there are many ways to skin a cat’! The first example works with functions you likely already know, and allows you to become more proficient with writing your own for loops and searching or subsetting with which(). The second example shows you that looking deeper into R base functions will uncover little gem functions, which make our life much easier!
Enjoy averaging by group!
Emma
To leave a comment for the author, please follow the link and comment on their blog: geomorph.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.