Tips & Tricks 2: Deleting Specimens and Landmarks

January 17, 2014

(This article was first published on geomorph, and kindly contributed to R-bloggers)

Today’s exercise is nice and simple, and allows you to get used to manipulating datasets in R.

Exercise 2 – How to delete specimens or landmarks from a dataset of morphometric data (shape data).

When you load your shape data into R with geomorph (using the read… functions), it will typically be a 3D array (class array). Recall that a 3D array is akin to having a matrix of p landmarks by k dimensions for each specimen written on its own filing card, with the n cards stacked up along the third dimension of the array.
For example:
class(mydata)
[1] "array"
dim(mydata)
[1] 12 2 40 # This means there are 12 landmarks, in 2 dimensions, and 40 specimens.
Imagine 40 filing cards with a 12 by 2 matrix on them. You get the idea.
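
If you do not have a dataset to hand, a stand-in array with the same shape can be built in base R. This is only a sketch: the values are random numbers rather than real landmark coordinates, and the name card1 is made up for illustration.

```r
# Toy stand-in for landmark data: 12 landmarks, 2 dimensions, 40 specimens.
# The values are random, not real coordinates.
set.seed(1)
mydata <- array(rnorm(12 * 2 * 40), dim = c(12, 2, 40))

dim(mydata)              # 12 2 40

# Pull out the first "filing card": a 12 by 2 matrix for specimen 1
card1 <- mydata[, , 1]
dim(card1)               # 12 2
```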

Some analyses in geomorph require the dataset to be a 2D array, which is n rows and p*k columns.
mydata.2D <- two.d.array(mydata)
class(mydata.2D)
[1] "matrix" # R 4.0.0 and later print "matrix" "array"
dim(mydata.2D)
[1] 40 24 # This means there are 40 specimens and 24 variables (12 * 2).
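
To see what this reshaping does to the data, here is a rough base R equivalent. It is not geomorph's own code: flatten is a hypothetical helper, and it assumes the x1 y1 x2 y2 … column order used later in this post.

```r
# Toy array: 12 landmarks, 2 dimensions, 40 specimens (random values)
set.seed(1)
mydata <- array(rnorm(12 * 2 * 40), dim = c(12, 2, 40))

# Hypothetical helper: read each card row by row (x1, y1, x2, y2, ...)
# and put one specimen on each row of the result.
flatten <- function(A) t(apply(A, 3, function(card) as.vector(t(card))))

mydata.2D <- flatten(mydata)
dim(mydata.2D)           # 40 24

# Columns 1 and 2 of row 1 are the x and y of landmark 1 in specimen 1
stopifnot(isTRUE(all.equal(mydata.2D[1, 1:2], mydata[1, , 1])))
```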

R has wonderful built-in functionality for accessing parts of these arrays and matrices. This is why we don’t write functions in geomorph to add or delete specimens or landmarks: base R already does it for us!
I’ll give you a quick overview of indexing as it applies to deleting specimens. (You can read more about indexing here (section 3.4).)

Take the 2D array, mydata.2D. Say specimen 7 is an outlier, too juvenile perhaps, or damaged.
mydata.2D.new <- mydata.2D[-7,] # here we remove row number 7. (R is case sensitive, so the name must match mydata.2D exactly.)

Or if you know specimens 7 through to 10 are no good to use:
mydata.2D.new <- mydata.2D[-(7:10),]

For several specimens, or those out of sequence:
omit <- c(5, 7, 11, 30) # make a vector of the specimen numbers to delete
mydata.2D.new <- mydata.2D[-omit,]
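
It is worth checking that the right rows went. The sketch below uses a toy matrix with made-up row names (spec1 to spec40, an assumption for illustration) so that the surviving specimens can be read off directly. It also shows deletion by name, which keeps working if the row order changes.

```r
# Toy 2D dataset: 40 specimens by 24 variables, with hypothetical row names
set.seed(1)
mydata.2D <- matrix(rnorm(40 * 24), nrow = 40,
                    dimnames = list(paste0("spec", 1:40), NULL))

# Delete by position with a negative index
omit <- c(5, 7, 11, 30)
mydata.2D.new <- mydata.2D[-omit, ]
dim(mydata.2D.new)               # 36 24
rownames(mydata.2D.new)[5:7]     # "spec6" "spec8" "spec9"

# Delete by name with a logical index
bad <- c("spec5", "spec7", "spec11", "spec30")
mydata.2D.byname <- mydata.2D[!rownames(mydata.2D) %in% bad, ]
stopifnot(identical(mydata.2D.new, mydata.2D.byname))
```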

For the 3D array, it is equally simple. Using the same specimen examples as above:
mydata.new <- mydata[,,-7]
mydata.new <- mydata[,,-(7:10)]
mydata.new <- mydata[,,-omit] # note this time, we are accessing the third position in the array, the “card”, whereas above we were accessing the row of the matrix.
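
The same check works for the 3D array. Again this is a sketch on toy data, with hypothetical specimen names stored in the third dimnames slot. The last lines show one thing to watch for: if only a single specimen is kept, R drops the third dimension unless you ask it not to with drop = FALSE.

```r
# Toy 3D dataset: 12 landmarks, 2 dimensions, 40 named specimens
set.seed(1)
mydata <- array(rnorm(12 * 2 * 40), dim = c(12, 2, 40),
                dimnames = list(NULL, c("x", "y"), paste0("spec", 1:40)))

omit <- c(5, 7, 11, 30)
mydata.new <- mydata[, , -omit]
dim(mydata.new)                     # 12 2 36
dimnames(mydata.new)[[3]][5:7]      # "spec6" "spec8" "spec9"

# Keeping one specimen: a plain matrix by default, a 3D array with drop = FALSE
dim(mydata[, , 1])                  # 12 2
dim(mydata[, , 1, drop = FALSE])    # 12 2 1
```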

This same technique can be applied to landmarks.

In the 2D array: Landmark 5 is missing from some specimens. So let’s delete it.
Since the landmark data are arranged x1 y1 x2 y2 … the landmark we want to delete is in:
delete <- c((5*2-1), (5*2)) # positions 9 and 10, i.e. x5 and y5
mydata.2D.new <- mydata.2D[,-delete]
dim(mydata.2D.new)
[1] 40 22

Note that this will change the numbering of the remaining landmarks: the old landmarks 6 to 12 become landmarks 5 to 11.
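
With the x1 y1 x2 y2 … layout, landmark i in k dimensions sits in columns (i-1)*k+1 to i*k, so landmark 5 in 2D is columns 9 and 10. A small helper saves doing that arithmetic by hand. The name landmark.cols is made up for this sketch and is not a geomorph function.

```r
# Hypothetical helper: columns of the 2D array that hold the given landmarks,
# assuming the layout x1 y1 x2 y2 ... (or x1 y1 z1 x2 ... when k = 3)
landmark.cols <- function(lm, k = 2) {
  as.vector(sapply(lm, function(i) (i - 1) * k + seq_len(k)))
}

landmark.cols(5)           # 9 10
landmark.cols(c(5, 7))     # 9 10 13 14
landmark.cols(5, k = 3)    # 13 14 15

# Use it on a toy 40 by 24 matrix (random values)
set.seed(1)
mydata.2D <- matrix(rnorm(40 * 24), nrow = 40)
mydata.2D.new <- mydata.2D[, -landmark.cols(5)]
dim(mydata.2D.new)         # 40 22
```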

In a 3D array, this is simpler:
mydata.new <- mydata[-5,,] # this removes landmark 5 from each “card”
dim(mydata.new)
[1] 11  2 40
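
As a sanity check, deleting landmark 5 from the 3D array and deleting its two columns from the 2D array should leave the same numbers. The sketch below confirms this on toy data. It uses a hypothetical base R helper, flatten, in place of two.d.array and assumes the x1 y1 x2 y2 … column order.

```r
# Toy array: 12 landmarks, 2 dimensions, 40 specimens (random values)
set.seed(1)
mydata <- array(rnorm(12 * 2 * 40), dim = c(12, 2, 40))

# Hypothetical stand-in for the 3D to 2D conversion (one specimen per row)
flatten <- function(A) t(apply(A, 3, function(card) as.vector(t(card))))

# Route 1: drop landmark 5 from every card, then flatten
route.3D <- flatten(mydata[-5, , ])

# Route 2: flatten first, then drop columns 9 and 10 (x5 and y5)
route.2D <- flatten(mydata)[, -c(9, 10)]

dim(route.3D)              # 40 22
stopifnot(isTRUE(all.equal(route.3D, route.2D)))
```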

Getting adept with indexing is a valuable skill with R. Have fun deleting!

Emma
