Tips & Tricks 5: Extracting Classifiers Using Substring

[This article was first published on geomorph, and kindly contributed to R-bloggers.]

Today’s exercise is another easy one, and is inspired by a question from Ariel Marcy of the University of Queensland.

Exercise 5 – How to extract classifiers from names of specimens.

Well-organised morphometricians will have a consistent naming system for their specimens, such that the species and specimen ID are included in the image name, perhaps along with other information such as locality, sex, side (for asymmetry studies), replicate, etc. (This is in fact recommended for new users of morphometrics; see this blog post for more details on starting up your morphometrics study.) Beware, though, that the maximum length of a file path on Windows machines is 260 characters by default.

E.g.,

A_species_12345_F_Australia.jpg
Here, the name of the specimen is the binomial species name (genus and species), followed by its ID, the sex of the specimen and then the locality.

Here is a simple workflow for converting this information into a table so that these classifiers can be used in geomorph functions.


This assumes you have a 3D array of the coordinate data, mydata, as read in using readland.tps(), readland.nts() or any other reasonable way.

categories <- strsplit(dimnames(mydata)[[3]], "_")  # separates the specimen names by underscore
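
To see what this step produces without loading any landmark data, here is a minimal sketch using a made-up array. The array toy and its two specimen names are invented for illustration; only dimnames() and strsplit() are doing the work.

```r
# Hypothetical stand-in for mydata: 4 landmarks, 2 dimensions, 2 specimens
toy <- array(0, dim = c(4, 2, 2),
             dimnames = list(NULL, NULL,
                             c("A_species_12345_F_Australia",
                               "A_species_67890_M_Fiji")))

# Split each specimen name at the underscores; the result is a list
# holding one character vector per specimen
toy_categories <- strsplit(dimnames(toy)[[3]], "_")

toy_categories[[1]]      # "A" "species" "12345" "F" "Australia"
lengths(toy_categories)  # 5 5
```

Each specimen yields five pieces here, which is why the matrix built in the next step has five columns.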

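strsplit() is a very useful base function! If your names are separated by a space or a period instead, just replace the second argument with " " or ".". Note that strsplit() treats this argument as a regular expression, in which a period matches any character, so for a period also add fixed = TRUE (or escape it as "\\."). However, it’s very good practice to use underscores, since spaces can cause REALLY irritating issues (particularly with read.nexus() and other phylogeny-reading functions).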
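
A period needs a little care as a separator, because strsplit() interprets its second argument as a regular expression unless told otherwise. The short sketch below, using a hypothetical period-delimited name, shows two ways that work and what happens with a bare ".".

```r
# Hypothetical specimen name that uses periods instead of underscores
dotted_name <- "A.species.12345.F.Australia"

# Treat the period as a literal character
by_fixed <- strsplit(dotted_name, ".", fixed = TRUE)[[1]]

# Equivalent: escape the period inside the regular expression
by_escape <- strsplit(dotted_name, "\\.")[[1]]

# Unescaped, "." matches every character, so only empty strings are left
by_regex <- strsplit(dotted_name, ".")[[1]]

by_fixed                       # "A" "species" "12345" "F" "Australia"
identical(by_fixed, by_escape) # TRUE
all(by_regex == "")            # TRUE
```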

classifiers <- matrix(unlist(categories), ncol = 5, byrow = TRUE)  # converts the list into a matrix, one row per specimen
classifiers <- cbind(dimnames(mydata)[[3]], classifiers)  # adds the full specimen name as the first column of the table
colnames(classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")  # renames the column headings

classifiers <- as.data.frame(classifiers)  # converts to a data frame so columns can be indexed using $
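
The matrix step relies on every specimen name having exactly five underscore-separated fields; a name with a missing or extra field shifts every value that comes after it. As a hedged sketch of one way to guard against that, the code below checks the field counts first and then builds the table from the well-formed names only. The specimen names and the object names (specimen_names, parts, classifiers_ok) are made up for illustration.

```r
# Hypothetical specimen names; the third is missing its locality
specimen_names <- c("A_species_12345_F_Australia",
                    "A_species_12346_M_Australia",
                    "A_species_12347_F")

parts <- strsplit(specimen_names, "_")

# Flag any name that does not split into exactly five fields
n_fields <- lengths(parts)
bad_names <- specimen_names[n_fields != 5]
bad_names  # "A_species_12347_F"

# Build the table from the well-formed names only, one row per specimen
good <- n_fields == 5
classifiers_ok <- as.data.frame(do.call(rbind, parts[good]))
colnames(classifiers_ok) <- c("Genus", "Species", "ID", "Sex", "Locality")
classifiers_ok <- cbind(fileID = specimen_names[good], classifiers_ok)

# Columns can now be indexed with $ and converted to factors for grouping
classifiers_ok$Sex <- factor(classifiers_ok$Sex)
table(classifiers_ok$Sex)
```

Converting a column to a factor with factor() is optional, but it makes the grouping levels explicit before the column is passed on to an analysis.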

Simple! 

Emma
