Tips & Tricks 5: Extracting Classifiers Using Substring

November 20, 2014

(This article was first published on geomorph, and kindly contributed to R-bloggers)

Today’s exercise is another easy one, and is inspired by a question from Ariel Marcy of the University of Queensland.

Exercise 5 – How to extract classifiers from names of specimens.

Well-organised morphometricians will have a consistent naming system for their specimens, such that the species and ID are included in the image name, perhaps along with other information such as locality, sex, side (for asymmetry studies), replicate, etc. (This is in fact recommended for new users of morphometrics; see this blog post for more details on starting up your morphometrics study.) Beware, though, that the maximum length of path names for files on Windows machines is 260 characters by default.

E.g.,

A_species_12345_F_Australia.jpg
Here, the name of the specimen is the binomial species name, followed by its ID, the sex of the specimen and then the locality.

Here is a simple workflow for converting this information into a table so that these classifiers can be used in geomorph functions.

This assumes you have a 3D array of the coordinate data, mydata, as read in using readland.tps(), readland.nts() or any other reasonable way.

categories <- strsplit(dimnames(mydata)[[3]], "_")  # separates the specimen names by underscore

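strsplit() is a very useful base function! If you are using a space or a period as the separator, just replace the second argument with " " or ".". Note that strsplit() treats the separator as a regular expression, so a period needs fixed=TRUE (or "\\.") to be matched literally. However, it’s very good practice to use underscores, since spaces can cause REALLY irritating issues (particularly with read.nexus() and other phylogeny-reading functions).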
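As a quick illustration of the period case, here is a minimal sketch using two made-up specimen names (period_names is hypothetical, not part of the original workflow). It shows why the period needs to be matched literally: as a regular expression, "." matches any character.

```r
# Hypothetical specimen names that use a period as the separator
period_names <- c("A.species.12345.F.Australia", "B.other.67890.M.Fiji")

# fixed = TRUE treats "." as a literal character rather than a regex wildcard
parts <- strsplit(period_names, ".", fixed = TRUE)
parts[[1]]

# Without fixed = TRUE, "." matches every character, so only empty strings remain
bad <- strsplit(period_names[1], ".")
bad[[1]]
```

Escaping the period as "\\." should give the same result as fixed = TRUE.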

classifiers <- matrix(unlist(categories), ncol=5, byrow=TRUE)  # reads list into matrix, one row per specimen
classifiers <- cbind(dimnames(mydata)[[3]], classifiers)  # adds the specimen ID to the first column of the table
colnames(classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")  # rename the column headings

classifiers <- as.data.frame(classifiers)  # converts to data frame so can index using $
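To try the whole workflow without any landmark files, here is a sketch on a toy array. The specimen names and random coordinates are invented for illustration; in practice mydata would come from your own data. Two details are my additions rather than part of the workflow above: a check that every name splits into five fields, and stringsAsFactors = TRUE, which is assumed here because recent versions of R no longer convert strings to factors by default.

```r
# Toy 3D array: 4 landmarks x 2 dimensions x 3 specimens (random values, illustration only)
set.seed(1)
specimen_names <- c("A_species_12345_F_Australia",
                    "A_species_12346_M_Australia",
                    "B_other_20001_F_Fiji")
mydata <- array(rnorm(4 * 2 * 3), dim = c(4, 2, 3),
                dimnames = list(NULL, NULL, specimen_names))

categories <- strsplit(dimnames(mydata)[[3]], "_")

# Every name should split into the same number of fields; otherwise
# matrix(unlist(...)) would silently misalign the columns
stopifnot(all(lengths(categories) == 5))

classifiers <- matrix(unlist(categories), ncol = 5, byrow = TRUE)
classifiers <- cbind(dimnames(mydata)[[3]], classifiers)
colnames(classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")
classifiers <- as.data.frame(classifiers, stringsAsFactors = TRUE)

# Index a classifier with $ and tabulate it
table(classifiers$Sex)

# Use a classifier to subset the array, e.g. keep only the female specimens
females <- mydata[, , classifiers$Sex == "F", drop = FALSE]
dim(females)
```

If any name has a missing or extra field, the stopifnot() line halts with an error, which is easier to debug than a table whose columns have quietly shifted.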

Simple!

Emma
