Tips & Tricks 5: Extracting Classifiers Using Substring

November 20, 2014

(This article was first published on geomorph, and kindly contributed to R-bloggers)

Today’s exercise is another easy one, inspired by a question from Ariel Marcy of the University of Queensland.

Exercise 5 – How to extract classifiers from names of specimens.

Well-organised morphometricians use a consistent naming system for their specimens, so that the species and specimen ID are included in the image name, perhaps along with other information such as locality, sex, side (for asymmetry studies) or replicate. This is in fact recommended for new users of morphometrics; see this blog post for more details on starting up your morphometrics study. Beware, though, that the maximum length of a file path on Windows machines is 260 characters.
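As a rough sketch of how the path-length caveat could be checked before digitising, the snippet below flags any path longer than the 260-character limit. The paths are invented for illustration; in practice they might come from list.files() with full.names = TRUE.

```r
# Hypothetical image paths, made up for this example
image_paths <- c(
  "C:/morpho/images/Mus_musculus_001_F_SiteA.jpg",
  paste0("C:/morpho/images/", strrep("x", 300), ".jpg")
)

max_path <- 260  # the Windows path-length limit

# nchar() counts the characters in each full path
too_long <- nchar(image_paths) > max_path

# Number of paths that exceed the limit
sum(too_long)
```

Only the second, deliberately overlong path is flagged here.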


Here, the name of the specimen is the binomial species name, followed by its ID, the sex of the specimen and then the locality, with the fields separated by underscores.

Here is a simple workflow for converting this information into a table so that these classifiers can be used in geomorph functions.

This assumes you have a 3D array of the coordinate data, mydata, read in using readland.tps(), readland.nts() or any other reasonable way.

categories <- strsplit(dimnames(mydata)[[3]], "_")  # splits each specimen name at the underscores
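To see what this step produces, here is a small sketch on a toy array. The specimen names and the array toy_data are invented for illustration and simply follow a Genus_species_ID_Sex_Locality pattern.

```r
# Hypothetical specimen names: Genus_species_ID_Sex_Locality
specimen_names <- c("Mus_musculus_001_F_SiteA", "Mus_musculus_002_M_SiteB")

# Toy landmark array: 4 landmarks x 2 dimensions x 2 specimens
toy_data <- array(0, dim = c(4, 2, 2),
                  dimnames = list(NULL, NULL, specimen_names))

# strsplit() returns a list with one character vector per specimen
toy_categories <- strsplit(dimnames(toy_data)[[3]], "_")

toy_categories[[1]]      # "Mus" "musculus" "001" "F" "SiteA"
lengths(toy_categories)  # number of fields per specimen: 5 5
```

Each list element holds the fields of one specimen name, in the order they appear in the name.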

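strsplit() is a very useful base function! If your names use a space or a period instead, just replace the second argument with " " or ".". Note that strsplit() treats this argument as a regular expression, in which a period matches any character, so add fixed = TRUE when splitting on a period. It is very good practice to use underscores, however, since spaces can cause REALLY irritating issues (particularly with phylogeny-reading functions).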
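One thing to watch when changing the separator: strsplit() interprets its second argument as a regular expression by default, and a period matches any character. A minimal sketch, using an invented period-separated name:

```r
dotted_name <- "Mus.musculus.001.F.SiteA"  # hypothetical period-separated name

# "." as a regular expression matches every character,
# so every piece that comes back is an empty string
regex_fields <- strsplit(dotted_name, ".")[[1]]

# fixed = TRUE treats the separator as literal text
dot_fields <- strsplit(dotted_name, ".", fixed = TRUE)[[1]]
dot_fields  # "Mus" "musculus" "001" "F" "SiteA"
```

Escaping the period as "\\." should give the same result as fixed = TRUE.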

classifiers <- matrix(unlist(categories), ncol = 5, byrow = TRUE)  # reads the list into a matrix, one row per specimen
classifiers <- cbind(dimnames(mydata)[[3]], classifiers)  # adds the full specimen name as the first column of the table
colnames(classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")  # renames the column headings

classifiers <- as.data.frame(classifiers)  # converts to a data frame so columns can be indexed using $
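Putting the steps together, the following is a self-contained sketch on invented data. The specimen names, toy_data and toy_classifiers are hypothetical, and the field-count check and stringsAsFactors = TRUE are additions to the workflow above, not part of it.

```r
# Hypothetical specimen names: Genus_species_ID_Sex_Locality
specimen_names <- c("Mus_musculus_001_F_SiteA",
                    "Mus_musculus_002_M_SiteB",
                    "Rattus_rattus_003_F_SiteA")

# Toy landmark array: 4 landmarks x 2 dimensions x 3 specimens
toy_data <- array(0, dim = c(4, 2, 3),
                  dimnames = list(NULL, NULL, specimen_names))

fields <- strsplit(dimnames(toy_data)[[3]], "_")

# Stop early if any name does not have exactly five fields;
# otherwise values would end up misaligned across columns
stopifnot(all(lengths(fields) == 5))

toy_classifiers <- as.data.frame(
  cbind(dimnames(toy_data)[[3]],
        matrix(unlist(fields), ncol = 5, byrow = TRUE)),
  stringsAsFactors = TRUE  # store each classifier as a factor
)
colnames(toy_classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")

toy_classifiers$Sex  # a factor with levels F and M
table(toy_classifiers$Species, toy_classifiers$Locality)
```

With the classifiers stored as factors, a column such as toy_classifiers$Sex can be passed wherever a grouping variable is expected.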


