Taxonomy with R: Exploring the taxize Package

December 16, 2012

[This article was first published on theBioBucket*, and kindly contributed to R-bloggers.]

First off, I'd really like to give a shout-out to the brave people who have created and maintain this great package: the fame is yours!

While exploring the package's capabilities, I ran into some issues with the ITIS server, and with large datasets things did not work out well for me.
I then switched to the NCBI API and found the results much better (much faster and, at first glance, with higher coverage).
At the time of writing there is no taxize function that pulls a specific taxonomic rank out of a classification returned by NCBI, so I plugged together a little wrapper:

# load taxize once, up front (rather than inside the function)
library(taxize)

# some species data:
spec <- data.frame(Species = c("Bryum schleicheri", "Bryum capillare", "Bryum argentum",
                               "Escherichia coli", "Glis glis"),
                   stringsAsFactors = FALSE)
spl <- strsplit(spec$Species, " ")
spec$Genus <- vapply(spl, "[[", character(1), 1)  # first word of each name = genus

# for pulling taxonomic details we'd best submit higher-rank taxa,
# in this case genera. Then we submit the genus Bryum only once and
# save some computation time (which matters when you deal
# with large datasets)

gen_uniq <- unique(spec$Genus)

# function for pulling one classification detail ("phylum" by default)
get_sys_level <- function(x, rank = "phylum") {
  z <- tryCatch({
    a <- classification(get_uid(x))
    y <- data.frame(a[[1]])                    # if there are multiple results, take the first
    as.character(y[which(y[, 2] == rank), 1])  # assumes column 1 = name, column 2 = rank
  }, error = function(e) NA)                   # on lookup or any other error put NA
  if (length(z) == 0) z <- NA                  # the rank is not covered: return NA
  data.frame(Taxon = x, Syslevel = z[1], stringsAsFactors = FALSE)
}

# call the function for each genus and rbind the returned data frames
result <- do.call(rbind, lapply(gen_uniq, get_sys_level))
print(result)
# (results from the original December 2012 run)
#         Taxon       Syslevel
# 1       Bryum   Streptophyta
# 2 Escherichia Proteobacteria
# 3        Glis       Chordata

# now merge back to the original data frame
spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
print(spec_new)
#         Genus           Species       Syslevel
# 1       Bryum Bryum schleicheri   Streptophyta
# 2       Bryum   Bryum capillare   Streptophyta
# 3       Bryum    Bryum argentum   Streptophyta
# 4 Escherichia  Escherichia coli Proteobacteria
# 5        Glis         Glis glis       Chordata
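If you want to check the rank lookup and merge logic without hitting the NCBI servers, here is a minimal offline sketch. It assumes each classification is a data frame with "name" and "rank" columns (my assumption about what recent taxize versions return, and consistent with the column positions used in the wrapper). mock_classification and extract_rank() are hypothetical stand-ins made up for illustration, not part of taxize, and Escherichia is deliberately left out of the mock to show the NA path.

```r
# Offline sketch: hypothetical mock of classification() output, one data frame
# per genus with "name" and "rank" columns (truncated lineages, for illustration)
mock_classification <- list(
  Bryum = data.frame(name = c("Eukaryota", "Streptophyta", "Bryum"),
                     rank = c("superkingdom", "phylum", "genus"),
                     stringsAsFactors = FALSE),
  Glis  = data.frame(name = c("Eukaryota", "Chordata", "Glis"),
                     rank = c("superkingdom", "phylum", "genus"),
                     stringsAsFactors = FALSE)
)
# "Escherichia" is left out on purpose, so its lookup returns NULL

# Hypothetical helper: name at the requested rank, or NA if it is missing
extract_rank <- function(tbl, rank = "phylum") {
  if (!is.data.frame(tbl) || !all(c("name", "rank") %in% names(tbl))) {
    return(NA_character_)
  }
  hit <- tbl$name[which(tbl$rank == rank)]
  if (length(hit) == 0) NA_character_ else as.character(hit[1])
}

spec <- data.frame(Species = c("Bryum schleicheri", "Bryum capillare",
                               "Bryum argentum", "Escherichia coli", "Glis glis"),
                   stringsAsFactors = FALSE)
spec$Genus <- sub(" .*$", "", spec$Species)  # keep text before the first space
gen_uniq <- unique(spec$Genus)               # look up each genus only once

result <- data.frame(
  Taxon = gen_uniq,
  Syslevel = vapply(gen_uniq,
                    function(g) extract_rank(mock_classification[[g]]),
                    character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
spec_new <- merge(spec, result, by.x = "Genus", by.y = "Taxon")
print(spec_new)
```

To go online, you would swap mock_classification[[g]] for classification(get_uid(g))[[1]], as in the wrapper above. Since extract_rank() returns NA for anything that is not a data frame with those columns, it should also cover the case where no UID is found. Network errors would still need a tryCatch around the call.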
