Getting taxonomic names downstream
[This article was first published on Recology - R, and kindly contributed to R-bloggers].
It can be a pain in the ass to get taxonomic names. For example, I sometimes need to get all the Class names for a set of species. This is a relatively easy problem using the ITIS API (example below).
The much harder problem is getting all the taxonomic names downstream. ITIS doesn't provide an API method for this. Well, it does (getHierarchyDownFromTSN), but that method only returns direct children (e.g., the genera within a tribe, but not all the species within each genus).
So in the taxize package, we wrote a function called downstream, which allows you to get taxonomic names to any downstream point, e.g.:
- get all Classes within Animalia,
- get all Species within a Family
- etc.
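To make the idea concrete, here is a minimal sketch of how such a walk can work: ask for direct children, keep the ones at the target rank, and recurse into the rest. This is not the taxize implementation. It runs on a small hand-typed table instead of the ITIS API, and the names toy, children_of, and downstream_sketch are hypothetical.

```r
# Toy stand-in for the ITIS hierarchy: one row per taxon, with its parent.
# Rows are copied from the Tardigrada output shown later in this post.
toy <- data.frame(
  tsn       = c(202423, 155166, 155362, 155167, 155358),
  parentTsn = c(NA, 202423, 155166, 155166, 155166),
  rankName  = c("Kingdom", "Phylum", "Class", "Class", "Class"),
  taxonName = c("Animalia", "Tardigrada", "Eutardigrada",
                "Heterotardigrada", "Mesotardigrada"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: direct children only, which is all that
# getHierarchyDownFromTSN gives you.
children_of <- function(tsn, hier = toy) {
  hier[!is.na(hier$parentTsn) & hier$parentTsn == tsn, , drop = FALSE]
}

# Hypothetical sketch of a downstream walk: collect children at the target
# rank and recurse into children at any other rank.
downstream_sketch <- function(tsn, rank, hier = toy) {
  kids <- children_of(tsn, hier)
  hits <- kids[kids$rankName == rank, , drop = FALSE]
  rest <- kids[kids$rankName != rank, , drop = FALSE]
  for (child in rest$tsn) {
    hits <- rbind(hits, downstream_sketch(child, rank, hier))
  }
  hits
}

# Classes below the Kingdom Animalia, reached by passing through the Phylum
res <- downstream_sketch(202423, "Class")
print(res$taxonName)
```

The sketch keeps walking until it runs out of children, so a target rank that never appears simply yields an empty data frame. A real implementation would probably stop earlier by comparing rankId values, and every children_of call would be a web request.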
Install packages. You can get other packages from CRAN, but taxize is only on GitHub for now.
# install.packages('devtools'); library(devtools)  # provides install_github
# install_github('ritis', 'ropensci')    # uncomment if not already installed
# install_github('taxize_', 'ropensci')  # uncomment if not already installed
library(ritis)
library(taxize)
Get upstream taxonomic names.
# Search for a TSN by scientific name
df <- searchbyscientificname("Tardigrada")
tsn <- df[df$combinedname %in% "Tardigrada", "tsn"]

# Get just one immediate higher taxonomic name
gethierarchyupfromtsn(tsn = tsn)
  parentName parentTsn rankName  taxonName    tsn
1   Animalia    202423   Phylum Tardigrada 155166
# Get full hierarchy upstream from TSN
getfullhierarchyfromtsn(tsn = tsn)
  parentName parentTsn rankName        taxonName    tsn
1                       Kingdom         Animalia 202423
2   Animalia    202423   Phylum       Tardigrada 155166
3 Tardigrada    155166    Class     Eutardigrada 155362
4 Tardigrada    155166    Class Heterotardigrada 155167
5 Tardigrada    155166    Class   Mesotardigrada 155358
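Assuming the result is a plain data frame with the columns printed above, pulling out the name at a given rank is ordinary subsetting. The sketch below types the Tardigrada output in by hand so it runs offline; name_at_rank is a hypothetical helper, not part of ritis or taxize.

```r
# Full hierarchy for Tardigrada, copied by hand from the output above.
hier <- data.frame(
  parentName = c("", "Animalia", "Tardigrada", "Tardigrada", "Tardigrada"),
  parentTsn  = c(NA, 202423, 155166, 155166, 155166),
  rankName   = c("Kingdom", "Phylum", "Class", "Class", "Class"),
  taxonName  = c("Animalia", "Tardigrada", "Eutardigrada",
                 "Heterotardigrada", "Mesotardigrada"),
  tsn        = c(202423, 155166, 155362, 155167, 155358),
  stringsAsFactors = FALSE
)

# Hypothetical helper: taxon name(s) found at one rank of a hierarchy table.
name_at_rank <- function(hier, rank) {
  hier$taxonName[hier$rankName == rank]
}

phylum  <- name_at_rank(hier, "Phylum")
classes <- name_at_rank(hier, "Class")
print(phylum)
print(classes)
```

Here the Class rows are the direct children of Tardigrada, which is why there are several of them. For a species-level TSN the same lookup would presumably return the single Class the species sits in, and you could wrap it in sapply to cover a whole set of species.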
Get taxonomic names downstream.
# Get genera downstream from the Class Bangiophyceae
downstream(846509, "Genus")
    tsn parentName parentTsn   taxonName rankId rankName
1 11531 Bangiaceae     11530      Bangia    180    Genus
2 11540 Bangiaceae     11530    Porphyra    180    Genus
3 11577 Bangiaceae     11530 Porphyrella    180    Genus
4 11580 Bangiaceae     11530 Conchocelis    180    Genus
# Get families downstream from Acridoidea
downstream(650497, "Family")
      tsn parentName parentTsn      taxonName rankId rankName
1  102195 Acridoidea    650497      Acrididae    140   Family
2  650502 Acridoidea    650497     Romaleidae    140   Family
3  657472 Acridoidea    650497    Charilaidae    140   Family
4  657473 Acridoidea    650497   Lathiceridae    140   Family
5  657474 Acridoidea    650497     Lentulidae    140   Family
6  657475 Acridoidea    650497    Lithidiidae    140   Family
7  657476 Acridoidea    650497   Ommexechidae    140   Family
8  657477 Acridoidea    650497    Pamphagidae    140   Family
9  657478 Acridoidea    650497  Pyrgacrididae    140   Family
10 657479 Acridoidea    650497    Tristiridae    140   Family
11 657492 Acridoidea    650497 Dericorythidae    140   Family
# Get species downstream from Ursus
downstream(180541, "Species")
     tsn parentName parentTsn        taxonName rankId rankName
1 180542      Ursus    180541  Ursus maritimus    220  Species
2 180543      Ursus    180541     Ursus arctos    220  Species
3 180544      Ursus    180541 Ursus americanus    220  Species
4 621850      Ursus    180541 Ursus thibetanus    220  Species
Get the .Rmd file used to create this post (or the .md file) at my GitHub account.
Written in Markdown, with help from knitr.