{hagr} Linnaean Classification

[This article was first published on R | datawookie, and kindly contributed to R-bloggers.]

I’ve taken another look at the {hagr} data, which I wrote about previously. This time I’m focusing on the hierarchy of creatures.

Taxonomic Rank

The Linnaean Taxonomy is a hierarchical classification system for organisms devised by Carl Linnaeus. An organism is assigned to the following levels in the hierarchy (in increasing order of granularity):

  • domain
  • kingdom
  • phylum
  • class
  • order
  • family
  • genus and
  • species.

The relative level of a group of organisms in this hierarchy determines its taxonomic rank.
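
As a small illustration of rank as a relative level, the levels listed above can be stored as an ordered factor, which lets R compare them directly. This is only a sketch: `ranks` and `rank_of()` are names I've made up for the example and are not part of {hagr}.

```r
# Linnaean levels from least to most granular (as listed above).
ranks <- c("domain", "kingdom", "phylum", "class",
           "order", "family", "genus", "species")

# Hypothetical helper: turn level names into an ordered factor.
rank_of <- function(x) factor(x, levels = ranks, ordered = TRUE)

# Ordered factors can be compared: "<" here means "less granular than".
rank_of("phylum") < rank_of("genus")     # TRUE
rank_of("species") < rank_of("family")   # FALSE

# Position of a level in the hierarchy (1 = domain, 8 = species).
as.integer(rank_of("order"))             # 5
```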

The Linnaean Taxonomy was developed way before the idea of evolution arose. As a consequence, despite being a useful framework for classifying organisms, it does not take into account evolutionary relationships.

Taxonomic ranks. Image from Wikipedia.

Let’s take a look at the classification data in the {hagr} package.


Linnaean Taxonomic Levels

We’ll start at the top level, domain.

age %>% count(domain, sort = TRUE)
# A tibble: 1 x 2
  domain      n
  <chr>   <int>
1 Eukarya  4219

There’s only one domain, Eukarya, present. So we don’t have any information on Bacteria or Archaea (single-celled organisms).

If we dig down one level then we find that the Eukarya domain consists of three kingdoms: Animalia, Fungi and Plantae. There’s actually a fourth kingdom in Eukarya, Protista, but there’s no data for it in age.

age %>% count(kingdom, sort = TRUE)
# A tibble: 3 x 2
  kingdom      n
  <fct>    <int>
1 Animalia  4215
2 Fungi        3
3 Plantae      1

It’s clear that Animalia is the dominant kingdom, so let’s focus on that exclusively.

animalia <- age %>% filter(kingdom == "Animalia")
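
To put a number on that dominance, the counts can be turned into shares. The sketch below retypes the three counts printed above into a small tibble (`kingdom_counts` is a made-up name) so that it runs without the package. With the real data you could presumably pipe `age %>% count(kingdom)` into the same `mutate()`.

```r
library(dplyr)

# Counts copied by hand from the kingdom output above.
kingdom_counts <- tibble(
  kingdom = c("Animalia", "Fungi", "Plantae"),
  n = c(4215L, 3L, 1L)
)

# Share of all records that falls into each kingdom, largest first.
kingdom_shares <- kingdom_counts %>%
  mutate(share = n / sum(n)) %>%
  arrange(desc(share))

kingdom_shares
```

Animalia accounts for 4215 of the 4219 records, which is about 99.9% of the data.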

The next level in the hierarchy is phylum.

animalia %>% count(phylum, sort = TRUE)
# A tibble: 7 x 2
  phylum            n
  <fct>         <int>
1 Chordata       4200
2 Arthropoda        8
3 Echinodermata     2
4 Porifera          2
5 Cnidaria          1
6 Mollusca          1
7 Nematoda          1

It appears that Chordata is the dominant phylum in the data, so let’s further narrow our attention.

chordata <- animalia %>% filter(phylum == "Chordata")

Now let’s drill all the way down to genus.

chordata %>% count(class, order, family, genus, sort = TRUE)
# A tibble: 2,035 x 5
   class          order             family           genus            n
   <fct>          <fct>             <fct>            <fct>        <int>
 1 Teleostei      Scorpaeniformes   Scorpaenidae     Sebastes        49
 2 Teleostei      Perciformes       Percidae         Etheostoma      35
 3 Aves           Passeriformes     Parulidae        Setophaga       23
 4 Teleostei      Cypriniformes     Cyprinidae       Notropis        23
 5 Mammalia       Chiroptera        Vespertilionidae Myotis          21
 6 Reptilia       Squamata          Viperidae        Crotalus        19
 7 Teleostei      Perciformes       Lutjanidae       Lutjanus        18
 8 Aves           Psittaciformes    Psittacidae      Amazona         17
 9 Chondrichthyes Carcharhiniformes Carcharhinidae   Carcharhinus    17
10 Aves           Falconiformes     Falconidae       Falco           15
# … with 2,025 more rows
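
One way to see the granularity increase is to count the distinct groups at each rank. The sketch below does this on the ten rows printed above, retyped by hand into `top_genera` (a made-up name), so the numbers describe only that excerpt and not the full data.

```r
library(dplyr)

# The ten rows shown above, retyped by hand (counts omitted).
top_genera <- tribble(
  ~class,           ~order,              ~family,            ~genus,
  "Teleostei",      "Scorpaeniformes",   "Scorpaenidae",     "Sebastes",
  "Teleostei",      "Perciformes",       "Percidae",         "Etheostoma",
  "Aves",           "Passeriformes",     "Parulidae",        "Setophaga",
  "Teleostei",      "Cypriniformes",     "Cyprinidae",       "Notropis",
  "Mammalia",       "Chiroptera",        "Vespertilionidae", "Myotis",
  "Reptilia",       "Squamata",          "Viperidae",        "Crotalus",
  "Teleostei",      "Perciformes",       "Lutjanidae",       "Lutjanus",
  "Aves",           "Psittaciformes",    "Psittacidae",      "Amazona",
  "Chondrichthyes", "Carcharhiniformes", "Carcharhinidae",   "Carcharhinus",
  "Aves",           "Falconiformes",     "Falconidae",       "Falco"
)

# Number of distinct values in each rank column.
rank_sizes <- top_genera %>%
  summarise(across(c(class, order, family, genus), n_distinct))

rank_sizes
```

In this excerpt there are 5 classes, 9 orders, 10 families and 10 genera. Swapping `top_genera` for `chordata` should give the same summary for the whole phylum.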

Adding in species takes you to the most granular level in the hierarchy.

chordata %>% select(class, order, family, genus, species, common_name)
# A tibble: 4,200 x 6
   class    order family         genus    species    common_name            
   <fct>    <fct> <fct>          <fct>    <fct>      <chr>                  
 1 Amphibia Anura Bombinatoridae Bombina  bombina    Firebelly toad         
 2 Amphibia Anura Bombinatoridae Bombina  orientalis Oriental firebelly toad
 3 Amphibia Anura Bombinatoridae Bombina  variegata  Yellow-bellied toad    
 4 Amphibia Anura Bufonidae      Anaxyrus americanus American toad          
 5 Amphibia Anura Bufonidae      Anaxyrus boreas     Western toad           
 6 Amphibia Anura Bufonidae      Anaxyrus canorus    Yosemite toad          
 7 Amphibia Anura Bufonidae      Anaxyrus cognatus   Great plains toad      
 8 Amphibia Anura Bufonidae      Anaxyrus debilis    Green toad             
 9 Amphibia Anura Bufonidae      Anaxyrus hemiophrys Canadian toad          
10 Amphibia Anura Bufonidae      Anaxyrus punctatus  Red-spotted toad       
# … with 4,190 more rows

The combination of genus and species gives the binomial scientific name for organisms. For example, the Killer Whale is Orcinus orca.

# str_detect() comes from {stringr}.
age %>%
  filter(str_detect(common_name, "^(Killer|Blue|Sperm) whale$")) %>%
  select(class, order, family, genus, species, common_name)
# A tibble: 3 x 6
  class    order   family          genus        species       common_name 
  <fct>    <fct>   <fct>           <fct>        <fct>         <chr>       
1 Mammalia Cetacea Balaenopteridae Balaenoptera musculus      Blue whale  
2 Mammalia Cetacea Delphinidae     Orcinus      orca          Killer whale
3 Mammalia Cetacea Physeteridae    Physeter     macrocephalus Sperm whale 
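
The binomial name can be assembled by pasting the two columns together. Here’s a minimal sketch using the three whale rows above, retyped by hand; `whales` and `binomial` are illustrative names, not columns in {hagr}.

```r
library(dplyr)

# The three whales from the output above, retyped by hand.
whales <- tribble(
  ~genus,         ~species,        ~common_name,
  "Balaenoptera", "musculus",      "Blue whale",
  "Orcinus",      "orca",          "Killer whale",
  "Physeter",     "macrocephalus", "Sperm whale"
)

# paste() joins genus and species with a space. It also copes with
# factor columns, which it converts to text first.
whales <- whales %>%
  mutate(binomial = paste(genus, species))

whales %>% select(common_name, binomial)
```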

Growing a Tree

We’ll use {ggtree} to construct a phylogenetic tree from domain down to order.

The dominance of the Chordata phylum in the data is readily apparent! It’d be nice to include more levels in this tree, but it gets very big and rather messy.
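
Whatever the plotting tool, a tree like this boils down to parent-child pairs between neighbouring ranks. Here’s a minimal sketch of deriving those pairs with {dplyr}, using a handful of rows retyped from the outputs above. It is not the {ggtree} code behind the figure, and `taxa`, `ranks` and `edges` are illustrative names.

```r
library(dplyr)

# A few classification paths retyped by hand from the outputs above.
taxa <- tribble(
  ~domain,   ~kingdom,   ~phylum,    ~class,      ~order,
  "Eukarya", "Animalia", "Chordata", "Teleostei", "Scorpaeniformes",
  "Eukarya", "Animalia", "Chordata", "Teleostei", "Perciformes",
  "Eukarya", "Animalia", "Chordata", "Aves",      "Passeriformes",
  "Eukarya", "Animalia", "Chordata", "Mammalia",  "Chiroptera",
  "Eukarya", "Animalia", "Chordata", "Mammalia",  "Cetacea",
  "Eukarya", "Animalia", "Chordata", "Amphibia",  "Anura"
)

ranks <- c("domain", "kingdom", "phylum", "class", "order")

# For each pair of neighbouring ranks, keep the unique parent-child links.
edges <- lapply(seq_len(length(ranks) - 1), function(i) {
  taxa %>%
    transmute(parent = .data[[ranks[i]]], child = .data[[ranks[i + 1]]]) %>%
    distinct()
}) %>%
  bind_rows()

edges
```

Even in this tiny sample every branch passes through Chordata, which has four children here while Eukarya and Animalia have one each.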

There’s such a wealth of cool information in this dataset. I’m really indebted to the Human Ageing Genomic Resources project for putting it together and generously sharing it.
