AntWeb – programmatic interface to ant biodiversity data

February 18, 2014
By

(This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers)

Data on more than 10,000 species of ants recorded worldwide are available through from California Academy of Sciences' AntWeb, a repository that boasts a wealth of natural history data, digital images, and specimen records on ant species from a large community of museum curators.

Digging through some of the earliest announcements of AntWeb, I came across a Nature News piece titled "Mashups mix data into global service" from January 2006. The article contains this great quote from Roderic Page "If you could pool data from every museum or lab in the world, you could do amazing things". The article also says "So far, only researchers with advanced programming skills, working in fields organized enough to have data online and tagged appropriately, have been able to do this." In many ways this really is motivation for why we develop interfaces to these rich data repositories. Our express intent is to facilitate researchers explore amazing opportunities that lie within such data by lowering techinical barriers to use. Right on the heels of our most recent package (ecoengine), we are now happy to first release of an interface to AntWeb.

A stable version of our R package AntWeb is now available from CRAN. The API currently does not require a key for access but larger requests will be throttled on the server side. It is worth noting that much of these same data are also ported through the Global Biodiversity Information Facility and accessible through our gbif package. This package provides a more direct interface to more of the ant specific natural history data.

Installing the package

A stable version of the package (0.5) is now available on CRAN.

install.packages("AntWeb")  

or you can install the latest development version (the master branch is also always stable & deployable and most up-to-date. Current version is 0.5.3 at the time of this writing).

library(devtools)  
install_github("ropensci/AntWeb")

Searching through the database

As with most of our packages, there are several ways to search through an API. In the case of AntWeb, you can search by a genus or full species name or by other taxonomic ranks like sub-phylum.

Data on ants

To obtain data on any taxonomic group, you can make a request using the aw_data() function. It's possible to search easily by a taxonomic rank (e.g. a genus) or by passing a complete scientific name.

Searching by Genus

library(AntWeb)
# To get data on an ant genus found widely through Central and South America
data_genus_only <- aw_data(genus = "acanthognathus")
leaf_cutter_ants  <- aw_data(genus = "acromyrmex")
unique(leaf_cutter_ants$meta.species)
#>  [1] "(indet)"      "alw01"        "alw02"        "alw03"       
#>  [5] "alw04"        "ambiguus"     "aspersus"     "asperus"     
#>  [9] "balzani"      "coronatus"    "crassispinus" "disciger"    
#> [13] "echinatior"   "evenkul"      "fracticornis" "heyeri"      
#> [17] "hispidus"     "hystrix"      "indet"        "landolti"    
#> [21] "laticeps"     "lobicornis"   "lundi"        "lundii"      
#> [25] "moelleri"     "muticinoda"   "niger"        "nigrosetosus"
#> [29] "nobilis"      "octospinosus" "pubescens"    "pulvereus"   
#> [33] "rugosus"      "santschii"    "silvestrii"   "striatus"    
#> [37] "subterraneus" "versicolor"   "volcanus"

Searching by species

# You can request data on any particular species
acanthognathus_df <- aw_data(scientific_name = "acanthognathus brevicornis")
head(acanthognathus_df)
#>            code                           taxon_name      tribe  subfamily
#> 1 casent0280684 myrmicinaeacanthognathus brevicornis   dacetini myrmicinae
#> 2 casent0637708 myrmicinaeacanthognathus brevicornis dacetonini myrmicinae
#>            genus     species  country                 localityname
#> 1 acanthognathus brevicornis Colombia Las Naranjas near Josc Maria
#> 2 acanthognathus brevicornis     Peru    Tambopata Research Center
#>   localitycode collectioncode biogeographicregion       last_modified
#> 1   Josc Maria      ANTC19540           Neotropic 2014-02-18 12:57:40
#> 2    JTL060117  TRC-S06-R1C04           Neotropic 2014-02-17 16:00:34
#>               ownedby collectedby  caste access_group locatedat    medium
#> 1 BMNH, London, U. K.  D. Jackson     1w            1      BMNH       pin
#> 2                <NA>   D. Feener worker            2      JTLC dry mount
#>   access_login  specimennotes             created     family
#> 1           23 BMNH(E)1017559 2014-02-18 12:57:40 formicidae
#> 2            2           <NA> 2014-02-17 16:00:34 formicidae
#>   datecollectedstart datecollectedstartstr kingdom_name phylum_name
#> 1         1977-08-08            8 Aug 1977     animalia  arthropoda
#> 2         2001-11-01            1 Nov 2001     animalia  arthropoda
#>   class_name  order_name image_count          adm1 decimal_latitude
#> 1    insecta hymenoptera           5          <NA>             <NA>
#> 2    insecta hymenoptera           0 Madre de Dios        -13.14142
#>   decimal_longitude                  habitat  method determinedby
#> 1              <NA>                     <NA>    <NA>         <NA>
#> 2           -69.623 Mixed terra firme forest winkler   J. Longino
#>   elevation latlonmaxerror          microhabitat datedetermined
#> 1      <NA>           <NA>                  <NA>           <NA>
#> 2       252           100m ex sifted leaf litter     2013-09-12
#>   datedeterminedstr
#> 1              <NA>
#> 2       12 Sep 2013
# You can also limit queries to observation records that have been geoferenced
acanthognathus_df_geo <- aw_data(genus = "acanthognathus", species = "brevicornis", georeferenced = TRUE)

It's also possible to search for records around any location by specifying a search radius.

data_by_loc <- aw_coords(coord = "37.76,-122.45", r = 2)
# This will search for data on a 2 km radius around that latitude/longitude

Image data

Most specimens in the database have images associated with them. These include high, medium, and low resolution version of the head, dorsal side, full profile, and the specimen label. For example we can retrieve data on a specimen of Ecitoninaeeciton burchellii with the following call:

# Data and images for Ecitoninaeeciton burchellii
eb <- aw_code("casent0003205")
eb$image_data$high[[2]]
#> [1] "http://www.antweb.org/images/casent0003205/casent0003205_h_1_high.jpg"

If you're primarily interested in ant images and would like to keep up with recent additions to the database, you can also use the aw_images function. This function takes two arguments: since, the number of days to search backward, and a type. Possible options for type are h for head, d for dorsal, p for profile, and l for label. If a type is not specified, all available images are retrieved.

# Retrieve only dorsal images for the last five days
aw_images(since = 5, type = "d")

It's also possible to retrieve unique lists of any taxonomic rank using the aw_unique function.

subfamily_list <- aw_unique(rank = "subfamily")
nrow(subfamily_list)
#> [1] 69
head(subfamily_list)  
#>       subfamily
#> 1      (apidae)
#> 2  (bethylidae)
#> 3  (braconidae)
#> 4   (cynipidae)
#> 5  (diapriidae)
#> 6 (diaspididae)
genus_list <- aw_unique(rank = "genus")
nrow(genus_list)
#> [1] 470
head(genus_list)  
#>             genus
#> 1    (aenictinae)
#> 2 (amblyoponinae)
#> 3        (apidae)
#> 4        (attini)
#> 5  (basicerotini)
#> 6    (bethylidae)
species_list <- aw_unique(rank = "species")
nrow(species_list)
#> [1] 10480
head(species_list)  
#>          species
#> 1 (basicerotini)
#> 2        (indet)
#> 3       (indet.)
#> 4    (orizabanum
#> 5     abbreviata
#> 6     abdelazizi

If you work with existing specimens, you can also query directly by a specimen ID.

asphinctanilloides_amazona <- aw_code(code = "casent0104669")
# This will return a list with a metadata data.frame and a image data.frame

If you have a multiple specimen IDs, as is often the case when working with research data, you can get data on all of them at the same time. The function automatically retuns NULL values when no data are found and you can have these removed using plyr::compact (this happens automatically when you use a function call like ldply.)

specimens <- c("casent0908629", "casent0908650", "casent0908637")
results <- lapply(specimens, function(x) aw_code(x))
names(results) <- specimens
length(results)
#> [1] 3

Mapping ant specimen data

As with the previous ecoengine package, you can also visualize location data for any set of species. Adding georeferenced = TRUE to a data retrieval call will filter out any data points without location information. Once retrieved the data are mapped with the open source Leaflet.js and pushed to your default browser. Maps and associated geoJSON files are also saved to a location specified (or defaults to your /tmp folder). This feature is only available on the development version on GitHub (0.5.2 or greater; see above on how to install) and will be available from CRAN in version 0.6

acd <- aw_data(genus = "acanthognathus")
aw_map(acd)

Distribution of long trap-jaw ants in Central and South America

Integration with the rest of our biodiversity suite

Our newest package on CRAN, spocc (Species Occurrence Data), currently in review at CRAN, integrates AntWeb among other sources. More details on spocc in our next blog post.

As always please send suggestions, bug reports, and ideas related to the AntWeb R package directly to our repo.

To leave a comment for the author, please follow the link and comment on his blog: rOpenSci Blog - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.