GSoC Proposal 2013: Biodiversity Visualizations using R

[This article was first published on Vijay Barve, and kindly contributed to R-bloggers.]

I am applying for Google Summer of Code 2013 with this “Biodiversity Visualizations using R” proposal. I am posting the idea here to get feedback and suggestions from the Biodiversity Informatics community.

[During the next few days I will keep updating this post to accommodate suggestions. The example visualizations here are rough illustrations of the ideas and need a lot of work before they become reusable functions.]

Background

R is increasingly being used for biodiversity information analysis. Several R packages in the rOpenSci suite, such as rgbif and rvertnet, can query, download and, to some extent, analyse data within an R workflow, and packages such as dismo and SDMTools support modelling the data. It would be useful to also have a package to quickly visualize biodiversity data. Such visualizations would help users understand the extent of the geographic, taxonomic and temporal coverage of their data, as well as its gaps and biases.

The proposal is to develop an R package that quickly generates these visualizations for a data set the user has gathered or generated.

The package would provide functions for the following tasks:

  • Data preparation – functions to convert the data into a format suitable for visualization and analysis, so that dates, taxonomic classification and geographic coordinates are in uniform, usable formats.
  • Data summary – functions to quickly summarize the data set, reporting the number of records, the number of records with latitude/longitude values, the bounding box of those coordinates, the date range and so on.
  • Geographic coverage – functions to plot the records on maps, including density maps at different scales such as country level, degree grid and so on.
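
The data-preparation and data-summary steps could look roughly like the base-R sketch below. It assumes the records are in a data frame with Darwin Core-style columns (`scientificName`, `family`, `decimalLatitude`, `decimalLongitude`, `eventDate`) and dates written as `YYYY-MM-DD` text. The function names `prepare_records()` and `summarize_records()` and the made-up records are hypothetical, for illustration only.

```r
# Hypothetical helpers; column names assume Darwin Core-style fields.

prepare_records <- function(df) {
  # Parse ISO 8601 date strings; anything unparseable becomes NA
  df$eventDate <- as.Date(as.character(df$eventDate), format = "%Y-%m-%d")
  # Coerce coordinates to numeric (via character, so factor columns are safe)
  lat <- suppressWarnings(as.numeric(as.character(df$decimalLatitude)))
  lon <- suppressWarnings(as.numeric(as.character(df$decimalLongitude)))
  # Set to NA any coordinate pair that is incomplete or out of range
  bad <- is.na(lat) | is.na(lon) | abs(lat) > 90 | abs(lon) > 180
  lat[bad] <- NA
  lon[bad] <- NA
  df$decimalLatitude <- lat
  df$decimalLongitude <- lon
  # Trim family names; blank strings count as "no family assigned"
  fam <- trimws(as.character(df$family))
  fam[!is.na(fam) & fam == ""] <- NA
  df$family <- fam
  df
}

summarize_records <- function(df) {
  has_xy <- !is.na(df$decimalLatitude) & !is.na(df$decimalLongitude)
  lat <- df$decimalLatitude[has_xy]
  lon <- df$decimalLongitude[has_xy]
  dates <- df$eventDate[!is.na(df$eventDate)]
  list(
    n_records     = nrow(df),
    n_with_coords = sum(has_xy),
    # Bounding box of the valid coordinates (NULL if there are none)
    bbox = if (length(lat) > 0)
      c(lon_min = min(lon), lon_max = max(lon),
        lat_min = min(lat), lat_max = max(lat)),
    # Earliest and latest event dates (NULL if no date could be parsed)
    date_range = if (length(dates) > 0) range(dates)
  )
}

# Made-up toy records, for illustration only
records <- data.frame(
  scientificName   = c("Danaus plexippus", "Apis mellifera",
                       "Quercus agrifolia", "Danaus plexippus"),
  family           = c("Nymphalidae", "Apidae", " Fagaceae ", ""),
  decimalLatitude  = c("37.77", "-33.9", "95", NA),
  decimalLongitude = c("-122.42", "151.2", "10", "5"),
  eventDate        = c("2012-06-01", "2012-06-01", "2012-07-15", "unknown"),
  stringsAsFactors = FALSE
)
clean <- prepare_records(records)
s <- summarize_records(clean)
s  # 4 records, 2 with usable coordinates, dates 2012-06-01 to 2012-07-15
```

Cleaning first and summarizing second keeps the summary honest: a latitude of 95 or an unparseable date is counted as missing rather than silently stretching the bounding box or date range.
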
[Figure: Density of the records worldwide. Darker color indicates a higher density of records.]
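
A degree-grid density map mostly comes down to counting records per grid cell. The following is a minimal base-R sketch that assumes numeric coordinate vectors with missing values already set to NA. `grid_counts()` is a hypothetical name, and its `cell` argument (cell size in degrees) is one assumed way to expose the different map scales.

```r
# Hypothetical helper: number of records per cell of a regular
# latitude/longitude grid; `cell` is the cell size in degrees.
grid_counts <- function(lat, lon, cell = 1) {
  ok <- !is.na(lat) & !is.na(lon)
  # Each record is assigned to the south-west corner of its cell
  cells <- data.frame(
    cell_lat = floor(lat[ok] / cell) * cell,
    cell_lon = floor(lon[ok] / cell) * cell,
    n        = rep(1L, sum(ok))
  )
  counts <- aggregate(n ~ cell_lat + cell_lon, data = cells, FUN = sum)
  # Densest cells first
  counts[order(counts$n, decreasing = TRUE), ]
}

lat <- c(10.2, 10.8, 10.5, -5.5, NA)
lon <- c(20.1, 20.9, 20.3, 30.0, 1.0)
grid_counts(lat, lon)            # 1-degree cells
grid_counts(lat, lon, cell = 5)  # coarser 5-degree cells
```

The resulting counts could then be colored so that darker cells hold more records. Country-level maps would instead aggregate on a country field, which this sketch does not cover.
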

[Figure: Temporal coverage of the records. Each line represents the number of records on that particular day.]
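
The temporal plot only needs a count of records per day. Here is a minimal base-R sketch, assuming the event dates have already been parsed into `Date` values. `daily_counts()` is a hypothetical name.

```r
# Hypothetical helper: number of records per calendar day.
daily_counts <- function(dates) {
  dates <- dates[!is.na(dates)]
  tab <- table(dates)  # names are "YYYY-MM-DD", sorted chronologically
  data.frame(date = as.Date(names(tab)), n = as.integer(tab))
}

dates <- as.Date(c("2012-06-01", "2012-06-01", "2012-07-15", NA))
d <- daily_counts(dates)
d
```

Base graphics can draw the counts as one vertical line per day, e.g. `plot(d$date, d$n, type = "h", xlab = "Date", ylab = "Records")`. With this approach, days without records are absent rather than plotted as zeros.
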

  • Taxonomic coverage – functions to visualize the taxonomic coverage of the data as tree maps, by the number of records per species and by the number of species covered.
[Figure: Family-wise records present in the data set. The white block indicates records with no assigned family.]
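
A tree map needs two numbers for each taxon: the number of records and the number of species. The sketch below assumes vectors of family and species names, with NA marking records that have no family. It introduces a hypothetical `taxon_summary()` function that computes both numbers per family, and it groups the NA records under an assumed `(unassigned)` label, like the white block in the family-wise figure. Drawing the tree map could be left to an existing package, for example `treemap` on CRAN.

```r
# Hypothetical helper: per-family record and species counts,
# i.e. the two quantities a tree map could be sized by.
taxon_summary <- function(family, species) {
  fam <- ifelse(is.na(family), "(unassigned)", family)
  n_records <- table(fam)
  n_species <- tapply(species, fam, function(x) length(unique(x[!is.na(x)])))
  out <- data.frame(
    family    = names(n_records),
    n_records = as.integer(n_records),
    n_species = as.integer(n_species[names(n_records)]),
    stringsAsFactors = FALSE
  )
  out[order(out$n_records, decreasing = TRUE), ]
}

# Made-up toy records, for illustration only
family  <- c("Nymphalidae", "Nymphalidae", "Nymphalidae", "Apidae", NA)
species <- c("Danaus plexippus", "Danaus plexippus", "Vanessa cardui",
             "Apis mellifera", "Bombus sp.")
taxon_summary(family, species)
```

Keeping unassigned records as their own group, instead of dropping them, makes gaps in the taxonomic classification visible in the plot.
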

  • Completeness analysis – functions to assess and visualize the completeness of a region's biodiversity inventory, in other words, how exhaustively the study area has been sampled [Ref: http://dx.doi.org/10.1111/j.0906-7590.2007.04627.x]
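
The proposal leaves the completeness measure open. One widely used option is shown here as an assumption, not as the method of the cited paper. It takes the ratio of observed species richness S_obs to a bias-corrected Chao1 estimate, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of species recorded exactly once and exactly twice. Values near 1 suggest a well-sampled area. `completeness()` and `cell_completeness()` are hypothetical names.

```r
# Hypothetical helper: inventory completeness as observed richness
# divided by a bias-corrected Chao1 richness estimate.
completeness <- function(species) {
  species <- species[!is.na(species)]
  freq  <- table(species)  # records per species
  s_obs <- length(freq)    # observed species richness
  f1 <- sum(freq == 1)     # singletons: species with exactly one record
  f2 <- sum(freq == 2)     # doubletons: species with exactly two records
  chao1 <- s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
  c(n_records = length(species), s_obs = s_obs,
    chao1 = chao1, completeness = s_obs / chao1)
}

# Completeness per grid cell (cell size in degrees), one row per cell
cell_completeness <- function(species, lat, lon, cell = 1) {
  ok  <- !is.na(lat) & !is.na(lon)
  key <- paste(floor(lat[ok] / cell) * cell, floor(lon[ok] / cell) * cell)
  t(sapply(split(species[ok], key), completeness))
}

completeness(c("a", "a", "b", "c", "d", "d"))
cell_completeness(species = c("a", "a", "b", "c"),
                  lat = c(10.1, 10.2, 10.3, -5.5),
                  lon = c(20.1, 20.2, 20.3, 30.0))
```

One caveat of this design is that a cell with only one or two records can score a completeness of 1 simply because so little was sampled there. A minimum-record threshold would probably be needed before mapping the values.
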

Mentor(s): Javier Otegui

Data set: The sample visualizations here use the records published by iNaturalist.org on the GBIF data portal. This data set contains about 46,000 Research Grade records covering all the organisms posted. The data set's page on the GBIF data portal gives its details and describes it as follows: “iNaturalist.org is a website where anyone can record their observations from nature. Members record observations for numerous reasons, including participation in citizen science projects, class projects, and personal fulfillment.”

References:

  • Chamberlain, S., & Barve, V. (2012). rvertnet: Search VertNet database from R. Retrieved from http://cran.r-project.org/package=rvertnet
  • Chamberlain, S., Boettiger, C., Ram, K., & Barve, V. (2013). rgbif: Interface to the Global Biodiversity Information Facility API methods. Retrieved from http://cran.r-project.org/package=rgbif
  • Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2012). dismo: Species distribution modeling. Retrieved from http://cran.r-project.org/package=dismo
  • Otegui, J., & Ariño, A. H. (2012). BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network. Bioinformatics (Oxford, England), 28(16), 2207–8. doi:10.1093/bioinformatics/bts359
  • Soberón, J., Jiménez, R., Golubov, J., & Koleff, P. (2007). Assessing completeness of biodiversity databases at different spatial scales. Ecography, 30(1), 152–160. doi:10.1111/j.2006.0906-7590.04627.x
  • VanDerWal, J., Falconi, L., Januchowski, S., Shoo, L., & Storlie, C. (2012). SDMTools: Species Distribution Modelling Tools: Tools for processing data associated with species distribution modelling exercises. Retrieved from http://cran.r-project.org/package=SDMTools

To leave a comment for the author, please follow the link and comment on their blog: Vijay Barve.
