I am applying for Google Summer of Code 2014 again, this time with the proposal “Biodiversity Data Visualizations using R”. We propose to take the bdvis package to the next level by adding more functions and making it available through CRAN. I am posting this idea to get feedback and suggestions from the biodiversity informatics community.
[Over the next few days I will keep updating this post to accommodate suggestions. The example visualizations here are crude illustrations of the ideas and need a lot of work to convert them into reusable functions.]
The bdvis package is already under development and was a successful GSoC 2013 project. The package currently has basic functionality for biodiversity data visualization, but as its user base grows, requests for additional features are coming in. We propose to add the user-requested functionality and to implement some new functions. The major tasks of the proposed project are the following:
- Fix currently reported bugs and complete the documentation so that the package can be submitted to CRAN.
- Implement additional features requested by users.
- Develop seamless data support.
- Add functions for visualizations.
- Prepare a detailed vignette.
User requested features
The features and functionality requested by users so far are the following:
- A versatile function to subset the data by taxonomy (species, genus, family, etc.) or by date (a particular year, a range of years, and so on).
- tempolar: ability to show average records per day/week/month rather than only the raw counts shown currently.
- taxotree: additional parameters to control the diagram, such as title, legend and colors, plus the ability to choose a summary based on number of records, number of species or higher taxonomy.
- bdsummary: number of grid cells covered by the data records and percentage coverage of the bounding box.
- Ability to visualize the output of the completeness analysis function bdcomplete.
- gettaxo: improve efficiency by adding the ability to search by genus rather than by full scientific name, as it does currently. This could be an option, in case a user needs to search by full scientific names for some reason.
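As a rough illustration of the bdsummary request above, grid-cell coverage could be computed along the following lines. This is only a sketch in base R: the function name grid_coverage, its arguments, the column names and the 1-degree cell size are all assumptions made for illustration, not existing bdvis code.

```r
# Hypothetical helper, not part of bdvis: count the grid cells that hold at
# least one record and compare them with the cells spanned by the bounding box.
grid_coverage <- function(lat, lon, cell_size = 1) {
  keep <- !is.na(lat) & !is.na(lon)   # drop records without coordinates
  lat <- lat[keep]
  lon <- lon[keep]
  if (length(lat) == 0) stop("No records with coordinates")
  # Index every record by the row and column of the cell it falls in
  row <- floor(lat / cell_size)
  col <- floor(lon / cell_size)
  occupied <- nrow(unique(data.frame(row, col)))
  # Number of cells spanned by the bounding box of the records
  total <- (max(row) - min(row) + 1) * (max(col) - min(col) + 1)
  list(cells_occupied   = occupied,
       cells_in_bbox    = total,
       percent_coverage = 100 * occupied / total)
}

# Made-up records; the last one has no latitude and is ignored
occ <- data.frame(
  Latitude  = c(10.2, 10.7, 11.5, 12.1, NA),
  Longitude = c(76.1, 76.9, 77.3, 78.4, 77.0)
)
coverage <- grid_coverage(occ$Latitude, occ$Longitude, cell_size = 1)
print(coverage)
```

Here the bounding box is taken from the records themselves; a user-supplied bounding box would be an obvious alternative and would change the percentage.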
Data formats support
Develop functions that give seamless support, within the R environment, for the major biodiversity occurrence data formats so that they work with the bdvis package. A preliminary list of packages that make such data available: rgbif, rvertnet, rinat and spocc. We will ask the user community which additional data sources they use and add those to the work list.
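One possible shape for such support is a small mapping from each source's column names to a single common set. The sketch below uses base R only. The function name standardize_occurrences, the target column names and the source column names in field_maps are assumptions for illustration; the real names returned by each package would need to be checked.

```r
# Hypothetical field mappings: common name = source-specific column name.
# All names here are assumed for illustration.
field_maps <- list(
  rgbif = c(Latitude = "decimalLatitude", Longitude = "decimalLongitude",
            Date_collected = "eventDate", Scientific_name = "name"),
  rinat = c(Latitude = "latitude", Longitude = "longitude",
            Date_collected = "observed_on", Scientific_name = "scientific_name")
)

# Rename the source-specific columns of `dat` to the common names
standardize_occurrences <- function(dat, source) {
  map <- field_maps[[source]]
  if (is.null(map)) stop("No field mapping defined for source: ", source)
  absent <- setdiff(map, names(dat))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  idx <- match(map, names(dat))   # positions of the source columns
  names(dat)[idx] <- names(map)   # overwrite them with the common names
  dat
}

# Made-up record laid out with the assumed rgbif column names
gbif_like <- data.frame(
  name = "Panthera tigris",
  decimalLatitude = 21.5,
  decimalLongitude = 79.3,
  eventDate = "2012-03-14",
  stringsAsFactors = FALSE
)
std <- standardize_occurrences(gbif_like, "rgbif")
print(names(std))
```

Supporting a new data source would then mean adding one entry to field_maps rather than writing a new conversion function.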
Additional functions for visualizations
- Distribution of collection effort over time (line graph) [Fig 1 Soberon et al 2000]
- Distribution of number of records among taxa and cells (histograms) [Fig 3,4 Soberon et al 2000]
- Distribution of number of species among cells (histogram) [Fig 5 Soberon et al 2000]
- Completeness vs number of species (scatterplot) [Fig 6 Soberon et al 2000]
- Record densities for day of year and week of year [Otegui 2012]
- Records per year dot plots [Otegui 2012]
- Calendar heat maps of number of records or species recorded
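To give an idea of the data preparation behind the first item in this list (collection effort over time), here is a minimal base R sketch. The function name records_per_year and the ISO date format are assumptions for illustration, and the dates are made up.

```r
# Hypothetical helper, not part of bdvis: number of records per year,
# including years with no records so that the line graph shows the gaps.
records_per_year <- function(dates) {
  d <- as.Date(dates, format = "%Y-%m-%d")  # unparseable dates become NA
  d <- d[!is.na(d)]
  yrs <- as.integer(format(d, "%Y"))
  all_years <- seq(min(yrs), max(yrs))
  counts <- table(factor(yrs, levels = all_years))
  data.frame(year = all_years, records = as.integer(counts))
}

# Made-up collection dates; the missing date is dropped
dates <- c("1998-05-01", "1998-07-12", "2000-01-03", "2001-11-30", NA)
effort <- records_per_year(dates)
print(effort)

# Draw the line graph only in an interactive session
if (interactive()) {
  plot(effort$year, effort$records, type = "l",
       xlab = "Year", ylab = "Number of records")
}
```

The same counting approach, with "%j" or "%U" in place of "%Y", would give records per day of year or week of year for the density plots.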
Detailed vignette
Prepare test data sets for the vignette. Three data sets are planned: the first with global geographical coverage and wide species coverage, the second with country-level geographical coverage and class- or order-level species coverage, and the third with a narrow species selection, perhaps at genus level, to demonstrate the functionality. Write code and an explanation for each function in the package, and add result tables, graphs and maps to complete the vignette.
References
- Otegui, J., & Ariño, A. H. (2012). BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network. Bioinformatics, 28(16), 2207–2208. doi:10.1093/bioinformatics/bts359
- Soberón, J., Llorente, J., & Oñate, L. (2000). The use of specimen-label databases for conservation purposes: an example using Mexican Papilionid and Pierid butterflies. Biodiversity and Conservation, 9, 1441–1466. Retrieved from http://www.springerlink.com/index/H58022627013233W.pdf