bdvis development version available for early feedback

[This article was first published on Vijay Barve, and kindly contributed to R-bloggers.]

Google Summer of Code 2013 is halfway through, and mid-term evaluations are underway. I thought this was a good point to share what we have been doing on the Biodiversity Data Visualizations in R project and to open up the package for testing and early feedback. We have named the package bdvis. The package is on GitHub, and I would appreciate it if you could install and test it. Feedback can be given in the comments here, through issues on GitHub, on Twitter or by email.

Getting data

The data was obtained from the data portal of the Global Biodiversity Information Facility (http://data.gbif.org). The data set we are looking for is the iNaturalist research-grade records. We accessed the datasets page at http://data.gbif.org/datasets/ and selected the iNaturalist.org page from the alphabetical list, which is at http://data.gbif.org/datasets/provider/407. On this page, follow the link Explore: Occurrences, and on the next page click Download: Spreadsheet of results. On the download page, make sure Comma separated values is selected and then press the Download Now button. The website may take a few minutes to prepare your download; once it is ready, a download link will be provided. Typically the file will be named something like occurrence-search-12345.zip, where the number can be as long as 40 digits. Use the link to download the .zip file, then extract the data file (occurrence-search-12345.csv) into R's working directory. Since this file has a long name, let us rename it to inat.csv for convenience.
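The extract-and-rename steps above can also be done from R. The sketch below assumes the archive and the CSV inside it follow the occurrence-search-*.zip / occurrence-search-*.csv naming described above; the helper prepare_gbif_csv() is hypothetical and not part of bdvis.

```r
# Hypothetical helper (not part of bdvis): extract a GBIF portal download
# found in `dir` and rename the CSV inside it to a shorter name.
prepare_gbif_csv <- function(dir = ".", new_name = "inat.csv") {
  # Extract any occurrence-search-*.zip archives present in `dir`
  zips <- list.files(dir, pattern = "^occurrence-search-.*\\.zip$",
                     full.names = TRUE)
  for (z in zips) unzip(z, exdir = dir)

  # Locate the extracted CSV; refuse to guess if there is not exactly one
  csvs <- list.files(dir, pattern = "^occurrence-search-.*\\.csv$",
                     full.names = TRUE)
  if (length(csvs) != 1) {
    stop("expected exactly one occurrence-search-*.csv in ", dir)
  }

  # Rename it and return the new path
  target <- file.path(dir, new_name)
  if (!file.rename(csvs, target)) stop("could not rename ", csvs)
  target
}
```

Calling prepare_gbif_csv() from the working directory would leave inat.csv ready for the read.csv() call that follows.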

Now we are ready to load our data.

inat = read.csv("inat.csv")
dim(inat)

If this prints something like

[1] 66581    47

we are on the right track: our data is loaded into R. For now, the package handles only the data format provided by GBIF, but built-in functions for converting user-generated biodiversity data into this format are being worked on.

Package installation

Now let us install the bdvis package. First we need the devtools package, which lets us install packages from GitHub rather than CRAN.

install.packages("devtools")
require(devtools)

# Install bdvis from the vijaybarve/bdvis repository on GitHub
install_github("vijaybarve/bdvis")
require(bdvis)

If this produces something like

Loading required package: bdvis

Attaching package: ‘bdvis’

The following object(s) are masked from ‘package:base’:

summary

we are on the right track: our package is installed and loaded into R.

Package functions

1. summary

Let us start playing with the functions now. We have the data loaded in the inat data frame.

bdvis::summary(inat)

This should produce something like:

Total no of records = 66581
Date range of the records from  1710-02-26  to  2012-12-31
Bounding box of records  -77.89309 , -177.37895  -  78.53431 , 179.2615
Taxonomic summary...
No of Families :  1394
No of Genus :  5089
No of Species :  11299

What does this tell us about our data?

  • We have 66581 records in the data set
  • The date range is from 1710 to 2012. (Do we really have a record from 1710? Looks like we have a problem there.)
  • The bounding box is almost the whole world. Yes, this is a global data set.
  • The data set represents 1394 families, 5089 genera and 11299 species.
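For readers who want to cross-check these numbers or track down the suspicious 1710 record, here is a minimal base-R sketch. It is not the bdvis implementation: the helper names and the column names (date, lat, lon, family, genus, species) are hypothetical, so map them to the real column names in your download (see names(inat)).

```r
# Base-R sketch of the kind of figures bdvis::summary() reports.
# Column names are HYPOTHETICAL defaults; pass the real ones from names(inat).
summarise_records <- function(df, date_col = "date", lat_col = "lat",
                              lon_col = "lon", family_col = "family",
                              genus_col = "genus", species_col = "species") {
  dates <- as.Date(df[[date_col]])
  # Count distinct non-missing values in a column
  n_distinct <- function(x) length(unique(na.omit(x)))
  list(
    n_records  = nrow(df),
    date_range = range(dates, na.rm = TRUE),
    lat_range  = range(df[[lat_col]], na.rm = TRUE),   # bounding box, latitude
    lon_range  = range(df[[lon_col]], na.rm = TRUE),   # bounding box, longitude
    n_families = n_distinct(df[[family_col]]),
    n_genera   = n_distinct(df[[genus_col]]),
    n_species  = n_distinct(df[[species_col]])
  )
}

# Return the records dated before `cutoff`, e.g. to inspect the 1710 date
flag_old_records <- function(df, date_col = "date", cutoff = "1900-01-01") {
  dates <- as.Date(df[[date_col]])
  df[!is.na(dates) & dates < as.Date(cutoff), , drop = FALSE]
}
```

The cutoff of 1900 is an arbitrary choice for illustration; any date before which records seem implausible for a citizen-science platform would do.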

I have two questions here:

  1. What more would you like to get in the summary?
  2. Should I rename the function summary to something else, so it does not clash with the usual summary function for data frames?

2. mapgrid

Now let us generate a heat map of the records in this data set. This map will show us the density of records in different parts of the world. To generate this map:

mapgrid(inat,ptype="species")
mapgrid output for iNaturalist data

Set ptype to "records" if you want the map to show raw record counts rather than counts aggregated to species. Again, some questions:

  • What other options would you like to see here?
  • The ability to zoom in on a certain region?
  • Control over the color palette?
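To illustrate the general idea behind a gridded heat map, here is a sketch of the aggregation step: each point is assigned to a cell of `cell` degrees, and the cell value is either the number of records or the number of distinct species. This is an assumption about the technique, not the code mapgrid actually uses, and grid_counts() is a hypothetical name.

```r
# Sketch of gridded aggregation for a heat map (not bdvis's actual code).
# Returns one row per occupied cell, keyed by the cell's lower-left corner.
grid_counts <- function(lat, lon, species = NULL, cell = 1) {
  cell_lat <- floor(lat / cell) * cell
  cell_lon <- floor(lon / cell) * cell
  key <- paste(cell_lat, cell_lon, sep = "_")

  # What to count per cell: every record (a unique id per row),
  # or each distinct species once
  value <- if (is.null(species)) seq_along(lat) else species
  groups <- split(value, key)

  # Take the cell coordinates from the first point falling in each cell
  first <- match(names(groups), key)
  data.frame(
    lat = cell_lat[first],
    lon = cell_lon[first],
    count = vapply(groups, function(v) length(unique(v)), integer(1)),
    row.names = NULL
  )
}
```

Leaving species as NULL corresponds to the records option, while passing a species column corresponds to the species option. Records with missing coordinates should be dropped before calling it.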

3. tempolar

Moving on to temporal visualizations: the function tempolar makes polar plots of temporal data at daily, weekly and monthly scales. The code and sample outputs are as follows:

# Daily (timescale "d"), weekly ("w") and monthly ("m") polar plots
tempolar(inat, color="green", title="iNaturalist daily",
         plottype="r", timescale="d")
tempolar(inat, color="blue", title="iNaturalist weekly",
         plottype="p", timescale="w")
tempolar(inat, color="red", title="iNaturalist monthly",
         plottype="r", timescale="m")
Daily plot of temporal data. Each line represents the records on one day of the year.

Weekly plot of temporal data. Plottype polygon is used here.

Monthly plot of temporal data. Each line represents the records in that month.

The function provides options to control color, title, plottype and, of course, timescale.
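As a rough illustration of what a polar temporal plot involves, the sketch below counts records per day of year, week of year or month, and places each count on a circle as an (x, y) point whose distance from the centre is the count. It is an assumption about the general technique, not tempolar's source; temporal_counts() is hypothetical, and starting at the top and running clockwise is a design choice made here.

```r
# Sketch: per-unit record counts laid out around a circle (not bdvis code).
temporal_counts <- function(dates, timescale = c("d", "w", "m")) {
  timescale <- match.arg(timescale)
  dates <- as.Date(dates)

  # Day of year (1-366), week of year (%U gives 0-53) or month (1-12)
  fmt <- switch(timescale, d = "%j", w = "%U", m = "%m")
  n   <- switch(timescale, d = 366, w = 54, m = 12)
  unit <- as.integer(format(dates, fmt))
  if (timescale == "w") unit <- unit + 1   # shift weeks to 1-54

  # Zero-filled counts per unit; NA dates are ignored by tabulate()
  counts <- tabulate(unit, nbins = n)

  # Evenly spaced angles, starting at the top and running clockwise
  angle <- 2 * pi * (seq_len(n) - 1) / n
  data.frame(unit = seq_len(n), count = counts,
             x = counts * sin(angle), y = counts * cos(angle))
}
```

A radial-line plot in the spirit of plottype "r" could then be drawn with base graphics, for example plot(m$x, m$y, type = "n", asp = 1) followed by segments(0, 0, m$x, m$y), where m is the returned data frame.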

We are less than halfway through our original proposal and will continue to actively build this package. As I add more functionality, I will post more information on the blog. Until then, keep the feedback flowing and tell us what else you would like to see in this package.


To leave a comment for the author, please follow the link and comment on their blog: Vijay Barve.