Caching Encyclopedia of Life API calls

[This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a recent blog post we discussed caching calls to the web offline, on your own computer. Just as you can cache data on your own computer, a data provider can cache results on its servers. Most of the data providers we work with do not offer caching, but at least one does: EOL, the Encyclopedia of Life. EOL lets you set how long (in seconds) a call is cached; if you repeat the same call within that time, you get the data back faster. Our taxize package has a number of functions that interface with EOL.
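
To make the idea concrete, here is a minimal sketch of a time-to-live (TTL) cache in R. It only illustrates the general mechanism under our own assumptions; it is not EOL's server code, and make_ttl_cache() and its get() method are hypothetical names, not taxize functions.

```r
# Hypothetical TTL cache: an illustration only, not EOL's implementation.
make_ttl_cache <- function() {
  store <- new.env(parent = emptyenv())   # key -> list(value, time)
  list(
    # Return the stored value for `key` if it is younger than `ttl` seconds;
    # otherwise call compute(), store the result with a timestamp, return it.
    get = function(key, ttl, compute) {
      hit <- store[[key]]
      if (!is.null(hit) && as.numeric(Sys.time()) - hit$time < ttl) {
        return(list(value = hit$value, cached = TRUE))
      }
      value <- compute()
      assign(key, list(value = value, time = as.numeric(Sys.time())),
             envir = store)
      list(value = value, cached = FALSE)
    }
  )
}

# Usage with a stand-in for a slow web request (0.5 s per uncached call)
cache <- make_ttl_cache()
slow_search <- function() { Sys.sleep(0.5); "results for lion" }
cache$get("lion", ttl = 5, compute = slow_search)$cached  # FALSE: computed
cache$get("lion", ttl = 5, compute = slow_search)$cached  # TRUE: from cache
```

A repeat request inside the TTL window skips the expensive compute() step entirely, which is the speed-up the timings below are meant to reveal.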

Install and load taxize and ggplot2.

install.packages(c("taxize", "ggplot2"))

library(taxize)
library(ggplot2)

To visualize the benefit of EOL's caching, let's define a function that will:

  • Make a call to the EOL API search service (via the eol_search function in taxize) with caching set to X seconds, meaning the cached result stays available for X seconds. This first call caches the query on EOL's servers. In the function below, the cache_ttl parameter of eol_search sets the number of seconds to cache the request.
  • Make a second call before X seconds pass; it should be faster because the first result was cached.
  • Sleep for slightly longer than the cache lifetime (X + 2 seconds).
  • Make a third call, after the cached result should have expired on the EOL servers.
  • Plot the times each request took.
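testcache <- function(terms, cache) {
  # 1st call: asks EOL to cache this query for `cache` seconds
  first <- system.time(eol_search(terms = terms, cache_ttl = cache))
  # 2nd call: made before the cache expires, so it should be served from cache
  second <- system.time(eol_search(terms = terms, cache_ttl = cache))
  # wait until the cached result should have expired on EOL's servers
  Sys.sleep(cache + 2)
  # 3rd call: the cache should have expired, so the result is fetched afresh
  third <- system.time(eol_search(terms = terms, cache_ttl = cache))

  # keep the bars in call order rather than alphabetical order
  levs <- c('nocache', 'withcache', 'cachetimedout')
  df <- data.frame(labs = factor(levs, levels = levs),
                   vals = c(first[["elapsed"]], second[["elapsed"]],
                            third[["elapsed"]]))
  ggplot(df, aes(labs, vals)) +
    geom_bar(stat = 'identity') +
    theme_grey(base_size = 20) +
    ggtitle(sprintf("search term: '%s'", terms)) +
    labs(y = 'Time to get data (seconds)', x = '')
}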
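
If you want to check the timing logic without hitting EOL, one option is to separate the measurement from the plotting. The sketch below is our own variation, not part of taxize: time_three_calls() and make_fake_provider() are hypothetical helpers, and the fake provider simply assumes 0.5 s per uncached request and near-instant responses while a cached copy is fresh.

```r
# Hypothetical helper: same three-call pattern as testcache(), but it takes
# any fetch function and returns the timings instead of plotting them.
time_three_calls <- function(fetch, ttl) {
  first  <- system.time(fetch())[["elapsed"]]   # cold: nothing cached yet
  second <- system.time(fetch())[["elapsed"]]   # within ttl: should hit cache
  Sys.sleep(ttl + 0.5)                          # wait until the entry expires
  third  <- system.time(fetch())[["elapsed"]]   # after expiry: cold again
  levs <- c("nocache", "withcache", "cachetimedout")
  data.frame(labs = factor(levs, levels = levs),
             vals = c(first, second, third))
}

# Simulated provider (an assumption, not EOL): slow when uncached,
# fast while a cached copy younger than `ttl` seconds exists.
make_fake_provider <- function(ttl) {
  cached_at <- -Inf
  function() {
    if (as.numeric(Sys.time()) - cached_at >= ttl) {
      Sys.sleep(0.5)                            # pretend to do the real work
      cached_at <<- as.numeric(Sys.time())      # remember when we cached it
    }
    invisible("search results")
  }
}

timings <- time_three_calls(make_fake_provider(ttl = 1), ttl = 1)
timings
```

The returned data frame has the same labs and vals columns that testcache() builds, so it can be passed to the same ggplot() call.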

Search for the term "lion", caching the result for 5 seconds:

testcache(terms = "lion", cache = 5)

Search for the term "beetle", caching the result for 10 seconds:

testcache(terms = "beetle", cache = 10)

Caching works the same way with the eol_pages function. No other API service wrapped by taxize supports server-side caching by the data provider. Of course, you can do your own caching with knitr or other methods, some of which we discussed in an earlier post.
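
For the do-it-yourself route, one simple option (our own sketch, not a taxize or knitr feature) is a small disk cache built on base R's saveRDS() and readRDS(), with the file's modification time playing the role of EOL's cache_ttl. cached_call() is a hypothetical helper.

```r
# Hypothetical helper: cache results on local disk with saveRDS()/readRDS(),
# reusing a file only while it is younger than `max_age` seconds.
cached_call <- function(key, compute, dir = tempdir(), max_age = 3600) {
  # Turn the key into a safe file name (simple sanitising, not a hash)
  path <- file.path(dir, paste0(gsub("[^A-Za-z0-9_]", "_", key), ".rds"))
  if (file.exists(path)) {
    age <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "secs"))
    if (age < max_age) return(readRDS(path))   # fresh enough: reuse it
  }
  value <- compute()                           # missing or stale: recompute
  saveRDS(value, path)
  value
}
```

With taxize loaded, you might wrap a call such as cached_call("lion", function() eol_search(terms = "lion")). Because the key is only sanitised, distinct keys like "lion!" and "lion?" map to the same file; hashing the key (for example with the digest package) would avoid that.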

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci Blog - R.
