Caching Encyclopedia of Life API calls

February 12, 2014

(This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers)

In a recent blog post we discussed caching calls to the web offline, on your own computer. Just as you can cache data on your own computer, a data provider can do the same on its servers. Most of the data providers we work with do not offer caching, but at least one does: EOL, the Encyclopedia of Life. EOL lets you set the amount of time (in seconds) that a call is cached; if you make the same call within that time, you get the data back faster. Our taxize package has a number of functions that interface with EOL.
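To make the idea of a cache lifetime concrete, here is a minimal in-memory sketch of time-to-live (TTL) caching in base R. It only illustrates the general technique; it is not how EOL implements caching on its servers. `make_ttl_cache()` and `slow_fetch()` are hypothetical helpers, and the 0.2-second delay stands in for a slow web request.

```r
# Hypothetical helper (not part of taxize or the EOL API): wraps a slow
# function so that repeated calls with the same key within `ttl` seconds
# are answered from memory; older entries are treated as expired.
make_ttl_cache <- function(fetch, ttl) {
  force(fetch)
  force(ttl)
  store <- new.env(parent = emptyenv())   # key -> list(value, stored_at)
  function(key) {
    entry <- store[[key]]                 # NULL if the key was never cached
    fresh <- !is.null(entry) &&
      as.numeric(difftime(Sys.time(), entry$stored_at, units = "secs")) < ttl
    if (fresh) {
      return(entry$value)                 # cache hit: skip the slow fetch
    }
    value <- fetch(key)                   # cache miss or expired entry
    store[[key]] <- list(value = value, stored_at = Sys.time())
    value
  }
}

# Usage with a stand-in for a slow web request that counts how often it runs
calls <- 0
slow_fetch <- function(key) {
  calls <<- calls + 1
  Sys.sleep(0.2)                          # pretend network latency
  toupper(key)
}
cached_fetch <- make_ttl_cache(slow_fetch, ttl = 1)
cached_fetch("lion")   # miss: slow_fetch runs
cached_fetch("lion")   # hit: served from the cache
Sys.sleep(1.1)         # let the entry expire
cached_fetch("lion")   # expired: slow_fetch runs again
calls                  # expected: 2
```

The middle call never reaches `slow_fetch()`, which is the same effect the cache_ttl setting is meant to have for repeated EOL requests.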

Install and load taxize and ggplot2.

install.packages(c("taxize", "ggplot2"))
library(taxize)
library(ggplot2)

To easily visualize the benefit of using EOL's caching, let's define a function to:

  • Make a call to the EOL API search service (via the eol_search function in taxize) with caching set to X seconds, meaning the cached result will be available for X seconds. This first call caches the query on EOL's servers. In the eol_search calls below, the cache_ttl parameter sets the number of seconds to cache the request.
  • Make a second call before X seconds have passed. It should be faster, because the result of the first call was cached.
  • Sleep for a period a bit longer than the time the call is cached.
  • Make a third call, after the cached result should have expired on the EOL servers.
  • Plot the time each request took.
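
testcache <- function(terms, cache){
  # 1st call: not cached yet; asks EOL to cache the result for `cache` seconds
  first <- system.time( eol_search(terms=terms, cache_ttl = cache) )
  # 2nd call: made within `cache` seconds, so EOL should serve it from its cache
  second <- system.time( eol_search(terms=terms, cache_ttl = cache) )
  # wait until the cached result should have expired on the EOL servers
  Sys.sleep(cache+2)
  # 3rd call: the cache has timed out, so this request is uncached again
  third <- system.time( eol_search(terms=terms, cache_ttl = cache) )

  # element 3 of a proc_time object is the elapsed (wall-clock) time in seconds
  df <- data.frame(labs=c('nocache','withcache','cachetimedout'), 
                   vals=c(first[[3]], second[[3]], third[[3]]))
  # keep the bars in the order the calls were made
  df$labs <- factor(df$labs, levels = c('nocache','withcache','cachetimedout'))
  ggplot(df, aes(labs, vals)) + 
    geom_bar(stat='identity') + 
    theme_grey(base_size = 20) +
    ggtitle(sprintf("search term: '%s'\n", terms)) +
    labs(y='Time to get data (seconds)\n', x='')
}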

Search for the term "lion", caching the call for 5 seconds:

testcache(terms = "lion", cache = 5)

Search for the term "beetle", caching the call for 10 seconds:

testcache(terms = "beetle", cache = 10)
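If you want to see the same three-call pattern without network access, the sketch below replaces eol_search with a local stand-in. `fake_eol_search()` and `testcache_offline()` are hypothetical names, the 0.5-second latency is an arbitrary assumption, and the function returns the timings as a data frame (base R only) instead of drawing the ggplot2 bar chart. It models the general idea of a server-side TTL cache, not EOL's actual implementation.

```r
# Hypothetical stand-in for eol_search(): simulates a server that caches a
# query for `cache_ttl` seconds. Uncached requests pay a fixed latency.
server_cache <- new.env(parent = emptyenv())   # terms -> time the query was cached

fake_eol_search <- function(terms, cache_ttl, latency = 0.5) {
  cached_at <- server_cache[[terms]]
  hit <- !is.null(cached_at) &&
    as.numeric(difftime(Sys.time(), cached_at, units = "secs")) < cache_ttl
  if (!hit) {
    Sys.sleep(latency)                   # simulated slow, uncached request
    server_cache[[terms]] <- Sys.time()  # keep the result for cache_ttl seconds
  }
  invisible(terms)
}

# Same steps as testcache(), but returns the timings instead of plotting them
testcache_offline <- function(terms, cache) {
  elapsed <- function() {
    system.time(fake_eol_search(terms, cache_ttl = cache))[["elapsed"]]
  }
  first <- elapsed()     # not cached yet
  second <- elapsed()    # within the TTL, should be fast
  Sys.sleep(cache + 2)   # wait until the cached entry has expired
  third <- elapsed()     # cache expired, slow again
  labs <- c("nocache", "withcache", "cachetimedout")
  data.frame(labs = factor(labs, levels = labs),
             vals = c(first, second, third))
}

res <- testcache_offline("lion", cache = 1)
res
```

Exact timings will vary from run to run, but the "withcache" value should be far smaller than the other two.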

Caching works the same way with the eol_pages function. No other API services (and their associated functions) in taxize support server-side caching by the data provider. Of course, you can do your own caching using knitr or other methods, some of which we discussed in an earlier post.
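As one example of doing your own caching, here is a small sketch of a client-side disk cache in base R. It saves each result with saveRDS() and reuses it until the file is older than `ttl` seconds. `cached_call()` and `fetch_lion()` are hypothetical helpers, not taxize functions; the placeholder result stands in for whatever an expensive call (for example an eol_search request) would return, and the key is assumed to be safe to use as a file name.

```r
# Hypothetical helper: cache the result of `fetch()` on disk under `key`
# and reuse it while the file is younger than `ttl` seconds.
cached_call <- function(key, fetch, ttl = 3600,
                        dir = file.path(tempdir(), "api-cache")) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(dir, paste0(key, ".rds"))   # assumes `key` is file-name safe
  if (file.exists(path)) {
    age <- as.numeric(difftime(Sys.time(), file.info(path)$mtime, units = "secs"))
    if (age < ttl) {
      return(readRDS(path))    # fresh copy on disk: skip the request
    }
  }
  value <- fetch()             # missing or stale: make the request again
  saveRDS(value, path)
  value
}

# Usage with a stand-in for a web request that counts how often it runs
n_requests <- 0
fetch_lion <- function() {
  n_requests <<- n_requests + 1
  list(terms = "lion")         # placeholder result, not real EOL data
}
a <- cached_call("lion", fetch_lion, ttl = 60)
b <- cached_call("lion", fetch_lion, ttl = 60)
n_requests   # expected: 1 in a fresh session, since the second call read the file
```

Unlike EOL's server-side cache, this one survives only as long as the files do; pointing `dir` somewhere other than tempdir() would keep results across R sessions.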
