How to reliably access network resources in R

January 21, 2015
(This article was first published on Cartesian Faith » R, and kindly contributed to R-bloggers)

It’s frustrating when an application unexpectedly dies due to a network timeout or the unavailability of a network resource. Veterans of distributed systems know not to rely on network-based resources, such as web services or databases, since they can be unpredictable. So what is a data scientist supposed to do when she must use these resources in her analysis or application?

When there is a true network partition, there’s not much you can do since these resources are inaccessible. Most of the time, though, the issue is a timeout due to network latency or an unresponsive server. In these situations, the problem is temporary. It would be nice to recover from the error without adding a bunch of logic that muddies up your model code. Recovery can be as simple as trying again, and eventually failing if a resource is truly unavailable.

The new function ntry in lambda.tools 1.0.5 does just this: call a function up to n times, returning the result of the first successful call.

Here’s an example of how it works. The following function simulates an unreliable resource that fails 75% of the time. Using ntry, the function will be tried over and over until it either succeeds or the limit is reached.

library(lambda.tools)
library(futile.logger)

fn <- function(i) {
  x <- sample(1:4, 1)
  flog.info("x = %s",x)
  if (x < 4) stop('stop') else x
}

Calling the function in isolation will most likely fail:

> fn()
INFO [2015-01-21 18:26:21] x = 2
Error in fn() : stop

This is similar to what happens with a timeout, where sometimes a function will fail. To get around this, normally a loop of some sort is introduced to try a few times until the call succeeds. With ntry it’s simply a matter of wrapping a function in a closure and specifying the number of tries.

> ntry(fn, 6)
INFO [2015-01-21 18:39:21] x = 2
INFO [2015-01-21 18:39:21] x = 4
[1] 4
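
For comparison, the hand-written loop mentioned above might look something like the following. This is only a minimal sketch of the retry idea, not the lambda.tools implementation: the names retry_sketch and flaky are hypothetical, and re-raising the last error once every attempt has failed is an assumption of this sketch rather than documented ntry behavior.

```r
# Hypothetical helper, NOT the lambda.tools implementation: call fn(i)
# up to n times and return the result of the first successful call.
retry_sketch <- function(fn, n) {
  last_error <- NULL
  for (i in seq_len(n)) {
    # tryCatch turns an error into a value so the loop can continue
    result <- tryCatch(
      list(ok = TRUE, value = fn(i)),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    last_error <- result$error
  }
  # Assumption of this sketch: re-raise the last error after n failures
  stop(last_error)
}

# Deterministic stand-in for a flaky resource: fails on attempts 1 and 2
flaky <- function(i) {
  if (i < 3) stop("timeout on attempt ", i) else "ok"
}

retry_sketch(flaky, 5)  # returns "ok" on the third attempt
```

Even in this small form, the loop and the error bookkeeping would otherwise sit next to the model code, which is the clutter that a single retry call avoids.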

Here’s a real-world example using RPostgreSQL. In a single function, a connection is opened, the query executed, and the connection closed.

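library(RPostgreSQL)

# HOST, PORT, DATABASE, USER and PASS are assumed to be defined elsewhere
db_execute_query <- function(query) {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, host=HOST, port=PORT, dbname=DATABASE,
    user=USER, password=PASS)
  # Register the cleanup only after the connection exists, so a failed
  # dbConnect does not trigger a second error about a missing con
  on.exit(dbDisconnect(con))

  dbGetQuery(con, statement=query)
}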

For this to work with ntry, I use the on.exit function to disconnect, which runs whether the query succeeds or fails. Normally I’d use a tryCatch block, but since ntry will catch the error, I leave this code unguarded. The call to ntry wraps the DB call in a closure whose argument i is the attempt number, which is useful if you want to debug the call. The second parameter is simply the number of tries.

df <- ntry(function(i) db_execute_query(query), 3)
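
One way to put the attempt number to use is to log it and pause briefly before each retry, which gives a slow server a moment to recover. The wrapper below is a rough sketch under stated assumptions: with_backoff and base_delay are hypothetical names that are not part of lambda.tools, and the doubling delay is just one possible choice.

```r
# Hypothetical wrapper (not part of lambda.tools): returns a function of the
# attempt number i that logs and pauses before every retry, then calls fn().
with_backoff <- function(fn, base_delay = 0.5) {
  function(i) {
    if (i > 1) {
      # The delay doubles with each retry: base_delay, 2 * base_delay, ...
      delay <- base_delay * 2^(i - 2)
      message(sprintf("attempt %d: waiting %.2f seconds", i, delay))
      Sys.sleep(delay)
    }
    fn()
  }
}

# Assumed usage with the query function from this article:
# df <- ntry(with_backoff(function() db_execute_query(query)), 3)
```

The first attempt runs immediately, so the happy path costs nothing extra; only retries pay the delay.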

Access to the database is now a bit more resilient. To try it out yourself, install the latest version of lambda.tools via devtools.

library(devtools)
install_github('zatonovo/lambda.tools')
