Site icon R-bloggers

Open Once, Close Automatically: A Ressource Connection Pattern for R

[This article was first published on R on R & Data Analysis - Eric Stemmler, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Why you should care – at least a little bit

Dangling database connections rarely cause immediate data leaks, but they are a common root cause of outages, degraded performance, and availability incidents. In regulated or public-sector systems, availability failures can have legal and contractual consequences. Managing connections properly is therefore not just a technical concern, but a reliability concern.

https://devclass.com/2020/03/27/github-reveals-database-infrastructure-was-the-villain-behind-february-spate-of-outages-again/

The third incident on February 25, again involved ProxySQL, when “active database connections crossed a critical threshold that changed the behavior of this new infrastructure. Because connections remained above the critical threshold after remediation, the system fell back into a degraded state.”

These kinds of failures may seem operational, but in production environments they can trigger SLA breaches, customer impact, and potential regulatory risk — particularly where availability and resilience are part of compliance requirements (e.g., GDPR’s Article 32 for systems processing personal data). Properly managing connections — opening them only when needed and ensuring deterministic cleanup — is therefore not just “good hygiene” in R programming, but a practical aspect of building robust, reliable systems.

These risks are also relevant in the R ecosystem, where database access is common in long-running processes such as Shiny applications, plumber APIs, and scheduled data pipelines. While R is often associated with academic analysis, it is also widely used in production systems across government and industry. In Shiny apps, for example, each user session may open database connections, and poorly managed connections can quickly accumulate and exhaust backend resources. Because these applications are typically long-lived, small connection leaks can silently degrade performance or cause outages. Proper connection management is therefore not just good practice in R—it is essential for building reliable, scalable, and production-ready applications.

This post introduces a pattern using around on.exit() that makes connections safe, automatic, and worry-free.

R related context

When developing a R-package, one typically aims to divide code into functions, in order to make it easier to maintain it. For instance when developing a Shiny-App, it’s practical to (a) wrap it as a R-package and (b) organize code into different files in the R/ folder and further into several functions.

When working with data from databases, the first thing to do in order to access this data is to open a connection to the respective database.

# example for opening a database connection
db <- DBI::dbConnect(drv = odbc::odbc(), "database") 

For example, let’s say we want to write a function to encapsulate data access like this:

get_data <- function() {
  db <- DBI::dbConnect(drv = odbc::odbc(), "database")  # open ressource connection
  df <- dbGetQuery(conn = db, statement = "SELECT ...") # get data
  DBI::dbDisconnect(db)                                 # close ressource connection
  return(df)                                            # return result
}

The downsides of this approach are:

  • everytime we get data a connection is opened and closed, while perhaps we’d like to keep a connection open for additional subsequent data access

  • data cannot be directly returned as the result from DBI::dbGetQuery() (in R the value of the last statement in a function is what the function returns

  • when writing several different data access functions that follow this pattern, we have to repeat the same lines of code. With many such functions this can be prone to errors and potentially opening the many parallel connections

A straight-forward way to address this would be to take the connection-part out of the function and simply assume the connection-object being present (perhaps putting it somewhere accessible e.g., a special package environemt). Doing so then doesn’t allow us any longer to run tests directly with our function outside the app, so that we again have to explicitly open and close connections several times.

The Pattern

From the R documentation of function on.exit:

on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.

This is a versatile feature of R-funcitons and we can use it to build a connection function that will ensure the connection is closed again. In essence, we want to build a function that can do something like this:

get_data <- function() {
  connect()
  DBI::dbGetQuery(...)
}

In order for this to work, connect() needs to record the R-expression that will close the opened connection when it’s calling function exits, in R-parlance this is called the parent frame and (unsurprisingly) it can be retrieved via function parent.frame(). Equipped with these two functions we can now write our connect() function:

connect <- function() {
  caller <- parent.frame()
  # If connection already exists in this frame, reuse it
  if (exists(".db", envir = caller, inherits = FALSE)) {
    return(get(".db", envir = caller))
  }
  # Create new connection
  db <- dbConnect(odbc::odbc(), "mot")
  # Store it in the caller's frame
  assign(".db", db, envir = caller)
  # Ensure cleanup when the calling function exits
  do.call(
    what = on.exit,
    args = list(
      quote({
        if (exists(".db", inherits = FALSE)) {
          dbDisconnect(.db)
          rm(.db)
        }
      }),
      add = TRUE
    ),
    envir = caller
  )
  db
}

Note also in the above function definition that in order to be able to reuse an opened connection, that the conncetion Object (db) needs to be stored somewhere. Returning it as a result of the function is no guarantee that it will be assigned to a symbol. That’s problematic because everything that isn’t bound to a symbol in a R session is subject to R’s garbage collection. That’s why, in addition it is also assigned to a variable called .db into the calling frame.

One problem with this implementation is that its vurnable to a name collision: We do not know whether there would already exist a symbol in the calling frame that – perhaps coincidentally – has the same name .db, but represents perhaps something entirely different.

To remedy this, we can explicitly create a closure that is, a function that carries it’s own environment. For more details see Hadley Wickham’s explanation here.

I’m providing here a toy example, where our connection object is represented by a random 3-letter string. Setting this variable to NULL is meant to represent to disconnect from the ressource. You can replace these lines with your code for actually connecting and disconnecting from the ressource.

connect <- local({
  # Private environment, created once
  state <- new.env(parent = emptyenv())
  function() {
    caller <- parent.frame()
    if (identical(caller, .GlobalEnv)) {
      warning("connect() was called from Global Environment, cannot perform automatic disconnect")
    }

    # If connection already exists, reuse it
    if (exists("db", envir = state, inherits = FALSE)) {
      cat("connection already exists", state$db, '\n')
    } else { 
      # Create connection
      state$db <- paste0("db_", paste0(sample(letters, 3), collapse = ''))
      # Ensure cleanup when the *calling* function exits
      cleanup <- function() {
        if (exists("db", envir = state)) {
          # dbDisconnect(state$db)
          cat("disconnecting", state$db, '\n')
          state$db <- NULL
          rm("db", envir = state)
        }
      }
      # do.call(on.exit, list(quote(cleanup()), add = TRUE), envir = caller)
      do.call(on.exit, list(
      substitute(FUN(), list(FUN = cleanup)), add = TRUE), envir = caller)
      cat("new connection", state$db, '\n')
    }
  }
})

foo <- function() {
  cat("inside foo\n")
  connect()
  bar()
}

bar <- function() {
  cat("inside bar\n")
  goo()
}

goo <- function() {
  cat("inside goo\n")
  connect()
}

foo()
## inside foo
## new connection db_kpt 
## inside bar
## inside goo
## connection already exists db_kpt 
## disconnecting db_kpt
goo()
## inside goo
## new connection db_npc 
## disconnecting db_npc

This pattern uses a closure to hold the connection state and on.exit() to register cleanup logic in the caller’s frame. Applying substitute() to the cleanup function is important, because our function definition only lives within the created closure, not in the parent’s frame. What substitute() does is to substitute the symbol (FUN) by our cleanup functions unevaluated parsing tree. This in itself again is a closure (a function plus an attached environment) which retains access to its private state even when evaluated later. The result is a safe, idempotent connection manager that works transparently across nested function calls without leaking state or requiring explicit teardown.

No global state, no name collisions, and no burden on the caller.

Note that, when connect() is run in the Global Environment, the expression provided to on.exit() will evaluated and it’s the responsibility of the caller to close the connection manually.

connect()
new connection db_ykc
Warning:
In connect() : connect() was called from Global Environment, cannot perform automatic disconnect
To leave a comment for the author, please follow the link and comment on their blog: R on R & Data Analysis - Eric Stemmler.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version