Maximizing Efficiency: A Guide to Benchmarking Memory Usage in Shiny Apps


R/Shiny allows you to prototype a working web application quickly and easily. However, with increasing amounts of data, your app may become slow and, in extreme cases, crash due to insufficient memory.

When the worst-case scenario happens, we need to figure out a way to lower the memory usage of our app to avoid those crashes.

A crucial part of optimization efforts is benchmarking how much memory our app is consuming. This allows us to check if the changes we made to the app are indeed moving us in the right direction.

In this step-by-step guide, we will describe how to do that based on an example application.

How to Measure the Memory Usage of a Shiny App

You might already be familiar with the {profmem} package for profiling the memory usage of R expressions. {profmem} uses utils::Rprofmem() under the hood, and the docs point out an important limitation: Rprofmem only logs allocations, so it cannot quantify the total memory usage at a given time, because deallocations done by the garbage collector are never reflected.

Additionally, Rprofmem does not track allocations made by non-R native libraries or by packages that use native calloc() and free() for internal objects.
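
For illustration, here is a minimal sketch of that behaviour: {profmem} records the allocations made while evaluating an expression, but freeing memory leaves no trace in its log.

library(profmem)

p <- profmem({
  x <- numeric(1e6) # ~8 MB allocation: logged
  rm(x)             # deallocation: never recorded
})

total(p) # sum of logged allocations (~8 MB here), not memory currently in use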

In the context of Shiny, we are usually interested in how much memory the R process running our app is using. That information lets us estimate what infrastructure we need to provision to host the app, and gives us an overall feel for how the app scales memory-wise (e.g. does memory usage increase drastically with more users?).

To achieve that, we will use the {bench} package, which provides the bench_process_memory() function. It uses operating system APIs to determine how much memory the current R process is using, including memory from child processes and memory allocated outside R’s garbage-collected heap.

bench::bench_process_memory() reports not only the amount of memory currently in use, but also the peak memory usage over the lifetime of the process.
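
Calling it is straightforward; the values below are purely illustrative, as the numbers depend on what your session has done so far:

> bench::bench_process_memory()
current     max 
   90MB    90MB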

📝 Note: There are other packages that can measure process memory usage, such as {memuse}. However, at the time of writing it does not support measuring peak memory usage on macOS – we submitted a Pull Request adding support for that, but later learned that {bench} already supports it. Hence, we recommend using {bench}.

Throughout our example, we will use the following helper function:

wait_for_app_to_start <- function(url) {
  # Poll the app URL until it responds, retrying failed connection
  # attempts (the app may not be listening yet) with exponential
  # backoff, for up to 5 seconds in total.
  httr2::request(url) |>
    httr2::req_retry(
      max_seconds = 5,
      retry_on_failure = TRUE,
      backoff = function(attempt) 2^attempt
    ) |>
    httr2::req_perform()
}

measure_mem_usage <- function() {
  result_file <- tempfile(fileext = ".RDS")
  port <- httpuv::randomPort()

  # Run the app in a background R process so that objects created in our
  # current session do not distort the measurements.
  app_process <- callr::r_bg(
    function(result_file, port) {
      # When the process is interrupted, save the memory measurements
      # to the temporary file before exiting.
      on.exit({
        saveRDS(bench::bench_process_memory(), result_file)
      })

      shiny::runApp(port = port)
    },
    args = list(result_file = result_file, port = port)
  )

  # Make sure the background process is cleaned up, even if we error out.
  on.exit({
    if (app_process$is_alive()) {
      app_process$kill()
    }
  })

  app_url <- paste0("http://127.0.0.1:", port)

  wait_for_app_to_start(app_url)

  utils::browseURL(app_url)

  cat("Press [enter] to finish the test...")
  line <- readline()

  app_process$interrupt()

  app_process$wait()

  readRDS(result_file)
}

Let’s break down, step by step, what is happening in this function:

  1. We start a Shiny app in a separate R process – this is important, as we don’t want work done previously in our R session to impact the results (e.g. we might have analyzed a large dataset, which could otherwise show up as the peak memory usage).
  2. We register a callback on function exit that will save the memory measurements in a temporary file
  3. After the background R process with our app is started, our function opens the app in our browser and waits for user input. This gives us time to simulate user interactions with our app.
  4. Once we are done clicking through our app, we can hit enter in our R console, and the background process will be interrupted. Once the background process terminates, we read memory measurements from the temporary file.

Let’s see that in action:
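
A run looks like this – a sketch assuming the two helper functions above are defined and that app.R lives in the current working directory (the default location shiny::runApp() serves from). The call blocks while you interact with the app in the browser; pressing enter ends the run and returns the measurements.

> measure_mem_usage()
Press [enter] to finish the test...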

Discover more insights on boosting your app’s speed and efficiency in our detailed piece: shiny.benchmark – How to Measure Performance Improvements in R Shiny Apps.

Example App

All right, now let’s use our memory benchmarking function on an actual app. Let’s assume we are working with credit card data; we will generate a fake dataset using {charlatan} and save it in an SQLite database:

library(charlatan)
library(DBI)
library(dplyr)

set.seed(123)

# Generate Fake Data
TABLE_ROW_COUNT <- 1e7

fake_providers <- ch_credit_card_provider(100)
fake_data <- data.frame(
  provider = sample(fake_providers, size = TABLE_ROW_COUNT, replace = TRUE)
)

# Save data to sqlite database
conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

dbWriteTable(
  conn = conn,
  name = "credit_cards",
  value = fake_data,
  overwrite = TRUE
)

Now, let’s create a Shiny App that will display the top 10 most popular card providers:

library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  credit_cards <- dbGetQuery(
    conn = conn,
    "SELECT * FROM credit_cards"
  )

  output$top_credit_providers <- renderReactable({
    top_providers <- credit_cards |> 
      group_by(provider) |> 
      summarise(popularity = n()) |> 
      arrange(desc(popularity)) |>
      head(10) |> 
      collect()
    
    reactable(top_providers)
  })
  
}

shinyApp(ui, server)

Let’s see how much memory the app uses with our helper function:

> measure_mem_usage()
Press [enter] to finish the test...

current     max 
  481MB   481MB 

Ok, now let’s see how that changes if we simulate multiple sessions within the app – every browser tab pointed at the app creates a separate session, so we can simply open multiple tabs, as sketched below.
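
If clicking around manually gets tedious, a small helper can open the extra tabs for us (open_sessions is a hypothetical convenience, not part of the app; app_url is the address our benchmark helper opens):

# Hypothetical helper: each opened tab creates one additional Shiny session.
open_sessions <- function(app_url, n) {
  for (i in seq_len(n)) utils::browseURL(app_url)
}

Here are the results for 2, 3, 4 and 5 sessions: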

> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...

current     max 
  606MB   606MB 


> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...

current     max 
  678MB   678MB 


> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...

current     max 
  769MB   769MB 


> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...

current     max 
  844MB   844MB

[Figure: Memory Usage Across Different Numbers of Sessions – original app]

Based on the above measurements, we can see that each additional session allocates an extra 72–125 MB of memory.
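
That estimate comes from diffing the consecutive peak measurements:

> diff(c(481, 606, 678, 769, 844)) # max memory (MB) for 1-5 sessions
[1] 125  72  91  75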

Let’s try to make our app more efficient. Some of you probably noticed that we fetch the data separately for each session, which means the same data is stored multiple times in our app.

We can make that more efficient by fetching the data once, in the global scope:

library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

credit_cards <- dbGetQuery(
  conn = conn,
  "SELECT * FROM credit_cards"
)

shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  
  output$top_credit_providers <- renderReactable({
    top_providers <- credit_cards |> 
      group_by(provider) |> 
      summarise(popularity = n()) |> 
      arrange(desc(popularity)) |>
      head(10) |> 
      collect()
    
    reactable(top_providers)
  })
  
}

shinyApp(ui, server)

Let’s measure if that made our app more memory efficient:

> measure_mem_usage() # 1 session
Press [enter] to finish the test...

current     max 
  474MB   474MB 


> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...

current     max 
  497MB   497MB 


> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...

current     max 
  503MB   503MB  


> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...

current     max 
  530MB   530MB


> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...

current     max 
  546MB   546MB 

[Figure: Memory Usage Across Different Numbers of Sessions – data fetched in the global scope]

As we can see, the app now allocates only an extra 6–27 MB per session – an almost 4x improvement!

Let’s try to make it even better! Currently, we fetch the entire credit card table into the R process’s memory, yet we only display the top 10 providers. What a waste of memory!

Let’s fix that by pushing the computation into the database – this is easy thanks to {dbplyr}, as we can reuse the same {dplyr} verbs.

library(DBI)
library(dplyr)
library(reactable)
library(shiny)

conn <- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")

credit_cards <- tbl(conn,"credit_cards")

shiny::onStop(function() {
  dbDisconnect(conn)
})

ui <- fluidPage(
  titlePanel("Credit Cards App"),
  reactableOutput("top_credit_providers")
)

server <- function(input, output, session) {
  
  output$top_credit_providers <- renderReactable({
    top_providers <- credit_cards |>
      group_by(provider) |> 
      summarise(popularity = n()) |> 
      arrange(desc(popularity)) |>
      head(10) |> 
      collect()
    
    reactable(top_providers)
  })
  
}

shinyApp(ui, server)
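
Because credit_cards is now a lazy {dbplyr} table, the pipeline inside renderReactable() is translated to SQL and executed by SQLite; only the ten aggregated rows ever reach R. We can inspect the query with show_query() – a quick sketch reusing conn from the app, with the generated SQL shown approximately:

library(dplyr)

tbl(conn, "credit_cards") |>
  group_by(provider) |>
  summarise(popularity = n()) |>
  arrange(desc(popularity)) |>
  head(10) |>
  show_query()

<SQL>
SELECT `provider`, COUNT(*) AS `popularity`
FROM `credit_cards`
GROUP BY `provider`
ORDER BY `popularity` DESC
LIMIT 10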

Let’s run our benchmarks again:

> measure_mem_usage() # 1 session
Press [enter] to finish the test...

current     max 
  229MB   229MB 


> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...

current     max 
  225MB   225MB 


> measure_mem_usage() # 2 sessions
Press [enter] to finish the test...

current     max 
  231MB   231MB


> measure_mem_usage() # 3 sessions
Press [enter] to finish the test...

current     max 
  232MB   232MB 


> measure_mem_usage() # 4 sessions
Press [enter] to finish the test...

current     max 
  233MB   233MB 


> measure_mem_usage() # 5 sessions
Press [enter] to finish the test...

current     max 
  233MB   233MB 

[Figure: Memory Usage Across Different Numbers of Sessions – computation pushed to the database]

Now the memory usage of our app barely increases with the number of sessions; there is only a 4MB difference between the app used by 1 user and the app used by 5 users.

Not to mention that compared to the apps that were fetching whole datasets into memory, we are saving 245MB of memory!

Limitations

The described method of measuring the memory usage of a Shiny app has its limitations. For example, if our app uses {promises}, the measurements might be less accurate depending on the type of {future} backend in use.

If our backend uses child processes, bench::bench_process_memory will include them in the measurements. For example, when using future::multicore, futures run in forked child processes of the main R process.

However, if we are using future::multisession, futures run in separate background R processes (not child processes), and in that case the memory used by those processes won’t be included in the measurements.
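
A minimal sketch of the two setups with the {future} package (which plan you choose determines whether worker memory is counted):

library(future)

# Forked child processes (not available on Windows): their memory IS
# included in bench::bench_process_memory() of the main process.
plan(multicore)

# Separate background R sessions: their memory is NOT included.
plan(multisession)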

Conclusion

In this blog post, we described how to benchmark the memory usage of a Shiny app using the {bench} package.

Additionally, we showed that by extracting computations into a database, we can make an almost 4x improvement in terms of memory usage.

This improves the scalability of our application and might allow us to cut down on infrastructure costs, as machines with less memory can be used to handle the same traffic.

If you found this article helpful, don’t miss out on the latest trends and advancements in R/Shiny — subscribe to Shiny Weekly for regular updates and exclusive content.
