Sharing data across shiny modules, an update

[This article was first published on Rtask, and kindly contributed to R-bloggers.]

You can read the original post in its original format on the Rtask website by ThinkR here: Sharing data across shiny modules, an update

Some people have recently been vocal about misuses of the "stratégie du petit r", a mechanism for sharing data across {shiny} modules that was detailed both in the Engineering Production-Grade Shiny Apps book and in an older post written in 2019 on this blog.
And yes, if you’re wondering, I did feel old when I realized this blog post is almost 7 years old now 😅

I’m always happy to be proven wrong, to challenge the way I build software, and to become a better software engineer. But given that we weren’t contacted to discuss the ideas behind this strategy, I thought the moment was perfect to give y’all an update on the latest approaches I’ve been using to share data across {shiny} modules, along with some thoughts and comments on the "stratégie du petit r".

Discovering {shiny} modules

I’ve been building {shiny} apps for quite a while now. I’ve probably built more apps than I can remember, and for far longer than I dare to admit. One of the apps I’m most proud of is used by 1,000+ people in a large company, managing millions of euros every year. I’ve been working on this one for five years now.

I can still remember the day Vincent sent us a message on Slack, sharing a video about {shiny} modules and how they would change the way we build apps. I was on a train ride back from one of the many R conferences I’ve attended (and spoken at) over the years. I think this might have been in 2017, but I’m not really sure. And I’ll admit it: I had absolutely no idea what this video was about, or how it would fit into my current {shiny} apps.

Being the nerd that I am, I watched this video a couple of times because I wanted to understand it and use it. Any time Vincent shows up and says something will change your coding style, you’d better listen and understand. It took me a bit of time, but after a couple of months, {shiny} modules were a core part of my development workflow, and the add_module() function from {golem} became one of my favorites: it has saved me five minutes of perilous copy-pasting every time I need a new module. That’s a significant amount of lifetime saved thanks to a simple function.

But one of the more complex things with {shiny} modules is this: how do you share global state, data, and reactivity between them? How do I access the CSV read in mod_csv_reader from mod_data_visualisation?

Let’s dive into this question.

What is a {shiny} module?

Modules are functions

I feel like {shiny} modules have been mistakenly presented as “reusable pieces of {shiny} code”. Well, they are, but 95% of the modules I’ve written in my career have been used only once. And that’s because most of the time, parts and pieces of an app are too specific to be reused anywhere else.

So {shiny} modules are useful primarily because they address a scoping issue: via two functions, they let you define a small part of your app without having to worry about ID uniqueness across the whole application. Basically, they are building blocks: you start at the top level, then break things down into smaller and smaller pieces.
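To make this concrete, here is a minimal sketch of what such a pair of functions looks like (the names mod_csv_reader_ui / mod_csv_reader_server are mine, for illustration, echoing the CSV example above):

```r
library(shiny)

# A minimal module: two functions sharing one id.
# ns() guarantees that "file" is unique app-wide,
# even if the module is instantiated several times.
mod_csv_reader_ui <- function(id) {
  ns <- NS(id)
  tagList(
    fileInput(ns("file"), "Upload a CSV")
  )
}

mod_csv_reader_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    # Everything in here is scoped to this instance of the module
    reactive({
      req(input$file)
      read.csv(input$file$datapath)
    })
  })
}
```

Inside the module you only ever worry about the id "file"; the namespacing is handled for you.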

{shiny} modules being functions means several things:

  • They operate in an ecosystem of environments
  • They are scoped, meaning what happens in them usually stays there unless you actively decide otherwise
  • They can take inputs and generate outputs

Good software engineering practice tells us: a function should take a set of inputs, do just one thing, and produce an output, and we plug these functions into one another like Russian dolls to build a larger workflow. So we might have, for example, one module that contains a tab of the app, which contains two cards, with one card being a module that contains a module with a fileInput to read a CSV.

Let’s take a look at a simple application like this GPX Viewer, with the source code available at https://github.com/ThinkR-open/gpxviewer.

This app follows a pretty common Shiny workflow: take a dataset, plot it, and summarize it. There are multiple ways to split this app into modules.

One option is to split modules by “doing just one thing”: data configuration / data visualization.

Another is: upload / configure / plot / summarize.

And another: check for example / upload / configure / download / plot / summarize.

What I’m trying to show here is that “just one thing” can be relative in the context of a big app. Furthermore, and I think this is the biggest point: we need to strike a balance between perfect and practical. For example, I’m currently working for a client where the codebase (R code only) is just under 20,000 lines, and if you start from the top level, the deepest module in the stack is six levels down. Of course, some of the modules could be split into smaller ones, and then into smaller ones, and so on.

But I’m trying to keep things easy to maintain. Given the current size of the codebase, adding more layers or going deeper would make the code far more complex and harder to maintain with no real benefit. So yes, some of these modules are not perfect, and they might not be doing “just one thing”.

You know what they say: “perfect is the enemy of good.”

Can functions take lists as parameters?

This is something I’ve been struggling with for a long time. Can a function take only scalar parameters, or can those parameters be lists too?

I’ve come to terms with this idea for two reasons:

  1. Data frames are lists, and I don’t see any good reason to forbid passing a data.frame as an argument to a function.
  2. JavaScript is full of functions that take scalar values and a list of parameters, and it works well.
    For example, making an HTTP request in JS looks like this:
fetch(
  "/api/users",
  {
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      "Accept": "application/json",
      "Authorization": "Bearer YOUR_TOKEN",
    }
  }
)

Guess what: in {httr} (I know, I’m old-school), you’d do:

GET(
  url = "/api/users",
  config  = add_headers(
    `Content-Type`  = "application/json",
    `Accept`        = "application/json",
    Authorization   = "Bearer YOUR_TOKEN"
  )
)

Yep, config is a list().

If you feel like I’m digressing a bit from my original point, you’re right, a little, but it’s relevant to what I’ll be explaining in the rest of this blog post.

Sharing data across modules

What are we even talking about?

Let’s imagine, for a moment, the following Shiny architecture, which is, to be honest, a very simple one (most of the time, modules won’t be split this evenly).

Modules usually live in two scopes:

  • They do things within themselves
  • They do things that need to be passed to other modules

Doing things within themselves is pretty standard and doesn’t require a lot of thought (as long as you don’t forget the ns() 😅), but sharing things from one module to another in a reactive context can be more challenging. For example, let’s say our app contains the following: mod_3_a has a checkbox, mod_3_b has a data upload module, mod_3_c has a set of configuration options for cleaning the data, mod_3_d has a set of configuration options for the plot, and finally mod_3_g is the module that draws the plot. Once the data is uploaded and cleaned, the code has to be organized in a way that allows two things to happen:

  1. The dataset and configuration are available in mod_3_g
  2. mod_3_g’s context is invalidated and a new plot is drawn (i.e., reactivity is triggered)

If we only had (1), things would be a bit easier, but we also need to make everything reactive.

Let’s now explore the patterns we could use.

Passing reactive objects

One thing I’ve learned over the years is that what works for example apps can be a nightmare in a production context. The official Shiny docs recommend the following pattern: return one or more reactive() objects that can be passed to other modules.

Here, in our context, that would mean the following code (going from the bottom left of the tree to the bottom right):

# Bottom left level
mod_3a_server <- function(){
  return(reactive({ input$abc }))
}
mod_3b_server <- function(){
  return(reactive({ input$def }))
}
mod_3c_server <- function(){
  return(reactive({ input$ghi }))
}
mod_3d_server <- function(){
  return(reactive({ input$jkl }))
}

mod_2_a <- function(){
  mod_3a_reactive <- mod_3a_server()
  mod_3b_reactive <- mod_3b_server()
  return(
    list(
      mod_3a_reactive = mod_3a_reactive,
      mod_3b_reactive = mod_3b_reactive
    )
  )
}
mod_2_b <- function(){
  mod_3c_reactive <- mod_3c_server()
  mod_3d_reactive <- mod_3d_server()
  return(
    list(
      mod_3c_reactive = mod_3c_reactive,
      mod_3d_reactive = mod_3d_reactive
    )
  )
}

mod_1_a <- function(){
  mod_2_a_results <- mod_2_a()
  mod_2_b_results <- mod_2_b()
  return(
    list(
      mod_3a_reactive = mod_2_a_results$mod_3a_reactive,
      mod_3b_reactive = mod_2_a_results$mod_3b_reactive,
      mod_3c_reactive = mod_2_b_results$mod_3c_reactive,
      mod_3d_reactive = mod_2_b_results$mod_3d_reactive
    )
  )
}

# in server

reactives_from_mod_1_a <- mod_1_a(...)

mod_1_b_server(
  mod_3a_reactive = reactives_from_mod_1_a$mod_3a_reactive,
  mod_3b_reactive = reactives_from_mod_1_a$mod_3b_reactive,
  mod_3c_reactive = reactives_from_mod_1_a$mod_3c_reactive,
  mod_3d_reactive = reactives_from_mod_1_a$mod_3d_reactive
)

# in mod_1_b
mod_1_b_server <- function(){
  mod_2_d_server(
    mod_3a_reactive = mod_3a_reactive,
    mod_3b_reactive = mod_3b_reactive,
    mod_3c_reactive = mod_3c_reactive,
    mod_3d_reactive = mod_3d_reactive
  )
}

mod_2_d_server <- function(){
  mod_3g_server(
    mod_3a_reactive = mod_3a_reactive,
    mod_3b_reactive = mod_3b_reactive,
    mod_3c_reactive = mod_3c_reactive,
    mod_3d_reactive = mod_3d_reactive
  )
}

mod_3g_server <- function(...){
  output$xyz <- renderPlot({
    draw(
      mod_3a_reactive = mod_3a_reactive(),
      mod_3b_reactive = mod_3b_reactive(),
      mod_3c_reactive = mod_3c_reactive(),
      mod_3d_reactive = mod_3d_reactive()
    )
  })
}

If you feel like it’s a mess and complex to reason about, that’s because it is. And we’re in a simple case where data travels at the same depth in the stack.

Now let’s say mod_2_d itself also needs to display the value from mod_3d_reactive, while still passing the others down. With this, we’d get:

mod_1_b_server <- function(){
  mod_2_d_server(
    mod_3a_reactive = mod_3a_reactive,
    mod_3b_reactive = mod_3b_reactive,
    mod_3c_reactive = mod_3c_reactive,
    mod_3d_reactive = mod_3d_reactive
  )
}

mod_2_d_server <- function(){

  output$abc <- renderText({
    mod_3d_reactive()
  })

  mod_3g_server(
    mod_3a_reactive = mod_3a_reactive,
    mod_3b_reactive = mod_3b_reactive,
    mod_3c_reactive = mod_3c_reactive
  )
}

mod_3g_server <- function(){
  output$xyz <- renderPlot({
    draw(
      mod_3a_reactive = mod_3a_reactive(),
      mod_3b_reactive = mod_3b_reactive(),
      mod_3c_reactive = mod_3c_reactive()
    )
  })
}

And that’s just to make four values travel through the module graph. In a pretty shallow and evenly organized stack, as I said. And that’s because we’re only passing reactives as parameters. Now imagine mixing in plain, non-reactive arguments, like a dataset and a plotting option:

mod_3g_server <- function(
  dataset,
  mod_3a_reactive,
  mod_3b_reactive,
  mod_3c_reactive,
  with_coordflip = TRUE
){
  output$xyz <- renderPlot({
    draw(
      dataset = dataset,
      mod_3a_reactive = mod_3a_reactive(),
      mod_3b_reactive = mod_3b_reactive(),
      mod_3c_reactive = mod_3c_reactive(),
      with_coordflip = with_coordflip
    )
  })
}

Which is even more complex if you add a layer of reactive() inside your module:

mod_3g_server <- function(
  dataset,
  mod_3a_reactive,
  mod_3b_reactive,
  mod_3c_reactive,
  with_coordflip = TRUE
){
  the_plot_to_draw <- reactive({
    drawing <- draw(
      dataset = dataset,
      mod_3a_reactive = mod_3a_reactive(),
      mod_3b_reactive = mod_3b_reactive(),
      mod_3c_reactive = mod_3c_reactive(),
      with_coordflip = with_coordflip
    )
    return(drawing)
  })
  output$xyz <- renderPlot({
    the_plot_to_draw()
  })
}

Good luck understanding the reactive graph for that one.

As a side note, I think reactive() objects are conceptually neat, but I don’t think they should be your go-to building block.
Let’s have a quick look at:

the_data_frame <- reactive({
  result <- clean_and_transform(
    input$dataset
  )
  return(result)
})

output$table_one <- renderDT({
  the_data_frame()
})

That’s indeed neat: whenever input$dataset changes, something is computed and displayed. It works well for small examples, but as soon as you have to pass it to other functions or modules, it starts to feel harder to reason about, especially if you’re not used to manipulating functions as objects.

I’ve met a lot of R developers who didn’t know you could pass a function as a parameter to another function, and most of the time, with reactive(), people are copying examples from the web without really understanding what’s happening.
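If that last point rings a bell, here is the idea in plain R, outside of any {shiny} context: a function is a value like any other, and a reactive() is essentially a function you call to read the current value. (apply_twice and make_counter are illustrative names of my own.)

```r
# In R, functions are regular values: you can pass them around
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)
#> [1] 2

# A reactive() behaves much the same way: it's a function you hand
# to a module, and the module calls it to read the current value.
# A closure over mutable state is a rough non-reactive analogy:
make_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
counter <- make_counter()
counter()
#> [1] 1
counter()
#> [1] 2
```

Once you see reactive() as “a function to call later”, passing it through module parameters feels much less magical.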

But how do they do that in other languages?

I haven’t built real apps in that many languages, but there is one I know (more or less) well: JavaScript.

In the summer of 2024, we spent a couple of weeks working on Rlinguo, a mobile app that can run R code. It’s built in React, and it works just like {shiny} does (well, from a conceptual point of view): you have stateful objects, and when these objects change, they trigger another part of the app to be recomputed. In our case, whenever you interact with the first tab, the second tab (with the visualization) is updated.

In the app, the first layer creates a webR instance, an SQLite connection, and a score object, which is used to trigger a recomputation of the viz. When the app launches, you get a loading screen that waits for webR to be ready. Once it is, webR is queried for functions, and once you’ve validated your answer (in “module” 1), an alert is sent to the viz (in “module 2”) to query the SQLite DB and recompute the graph.

To sum up, some objects are created at the top level and used to share data and trigger reactivity from one “module” to the other.

Note: my colleague Arthur pointed out that Vue.js has something called a store in Pinia. I’m not exactly sure how it works, but apparently it’s more or less the same as reactiveValues(). And Claude confirmed it 😄

The “stratégie du petit r”

One strategy we recommended is what we called the “stratégie du petit r”. Looking back, I can admit that it was a poor choice of name, but you know, sh*t happens.

The principle is quite simple: instead of returning and passing reactive() objects as arguments, you create one or more reactiveValues() at an upper level, which you then pass downstream to lower-level modules. reactiveValues() behave a lot like environments, meaning that values set down the stack are available everywhere.
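Here is a minimal sketch of the principle (module and object names are illustrative, not from a real app): a reactiveValues() is created in app_server and passed down to two modules; one writes into it, the other reads from it, and reactivity flows through.

```r
library(shiny)

mod_upload_server <- function(id, storage) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$file, {
      # Writing into the shared reactiveValues()
      storage$dataset <- read.csv(input$file$datapath)
    })
  })
}

mod_plot_server <- function(id, storage) {
  moduleServer(id, function(input, output, session) {
    output$plot <- renderPlot({
      # Reading from it: this context is invalidated
      # whenever storage$dataset changes
      req(storage$dataset)
      plot(storage$dataset)
    })
  })
}

app_server <- function(input, output, session) {
  storage <- reactiveValues() # the "petit r", with a clearer name
  mod_upload_server("upload", storage = storage)
  mod_plot_server("plot", storage = storage)
}
```

No values are returned or re-plumbed through intermediate levels: both modules simply receive the same object.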

I still think this is a valid way to share data, but only if you avoid applying it too literally and focus on how to work with it in practice.

The main criticism I’ve read about this approach is that you’ll end up with a huge r object with 300 entries in it, creating a monster that’s impossible to debug.

So yes, these monsters exist. But I don’t think the idea itself is the problem. It’s always easier to blame the tool than to acknowledge the lack of understanding behind its misuse. Or, as Beckett wrote, “Voilà l’homme tout entier, s’en prenant à sa chaussure alors que c’est son pied le coupable.” (“There’s man all over for you, blaming on his boots the faults of his feet.”)

Here are some random thoughts:

1. Don’t call it r (probably)

Conventions are great, and they help humans thrive. I think we need them when building software: we spend more time reading than writing code, and conventions help us navigate an unfamiliar codebase. For example, I know that all files starting with mod_ in {golem} contain modules.

When presenting examples for the “stratégie du petit r”, we used r <- reactiveValues(). But that was just for the example. In this post, I’ve used mod_1_a, mod_3_g. Please don’t reuse these names, they’re only examples.

So yes, a small r might be confusing if you don’t work with people who know that convention. If I stumble upon a codebase with an r, I’ll know what it is because I’ve used it before. But nowadays, I tend to go for more expressive naming, usually either global (since it’s global storage), or simply storage. You might prefer other names like global_storage, reactive_storage, or anything that would be clearer to your team.

That being said, everything is a matter of context and convention. For example, dplyr::mutate() has a parameter called .data. You could debate whether it’s a good choice or not, but anytime I see .data, I know it’s a table and that we’re in the tidyverse.

2. You don’t need to share everything between all modules

There is a very, very small chance that you need to share everything across all modules.

Think about the app you’re working on right now. Yes, there are probably a handful of things that need to be available in all modules, but there is no need to store everything in an upper-level reactiveValues().

Your modules need to stay scoped, and this is probably one of the most important ideas for making this implementation work:

  • Things that are only needed inside a module should not be stored in a reactiveValues() defined at an upper level
  • Things that are only needed inside a module should not be passed down to lower-level modules

That’s as simple as that. Think of your app as a tree: values that are only necessary at level N should not “go up” to level N + 1.

3. You may need several reactiveValues()

The corollary of the last point is simple: you may need several reactiveValues(), operating at different scopes in your application.

Here is a simplified extract of a module from an app I’m currently working on:

mod_abstract_server <- function(
  id,
  global
) {

  local <- reactiveValues()

  observeEvent(input$language, {
    local$ai_alert <- build_text_for_ai_alert(
      input$language
    )
  })

  output$alert_ai <- renderUI({
    local$ai_alert
  })

  country_rv <- reactiveValues()

  observe({
    country_rv$country <- input$country
  })

  mod_checklist_server(
    "checklist_1",
    country_rv = country_rv,
    global = global
  )
}

So here, we have:

  • global (which could also be named r_global), which is the reactiveValues() shared across all modules. It contains a dataset that can be updated in an admin panel, but needs to be read in the other modules. It’s passed down from app_server, goes through mod_abstract_server, and down into mod_checklist_server. I can name 10 use cases from client apps where this is a valid pattern, just ask me next time you meet me at a conference.
  • local (which could also be named r_local), a reactiveValues() that stores values needed only inside the current module.
  • country_rv, which is defined within the module and passed down to mod_checklist_server.

I could have stored everything in global, and it would still work. But that wouldn’t be good organization or separation of concerns.

To sum up

No structure, no idea, and no framework will ever prevent someone from writing bad code. JavaScript used to be joked about as a language that’s too permissive. Then TypeScript came along and imposed more structure on the language, with a loophole: you can hack the language and use any as the type for everything, and it will still work. You can write bad code with TypeScript, even if the language is supposed to enforce structure. Nothing can stop you from writing bad code.

Yes, using reactiveValues() as a storage object shared between modules can create monsters if you don’t really think about what you’re doing.

Yes, in an app with a very large number of values floating around, trying to pass data via strict function parameters can create even scarier monsters.

Yes, it’s OK to have a list as a parameter to a module function.

Other patterns

Here are some other patterns that can be used in a {shiny} app to share data across modules.

Storage using an R6 object

One downside I can think of when using the reactiveValues() strategy I just described is that, well, it’s reactive, meaning it can lead to uncontrolled reactivity if things aren’t scoped correctly.

One pattern I’ve used in an app is combining an R6 object, used to store and process data, with the trigger mechanism from {gargoyle}. Basically, the idea behind {gargoyle} is simple: instead of relying on the reactive graph to invalidate itself, you init flags that are triggered in the code, and when a flag is triggered, the context where the flag is watched is invalidated.
It’s a bit longer to implement, but you get better control over what is happening.

Combined with this, you can use an R6 object that is passed along the modules, and that gets transformed to store, process, and serve the data.
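As a sketch of that combination (assuming {gargoyle}'s init() / trigger() / watch() functions; the DataStore class and field names are mine):

```r
library(shiny)
library(R6)

# A plain R6 object: stores and processes data,
# with no reactivity of its own
DataStore <- R6::R6Class(
  "DataStore",
  public = list(
    dataset = NULL,
    set_dataset = function(x) {
      self$dataset <- x
      invisible(self)
    },
    summarise = function() {
      summary(self$dataset)
    }
  )
)

app_server <- function(input, output, session) {
  store <- DataStore$new()
  gargoyle::init("data_updated")

  # Module 1 writes into the store, then raises the flag
  observeEvent(input$file, {
    store$set_dataset(read.csv(input$file$datapath))
    gargoyle::trigger("data_updated")
  })

  # Module 2 only recomputes when the flag is raised,
  # not whenever anything in the store changes
  output$summary <- renderPrint({
    gargoyle::watch("data_updated")
    req(store$dataset)
    store$summarise()
  })
}
```

Reactivity only happens where you explicitly trigger() and watch(), which is exactly the extra control this pattern buys you.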

You can read more about this in “15.1.3 Building triggers and watchers” and “15.1.4 Using R6 as data storage” in Chapter 15 of the Engineering Shiny book.

session$userData

This one should be used with a lot of caution, but it can be very effective if you know what you’re doing (and if you don’t have too many things to share).

The session object is an environment available everywhere in your Shiny app. It represents the current interaction between each user and the R session (i.e., each user has their own). This environment has a special slot called userData that can be populated with data, and it is scoped to the session.

The way I’ve used it in the past is via wrappers, which would look like:

set_this <- function(value, session = shiny::getDefaultReactiveDomain()){
  session$userData$this <- compute_this(value)
}
get_this <- function(session = shiny::getDefaultReactiveDomain()){
  session$userData$this
}

So anywhere I need it, I’ll use the wrapper function instead of session$userData$this. I would generally use it to define things at the top level that need to be accessible everywhere downstream, but I feel it might be a bit complex to manage if you need to pass data from mod_3_a to mod_3_g.

The documentation says it can be used “to store whatever session-specific data (we) want”, but my gut feeling is that it’s best not to shove too much into it. I don’t have any rational reason for that, though, and I’d be happy to be proven wrong.

An environment in the scope of the package/top level of the app

This is something a lot of R developers do: define an environment inside the package namespace so that, when the package is loaded, you can CRUD into it. For example, there are some (well, several) in {shiny}:

> shiny:::.globals

The function shinyOptions() writes to it, and getShinyOption() reads from it.

This pattern can be used as global storage, but be careful: it’s not session-scoped, so whatever is in this environment is shared across sessions.
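If you want the same mechanism in your own app package, a minimal sketch mirroring the shinyOptions() / getShinyOption() pair could look like this (all names here are mine):

```r
# A package-level environment, created once when the package is loaded.
# Careful: it is shared across ALL sessions of the app.
.app_globals <- new.env(parent = emptyenv())

set_app_global <- function(name, value) {
  assign(name, value, envir = .app_globals)
  invisible(value)
}

get_app_global <- function(name, default = NULL) {
  if (exists(name, envir = .app_globals, inherits = FALSE)) {
    get(name, envir = .app_globals, inherits = FALSE)
  } else {
    default
  }
}

set_app_global("app_version", "1.2.3")
get_app_global("app_version")
#> [1] "1.2.3"
get_app_global("missing_key", default = NA)
#> [1] NA
```

Note that nothing here is reactive: if a module needs to react to a change in this environment, you’ll have to trigger the invalidation yourself.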

An external database or storage system

Another solution is to store values in an external database, and query that DB inside modules.

If you try to implement this solution, two things to keep in mind are:

  • Make the data session-scoped, i.e., use session$token to identify the current session, and remove the data when the session ends.
  • You’ll need to handle reactivity manually, for example with {gargoyle}.

For example, with {storr}:

# Mimicking a session
session <- shiny::MockShinySession$new()

# In module 1
st <- storr::storr_rds(here::here())
st$set("dataset", mtcars, namespace = session$token)

# In module 2
st <- storr::storr_rds(here::here())
st$get("dataset", namespace = session$token)

Of course, this is a short piece of code and you’ll need more engineering, but you get the idea.

Conclusion

It’s been a long post, but I wanted to dive a bit deeper into the why, and to develop the ideas and drawbacks behind the “stratégie du petit r”.
I should have written this post much sooner, but I suppose being attacked publicly on social media without being consulted first is quite the motivator.

Anyway, I’m always happy to chat about the ideas developed here, so feel free to comment or reach out to me (I’m pretty sure that if you need to, it’s very easy to find a way to contact me 😅).

As with anything in life, writing software is always a matter of compromise. Any decision you make while writing code has benefits and drawbacks, and if you can’t find any drawbacks, it’s because you haven’t thought hard enough. When building applications for production, the codebase can become very large. I mentioned an app with 20,000 lines of code, which I recently spent a week refactoring to reduce its size by 20%, but I’m sure other apps I’ve worked on are larger. Still manageable if well organized, but complex anyway.

In the perfect world of software engineering, modules would be so small that they handle just one value, reactive graphs would be fully under control, we’d get code coverage of 100%, all required inputs would be passed as parameters, we would use a typed language that wouldn’t allow unsafe values, and no variable would ever be called x or result.

And then there’s reality.

The client needed this yesterday. Their boss needed it last month. I’m out of coffee. And, to be honest, I’d rather be out in the woods running than debugging renv::install() again.

So we might take shortcuts, use bad variable names, forget to delete a test data.frame from the SQL database, and create reactiveValues() that are monsters.

Still, I genuinely believe nobody is here to sabotage the project.

That we’re all doing the best we can with what we have.
