Future got better at finding global variables

[This article was first published on JottR on R, and kindly contributed to R-bloggers.]

The 'future' hexlogo balloon wall

The future package celebrates ten years on CRAN as of June 19, 2025. This is the first of a series of blog posts highlighting recent improvements to the futureverse ecosystem.

The globals package is part of the futureverse and has had two recent releases, on 2025-04-15 and 2025-05-08. These updates address a few corner cases that would otherwise lead to unexpected errors. They also made it possible to close several long-standing issues reported on the future, future.apply, furrr, and doFuture package issue trackers, and elsewhere.

The most significant update is that findGlobals() gained the argument method = "dfs", which finds globals in an R expression by walking its abstract syntax tree (AST) using a depth-first search. This new approach does a better job of emulating how the R engine identifies global variables, which results in an even smoother ride for anyone using futureverse for parallel and distributed processing.

Previously, a tweaked search algorithm adopted from codetools::findGlobals() was used. The codetools search algorithm is mainly designed for R CMD check to detect undefined variables being used in package code. To limit the number of false positives reported by R CMD check, such algorithms tend to be “conservative” by nature, so that we can trust what is reported. This strategy is not always sufficient for automatically detecting globals needed in parallel processing. As an example, in

fun <- function() {
  a <- b
  b <- 1
}

variable b is a global variable, because it is read on the first line before any local b has been assigned. However, if we ask codetools, it does not pick up b as a global:

codetools::findGlobals(fun)
#> [1] "{"  "<-"

This false negative is all right for R CMD check, but for parallel processing we need a “liberal” search algorithm. In parallel processing, it is okay to pick up and export too many variables to the parallel worker. If a variable is not used, little harm is done, but if we fail to export a needed variable, we’ll end up with an object-not-found error. Since its early days (December 2015), futureverse has used a modified version of the codetools algorithm that is liberal, but not too liberal. It detects b as a global variable:

globals::findGlobals(fun)
#> [1] "{"  "<-" "b"

This liberal search strategy turns out to work surprisingly well for detecting globals needed in parallel processing, but there were corner cases where it failed. For example, futureverse struggled to identify global variables in cases such as:

library(future)
plan(multisession, workers = 2)

x <- 2

f <- future(local({
  h <- function(x) -x
  h(x)
}))
value(f)

which resulted in

Error in eval(quote({ : object 'x' not found

This is because there are several different variables named x, and the one in the calling environment is “masked” by the argument x of function h(), which results in x never being picked up and exported to the parallel worker.
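To see why a missed global ends in this error, here is a minimal base-R sketch that mimics what happens on the worker. It does not use future at all. The environment named worker is a hypothetical stand-in for a fresh R session, and the sketch assumes you run it in a session where no global x is defined.

```r
# The expression from the example above, captured without evaluating it
expr <- quote(local({
  h <- function(x) -x
  h(x)
}))

# Hypothetical stand-in for a fresh worker session: an environment with no 'x'
# (assumes there is no 'x' in your global environment either)
worker <- new.env(parent = baseenv())

# Without 'x' being exported, evaluation fails, much like on a real worker
msg <- tryCatch(
  eval(expr, envir = worker),
  error = function(e) conditionMessage(e)
)
msg
#> [1] "object 'x' not found"

# "Exporting" the global to the stand-in worker makes the same expression work
assign("x", 2, envir = worker)
eval(expr, envir = worker)
#> [1] -2
```

The expression itself is fine. The only thing missing is the value of x on the receiving side, which is exactly what the globals search is supposed to provide.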

It might look as if this type of code was carefully crafted to fail and would rarely, if ever, be seen in real code. As a matter of fact, this is a distilled version of a large real-world scenario reported by at least one person. It’s thanks to such feedback that we together can make improvements to the futureverse ecosystem 🙏 I cannot know for sure, but I suspect this has impacted several R developers already. The future package is, after all, among the top 0.6% most downloaded packages, and there are 1,300 packages that “need” it as of May 2025.

The above problem was fixed in globals 0.18.0 (2025-05-08) and future 1.49.0 (2025-05-09), which now make use of the new findGlobals(..., method = "dfs") search strategy internally. After updating these packages, the above code snippet gives us

value(f)
#> [1] -2

as we’d expect.
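To give a feel for what an order-aware and scope-aware depth-first search involves, here is a minimal base-R sketch. It is not the algorithm implemented in globals or codetools. The helper find_globals_dfs() is a hypothetical name made up for this illustration, and it makes several simplifying assumptions: it ignores default values of function arguments, does not handle missing arguments such as x[, 1], and treats local() like any other call.

```r
# Hypothetical helper, for illustration only (not part of 'globals').
# Walks an expression depth-first, in evaluation order, and records every
# symbol that is used before it has been assigned in the current scope.
find_globals_dfs <- function(expr) {
  found <- character(0)

  walk <- function(e, locals) {
    if (is.symbol(e)) {
      name <- as.character(e)
      if (!(name %in% locals)) found <<- union(found, name)
      return(locals)
    }
    if (!is.call(e)) return(locals)  # constants, NULL, etc.

    head <- e[[1]]
    op <- if (is.symbol(head)) as.character(head) else ""

    if (op == "function") {
      # New scope: argument names are local inside the body only
      inner <- c(locals, names(e[[2]]))
      walk(e[[3]], inner)
      return(locals)
    }

    if (op %in% c("<-", "=") && is.symbol(e[[2]])) {
      walk(head, locals)                 # the assignment function itself
      locals <- walk(e[[3]], locals)     # right-hand side is evaluated first
      return(union(locals, as.character(e[[2]])))
    }

    # Any other call: visit the function and its arguments from left to right
    for (part in as.list(e)) locals <- walk(part, locals)
    locals
  }

  walk(expr, character(0))
  found
}

# 'b' is used before it is assigned, so it is reported as a global
find_globals_dfs(quote({ a <- b; b <- 1 }))
#> [1] "{"  "<-" "b"

# The argument 'x' of h() does not hide the 'x' used in the call h(x)
g <- find_globals_dfs(quote(local({ h <- function(x) -x; h(x) })))
"x" %in% g
#> [1] TRUE
```

Because the walker carries the set of local names along in evaluation order, and starts a fresh set of argument names for each function definition, the x inside the body of h() and the x passed to h() are kept apart.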

Another corner-case bug fix concerns code such as

library(future)
library(magrittr)
x <- list()
f <- future({ x %>% `$<-`("a", 42) })

which would result in the rather obscure error

Error in e[[4]] : subscript out of bounds

This is due to a bug in the codetools package, which globals (>= 0.17.0) [2025-04-15] works around. After updating, things work as expected:

f <- future({ x %>% `$<-`("a", 42) })
value(f)
#> $a
#> [1] 42

Yet another fix in globals (>= 0.17.0) is that previous versions would throw an error if they ran into an S7 object. The S7 object system was introduced in 2023.

May the future be with you!

Henrik

PS. Did you know that the codetools package is written using literate programming following the vision of Donald Knuth? Neat, eh? And, it’s almost like it was vibe coded, but with the large language model (LLM) part being replaced by human knowledge and expertise 🤓
