Future got better at finding global variables
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The future package celebrates ten years on CRAN as of June 19, 2025. This is the first of a series of blog posts highlighting recent improvements to the futureverse ecosystem.
The globals package is part of the futureverse and has had two recent releases, on 2025-04-15 and 2025-05-08. These updates address a few corner cases that would otherwise lead to unexpected errors. They also meant that several long-outstanding issues reported on the future, future.apply, furrr, and doFuture package issue trackers, and elsewhere, could be closed.
The significant update is that findGlobals() gained argument method = "dfs", which finds globals in R expressions by walking their abstract syntax tree (AST) using a depth-first-search algorithm. This new approach does a better job of emulating how the R engine identifies global variables, which results in an even smoother ride for anyone using futureverse for parallel and distributed processing. Previously, a tweaked search algorithm adopted from codetools::findGlobals() was used. The codetools search algorithm is mainly designed for R CMD check to detect undefined variables being used in package code. To limit the number of false positives reported by R CMD check, such algorithms tend to be “conservative” by nature, so that we can trust what is reported. This strategy is not always sufficient for automatically detecting globals needed in parallel processing. As an example, in
fun <- function() {
  a <- b
  b <- 1
}

variable b is a global variable where it is read in a <- b, because it is not assigned locally until the next line. However, if we ask codetools, it does not pick up b as a global:

codetools::findGlobals(fun)
#> [1] "{" "<-"
This false negative is alright for R CMD check, but, in contrast, for parallel processing, we need to use a “liberal” search algorithm. In parallel processing it is okay to pick up and export too many variables to the parallel worker. If a variable is not used, little harm is done, but if we fail to export a needed variable, we’ll end up with an object-not-found error. Futureverse has since the early days (December 2015) used a modified version of the codetools algorithm that is liberal, but not too liberal. It detects b as a global variable:

globals::findGlobals(fun)
#> [1] "{" "<-" "b"
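To make the idea of an order-aware, depth-first walk over the AST more concrete, here is a minimal base-R sketch. It is a toy and not the algorithm implemented in globals or codetools: the name toy_find_globals() is hypothetical, it skips function names such as `{` and `<-`, and it ignores function arguments, missing arguments, and many other cases that a real implementation has to handle.

```r
# Toy depth-first walker (an assumption-laden sketch, NOT the globals algorithm).
# A symbol that is read before it has been assigned locally counts as a global.
toy_find_globals <- function(expr) {
  locals <- character(0)
  globals <- character(0)

  walk <- function(e) {
    if (is.symbol(e)) {
      name <- as.character(e)
      if (!(name %in% locals)) globals <<- union(globals, name)
    } else if (is.call(e)) {
      fn <- e[[1]]
      is_assign <- is.symbol(fn) && as.character(fn) %in% c("<-", "=")
      if (is_assign && is.symbol(e[[2]])) {
        walk(e[[3]])  # visit the right-hand side first ...
        locals <<- union(locals, as.character(e[[2]]))  # ... then record the local
      } else {
        if (!is.symbol(fn)) walk(fn)           # e.g. f(x)(y); plain names are skipped
        for (arg in as.list(e)[-1]) walk(arg)  # descend into each argument in order
      }
    }
    invisible(NULL)
  }

  walk(expr)
  globals
}

fun <- function() {
  a <- b
  b <- 1
}
toy_find_globals(body(fun))
#> [1] "b"
```

Because the walk visits a <- b before b <- 1, b is still unknown when it is read and is therefore reported. Swapping the two statements makes the toy report no globals at all.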
This liberal search strategy turns out to work surprisingly well for detecting globals needed in parallel processing, but there were corner cases where it failed. For example, futureverse struggled to identify global variables in cases such as:
library(future)
plan(multisession, workers = 2)

x <- 2
f <- future(local({
  h <- function(x) -x
  h(x)
}))
value(f)
which resulted in
Error in eval(quote({ : object 'x' not found
This is because there are several different variables named x, and the one in the calling environment is “masked” by argument x, which results in x never being picked up and exported to the parallel worker.
It might look as if this type of code was carefully crafted to fail and would rarely, if ever, be seen in real code. As a matter of fact, it is a distilled version of a large real-world scenario reported by at least one person. It’s thanks to such feedback that we together can make improvements to the futureverse ecosystem 🙏 I cannot know for sure, but I suspect this has affected several R developers already; the future package is, after all, among the top 0.6% most downloaded packages, and there are 1,300 packages that “need” it as of May 2025. The above problem was fixed in globals 0.18.0 (2025-05-08) and future 1.49.0 (2025-05-09), which now make use of the new findGlobals(..., method = "dfs") search strategy internally. After updating these packages, the above code snippet gives us
value(f)
#> [1] -2
as we’d expect.
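To inspect what the new search strategy reports for this expression without launching a future, one can call findGlobals() directly. This is a small sketch that assumes globals (>= 0.18.0) is installed; the names expr and globals_dfs are only for illustration, and the full result is printed rather than quoted here, since its exact contents and order may differ between versions.

```r
library(globals)  # assumption: globals (>= 0.18.0) is installed

# The same expression as in the future() call above, captured unevaluated
expr <- quote(local({
  h <- function(x) -x
  h(x)
}))

# Ask for the depth-first-search strategy explicitly
globals_dfs <- findGlobals(expr, method = "dfs")
print(globals_dfs)

# The x passed to h() refers to the calling environment, so it should be listed
"x" %in% globals_dfs
```

The argument x of h() and the x passed in h(x) share a name but live in different scopes, which is exactly the distinction the search has to get right.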
Another corner-case bug that was fixed is where

library(future)
library(magrittr)

x <- list()
f <- future({
  x %>% `$<-`("a", 42)
})
would result in the rather obscure error
Error in e[[4]] : subscript out of bounds
This is due to a bug in the codetools package, which globals (>= 0.17.0) [2025-04-15] works around. After updating, things work as expected:

f <- future({
  x %>% `$<-`("a", 42)
})
value(f)
#> $a
#> [1] 42
Yet another fix in globals (>= 0.17.0) concerns S7 objects: previous versions would throw an error if they ran into one. The S7 object class was introduced in 2023.
May the future be with you!
Henrik
PS. Did you know that the codetools package is written using literate programming following the vision of Donald Knuth? Neat, eh? And, it’s almost like it was vibe coded, but with the large-language model (LLM) part being replaced by human knowledge and expertise 🤓