[This article was first published on Statistical Modeling, Causal Inference, and Social Science » R, and kindly contributed to R-bloggers].

Andrew and I have been discussing how we’re going to define functions in Stan for defining systems of differential equations; see our evolving ode design doc; comments welcome, of course.

I mentioned to Andrew I would prefer pure lexical, static scoping, as found in languages like C++ and Java. If you’re not familiar with the alternatives, there’s a nice overview in the Wikipedia article on scope. Let me call out a few passages that will help set the context.

A fundamental distinction in scoping is what “context” means – whether name resolution depends on the location in the source code (lexical scope, static scope, which depends on the lexical context) or depends on the program state when the name is encountered (dynamic scope, which depends on the execution context or calling context). Lexical resolution can be determined at compile time, and is also known as early binding, while dynamic resolution can in general only be determined at run time, and thus is known as late binding.

…scoping rules are crucial in modular programming, so a change in one part of the program does not break an unrelated part.
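As a quick illustration of the distinction in R itself (a minimal sketch; the names `make_scale` and `twice` are mine, not from the quoted article):

```r
# Under lexical scope, a free variable is resolved in the environment
# where the function was *defined*, not where it is *called*.
make_scale <- function(m) {
  function(x) m * x   # the free variable m resolves in make_scale's frame
}
twice <- make_scale(2)
m <- 100              # a global m does not affect twice
twice(5)              # 10 under lexical scope; dynamic scope would give 500
```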

R, on the Other Hand

R’s scope rules can be quite confusing. First, it lets you define a function with an undefined variable. Start a new R session and do this.

> f <- function(x) { a * x }

> f(3)
Error in f(3) : object 'a' not found

But then we can define a and all is well.

> a <- 10
> f(3)
[1] 30


So clearly the value of a is getting set dynamically at run time. But then what if I do this?

> g <- function(y) { a <- 100; f(y); }

> g(3)
[1] 30


Even if a had not been defined in the global environment, the call to f(y) inside g would not pick up the a defined in the body of g.

Given the apparently dynamic nature of the first example, I expected the answer to be 300, not 30. What seems to be happening is that the environment in which a is looked up is fixed when f is defined, not when f is called.
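We can check this with `environment()` (a minimal sketch reusing the f, a, and g from the examples above):

```r
f <- function(x) { a * x }
a <- 10
g <- function(y) { a <- 100; f(y) }

g(3)            # 30: f ignores g's local a
environment(f)  # <environment: R_GlobalEnv>
# f's free variable 'a' is looked up in f's enclosing (global)
# environment, so the binding of a inside g is never consulted.
```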

But the value is whatever is defined in the global environment at the time the function is called. For example, redefining a produces a new value for f(3):

> a <- 20

> f(3)
[1] 60


Stupid R Trick¹

Now for the stupid R trick (I didn’t make this one up, but can’t recall or find where I saw it first). Suppose I define a new function h as follows.

> b <- 20
> h <- function(x) { if (rbinom(1,1,0.5)) { b <- 1000 }; return(b * x); }


and then call it a few times

> h(2)
[1] 40

> h(2)
[1] 2000


Whether the value of b is the local variable set to 1000 or the global value set to 20 depends on the outcome of the coin flip determined by calling rbinom(1,1,0.5)!

Presumably this is the behavior intended by the designers of R's scoping mechanism. But I find it very confusing.
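One way to avoid the surprise is to always bind the variable locally before the conditional, so the lookup never escapes to the global environment (h2 is a hypothetical variant of my own, not from the original trick):

```r
h2 <- function(x) {
  b <- 20                        # always create a local binding first
  if (rbinom(1, 1, 0.5)) b <- 1000
  b * x                          # b is now deterministically local
}
# h2 is still random in its *value*, but not in its *scope*:
# b always refers to the local variable.
```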

If you want to read more about scoping in R and S, John Fox has a document on CRAN.

Closures

If you really want to understand what's going on in R, you'll have to also read up on closures. Then examples like the following will make sense.

> ff <- function(x) { g <- function(y) { return(x * y) }; return(g) }

> ff(7)(9)
[1] 63


What's going on is that R resolves a variable in the environment where the function is defined, and the inner function g is not defined until ff is called, so g captures the value of x passed to that call of ff.

The "stupid R trick" is simply based on making the variable's scope non-deterministic.
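Each call to ff creates a fresh environment with its own binding for x, which is why closures returned by separate calls don't interfere (a sketch reusing ff from above; the names times7 and times2 are mine):

```r
ff <- function(x) { g <- function(y) { return(x * y) }; return(g) }

times7 <- ff(7)   # closure over an environment where x is 7
times2 <- ff(2)   # a separate environment where x is 2

times7(9)         # 63
times2(9)         # 18
get("x", envir = environment(times7))  # 7: the captured binding
```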

¹ My apologies to David Letterman and his stupid pet tricks; who knew they'd take over the internet?
