# Subtle Variable Scoping in R

This article was first published on **F# and Data Mining** and kindly contributed to R-bloggers.


A language manual usually defines how a language behaves, but it does not warn you about cases where you assume a feature is supported and it isn’t. As an example, I will talk about subtle variable scoping in the R language.

# {} code blocks

A lot of programmers coming from C/C++/Java will assume that code blocks inside {} also introduce a new scope. However, in dynamic languages such as R, Python, JavaScript (for variables declared with `var`), and Matlab, code blocks do not introduce new scopes; only functions do. This difference may cause some subtle bugs.
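To see this claim in isolation, here is a minimal sketch (the function name `block_demo` and its variables are made up for illustration and are not part of the original post). It assigns variables inside a bare `{}` block and inside a loop body, then checks that they are all still visible in the enclosing function:

```r
# Hypothetical demo: blocks and loop bodies share the function's environment
block_demo <- function() {
  {
    inner <- 42          # assigned inside a bare {} block
  }
  for (k in 1:3) {
    last_k <- k * 10     # assigned inside a loop body
  }
  vars <- ls()           # names defined so far in this function's environment
  list(inner = inner, k = k, last_k = last_k, vars = vars)
}

res <- block_demo()
str(res)
```

Calling `block_demo()` should show that `inner`, `k`, and `last_k` all survive their blocks. Such a leak is harmless in a short script, but it is exactly what sets up the kind of closure bug discussed next.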

For example, the following R function returns a list of quadratic function objects:

    make_funcs <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) {
        fs[[i]] <- function(x) {
          cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[i], b[i], c_[i], x))
          a[i]*x*x + b[i]*x + c_[i]
        }
      }
      cat(sprintf('variable i is still in the scope and has value %d\n', i))
      fs
    }

The input to this function is three numeric vectors, which hold the three coefficients of each quadratic form. Let’s make three function objects using the following coefficients:

    a <- c(1,2,3)
    b <- c(4,5,6)
    c_ <- c(-1,-1,-1)

    fs <- make_funcs(a, b, c_)
    fs[[1]](1)
    fs[[2]](1)
    fs[[3]](1)

We expect to get three different function values. However, the output shows that all three functions are the same:

    > fs[[1]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8
    > fs[[2]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8
    > fs[[3]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8

It seems that the three functions **fs[[i]]** all use the same variable **i** when they are evaluated. That is, when each function is created, the R interpreter does not copy the current value of **i**; it only remembers that **i** is a variable in the environment of its parent function. This explains the result: after the loop finishes, the variable **i** has the value 3, and it is still in the scope of **make_funcs**.
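One way to check this explanation is to inspect the environments of the closures directly. The sketch below uses a stripped-down, hypothetical `make_closures` (not from the original post) together with the base R functions `environment()` and `get()`:

```r
# Hypothetical stripped-down version of make_funcs: each closure just returns i
make_closures <- function(n) {
  fs <- list()
  for (i in 1:n) {
    fs[[i]] <- function() i   # refers to the variable i, does not copy its value
  }
  fs
}

fs <- make_closures(3)

# All closures were created in the same call, so they share one environment
same_env <- identical(environment(fs[[1]]), environment(fs[[2]]))

# That shared environment holds a single i, left at its final loop value
shared_i <- get("i", envir = environment(fs[[1]]))

# Hence every closure returns the same number
values <- sapply(fs, function(f) f())

print(same_env)
print(shared_i)
print(values)
```

If the explanation is right, `same_env` is `TRUE` and every entry of `values` equals the final loop value of `i`.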

Let’s see how we would write **make_funcs** in **F#**:

    // a direct translation
    let make_funcs (a: int array, b: int array, c: int array) =
        let n = Array.length a
        let fs = new ResizeArray<(int -> int)>()
        for i = 0 to n-1 do
            fs.Add(fun x -> a.[i]*x*x + b.[i]*x + c.[i])
        fs.ToArray()

    // a more functional translation
    let make_funcs2 (a: int array, b: int array, c: int array) =
        Array.zip3 a b c
        |> Array.map (fun (a0, b0, c0) -> (fun x -> a0*x*x + b0*x + c0))

The following code would make three different functions as we expect:

    let a = [| 1; 2; 3 |]
    let b = [| 4; 5; 6 |]
    let c = [| -1; -1; -1 |]
    let fs = make_funcs (a, b, c)
    fs.[0](1) // 4
    fs.[1](1) // 6
    fs.[2](1) // 8

Why does the F# code work as expected? When the three functions are created, they also know that the variable **i** is found in the parent scope; however, each loop iteration has its own **i**, so the three **i**s live in three independent scopes!

The behavior of the R version is definitely not what we want. How do we make three different functions? The answer is to create three different **i**s inside a new function:

    make_funcs2 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) {
        fs[[i]] <- (function() {
          j <- i
          function(x) {
            cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
            a[j]*x*x + b[j]*x + c_[j]
          }
        }) ()
      }
      fs
    }

In each iteration of the for loop, I create a new function and define a variable **j** inside it, and the new function returns the **function(x)**. Notice that this new function is executed **n** times in the for loop, and therefore creates **n** different **j**s.

This trick is used ubiquitously in **JavaScript**. For example, instead of writing

    {
      var a = 1;
      // code blocks
    }

we define a function and execute it immediately to make local variables:

    (function() {
      var a_is_hidden_from_outside = 1;
      // in other words, no new variable in the global space is introduced.
    }) ()
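Back in R, the same wrap-and-call idea can also be spelled with the base function `local()`, which evaluates its body in a fresh environment. The following is only a sketch of that alternative, not the author's code: the name `make_funcs_local` is hypothetical, and the `cat()` tracing from **make_funcs2** is left out for brevity.

```r
# Hypothetical variant of make_funcs2 that uses local() instead of (function() {...})()
make_funcs_local <- function(a, b, c_) {
  fs <- list()
  for (i in seq_along(a)) {
    fs[[i]] <- local({
      j <- i                                  # j lives in a fresh environment per iteration
      function(x) a[j]*x*x + b[j]*x + c_[j]   # the closure captures that environment
    })
  }
  fs
}

fs <- make_funcs_local(c(1, 2, 3), c(4, 5, 6), c(-1, -1, -1))
values <- sapply(fs, function(f) f(1))
print(values)
```

With the coefficients used earlier, the three functions evaluated at `x = 1` should give 4, 6, and 8, matching the F# results above.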

# The assignment operators <- and <<-

It seems that the block syntax **{}** can be translated as **(function() {}) ()** in R/JavaScript. But in R, things can be more subtle. See the third version of **make_funcs**:

    make_funcs3 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) (function() {
        j <- i
        fs[[i]] <- function(x) {
          cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
          a[j]*x*x + b[j]*x + c_[j]
        }
        cat(sprintf('ls length = %d\n', length(fs)))
      }) ()
      fs
    }

    a <- c(1,2,3)
    b <- c(4,5,6)
    c_ <- c(-1,-1,-1)

    fs <- make_funcs3(a, b, c_)
    fs[[1]](1)

We translate the **{}** after the for loop as **(function() {}) ()**, and now the assignment to **fs[[i]]** is inside the function wrapper. However, the code does not run correctly:

    > fs <- make_funcs3(a, b, c_)
    ls length = 1
    ls length = 2
    ls length = 3
    >
    > fs[[1]](1)
    Error in fs[[1]] : subscript out of bounds
    > length(fs)
    [1] 0

The printed length grows as the for loop executes, which suggests that **fs** is being filled. However, the **fs** modified inside the function wrapper is a **different variable from the one outside it**: each call to the wrapper works on its own local copy. The variable **fs** in **make_funcs3** itself is only initialized and never has any elements added to it.

The assignment operator **<-** always creates or modifies a variable in the current function’s environment! It does not check whether a variable with the same name already exists in a parent environment. To assign to the existing variable in the parent environment, which is what we expected R to do, we have to use the **<<-** operator:

    make_funcs3 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) (function() {
        j <- i
        fs[[i]] <<- function(x) {
          cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
          a[j]*x*x + b[j]*x + c_[j]
        }
        cat(sprintf('ls length = %d\n', length(fs)))
      }) ()
      fs
    }

And now this function runs as we expect.
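The difference between the two operators can also be seen without any loops or lists. The sketch below is a made-up example (the names `counter_demo`, `add_local`, and `add_parent` are hypothetical): both inner functions try to increment `total`, but only the one using `<<-` changes the variable in the enclosing function.

```r
# Hypothetical demo contrasting <- and <<- inside nested functions
counter_demo <- function() {
  total <- 0
  add_local <- function() {
    total <- total + 1    # reads the outer total, then creates a new local total
    invisible(NULL)
  }
  add_parent <- function() {
    total <<- total + 1   # finds total in the enclosing environment and updates it
    invisible(NULL)
  }
  add_local()
  after_local <- total    # the outer total is unchanged here
  add_parent()
  after_parent <- total   # the outer total has been incremented here
  c(after_local = after_local, after_parent = after_parent)
}

res <- counter_demo()
print(res)
```

Here `after_local` stays at 0 while `after_parent` becomes 1, which mirrors why the first **make_funcs3** returned an empty list and the second one did not.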

# Summary

When using R for interactive data analysis and plotting, most of the time we don’t have to deal with these subtle language features. We just copy and paste code snippets from the R help or from online sources and modify them to suit our own data analysis. However, once we get into R programming, these issues do occur, and they will bite us when we assume that our experience in C++/Java also applies to R.
