(This article was first published on **F# and Data Mining**, and kindly contributed to R-bloggers)

A language manual usually defines how a language behaves, but it does not warn you about cases where you assume a feature is supported and it isn’t. As an example, I will talk about the subtle variable scoping rules in the R language.

# {} code blocks

Many programmers coming from C/C++/Java assume that a code block inside {} introduces a new scope. However, in dynamic languages like R/Python/JavaScript/Matlab, code blocks do not introduce new scopes; only functions do. (In JavaScript this holds for variables declared with **var**; **let** and **const** are block-scoped.) This difference can cause subtle bugs.
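A minimal sketch of this behavior in R (the names `scope_demo` and `loop_tmp` are made up for illustration): variables assigned inside a bare block, an `if` block, and a `for` loop are all still visible afterwards, because none of these blocks creates a scope of its own.

```r
# Hypothetical demo function; the names are not from the original post.
scope_demo <- function() {
  {
    y <- 1                 # assigned inside a bare block
  }
  if (TRUE) {
    z <- 2                 # assigned inside an if block
  }
  for (i in 1:3) {
    loop_tmp <- i * 10     # assigned inside a loop body
  }
  # All four names are still visible here, in the function's single scope.
  c(y = y, z = z, i = i, loop_tmp = loop_tmp)
}

result <- scope_demo()
print(result)

# The function body is a scope of its own, so in a fresh session
# `loop_tmp` is not defined out here.
print(exists("loop_tmp"))
```

The loop variable `i` and the temporary `loop_tmp` both keep the values from the last iteration.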

For example, the following R function returns a list of quadratic function objects:

    make_funcs <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) {
        fs[[i]] <- function(x) {
          cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[i], b[i], c_[i], x))
          a[i]*x*x + b[i]*x + c_[i]
        }
      }
      cat(sprintf('variable i is still in the scope and has value %d\n', i))
      fs
    }

The input to this function is three vectors of numbers, which hold the three coefficients of each quadratic. Let’s make three function objects using the following coefficients:

    a <- c(1,2,3)
    b <- c(4,5,6)
    c_ <- c(-1,-1,-1)
    fs <- make_funcs(a, b, c_)
    fs[[1]](1)
    fs[[2]](1)
    fs[[3]](1)

We expect three different function values. However, the output shows that all three functions behave identically:

    > fs[[1]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8
    > fs[[2]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8
    > fs[[3]](1)
    eval 3.0*x^2+6.0*x+-1.0 where x = 1.0
    [1] 8

It seems that the three functions **fs[[i]]** all use the same variable **i** when they are evaluated. That is, when the three functions are created, the R interpreter does not store the current value of **i**; it only remembers that **i** is a variable in the parent function. The result can then be explained: after the loop finishes, **i** has the value 3 and is still in the scope of **make_funcs**, so every call looks up **i** and finds 3.
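One way to check this explanation is sketched below with a stripped-down variant of the function (`make_funcs_demo` is a hypothetical name, and the `cat` calls are left out). All closures created in the loop share one enclosing environment, and that environment holds a single `i`.

```r
# Stripped-down variant of make_funcs; the name is illustrative only.
make_funcs_demo <- function(a, b, c_) {
  fs <- list()
  for (i in seq_along(a)) {
    fs[[i]] <- function(x) a[i] * x * x + b[i] * x + c_[i]
  }
  fs
}

fs_demo <- make_funcs_demo(c(1, 2, 3), c(4, 5, 6), c(-1, -1, -1))

# Every closure points at the same environment: the one created by the
# single call to make_funcs_demo.
same_env <- identical(environment(fs_demo[[1]]), environment(fs_demo[[3]]))
print(same_env)

# That environment holds exactly one `i`, left at its final loop value.
shared_i <- get("i", envir = environment(fs_demo[[1]]))
print(shared_i)

# Each closure looks up `i` only when it is called, so all three agree.
demo_values <- c(fs_demo[[1]](1), fs_demo[[2]](1), fs_demo[[3]](1))
print(demo_values)
```

Here `same_env` is `TRUE`, `shared_i` is 3, and all three calls return 8, matching the output above.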

Let’s see how we would write **make_funcs** in **F#**:

    // a direct translation
    let make_funcs (a: int array, b: int array, c: int array) =
        let n = Array.length a
        let fs = new ResizeArray<(int -> int)>()
        for i = 0 to n-1 do
            fs.Add(fun x -> a.[i]*x*x + b.[i]*x + c.[i])
        fs.ToArray()

    // a more functional translation
    let make_funcs2 (a: int array, b: int array, c: int array) =
        Array.zip3 a b c
        |> Array.map (fun (a0, b0, c0) ->
            (fun x -> a0*x*x + b0*x + c0))

The following code would make three different functions as we expect:

    let a = [| 1; 2; 3 |]
    let b = [| 4; 5; 6 |]
    let c = [| -1; -1; -1 |]
    let fs = make_funcs (a, b, c)
    fs.[0](1) // 4
    fs.[1](1) // 6
    fs.[2](1) // 8

Why does the F# code work as expected? When the three functions are created, they also know that the variable **i** is to be found in the parent scope; however, each iteration of the loop binds a fresh **i**, so the three **i**s live in three independent scopes!

The behavior of the R version is definitely not what we want. How do we make three different functions? The answer is to create three different **i**s, each inside a new function:

    make_funcs2 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n) {
        fs[[i]] <-
          (function() {
            j <- i
            function(x) {
              cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
              a[j]*x*x + b[j]*x + c_[j]
            }
          }) ()
      }
      fs
    }

In each iteration of the for loop, I create a new function that defines a variable **j** inside it and returns the **function(x)**. Notice that this new function is executed **n** times in the for loop and therefore creates **n** different **j**s.
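The same effect can be had with less ceremony. The sketch below is not from the original post; it shows two other common ways to give every closure its own variable. Base R's `local()` evaluates a block in a fresh environment, and the callback passed to `lapply()` is a function, so it gets a new scope on every call. The names `make_funcs_local` and `make_funcs_lapply` are made up here, and the `cat` calls are omitted.

```r
# Variant 1: local() evaluates its block in a new environment per iteration.
make_funcs_local <- function(a, b, c_) {
  fs <- list()
  for (i in seq_along(a)) {
    fs[[i]] <- local({
      j <- i    # copy the current value into the fresh environment
      function(x) a[j] * x * x + b[j] * x + c_[j]
    })
  }
  fs
}

# Variant 2: no explicit loop; each call of the callback has its own `j`.
make_funcs_lapply <- function(a, b, c_) {
  lapply(seq_along(a), function(j) {
    force(j)    # evaluate the argument now rather than lazily
    function(x) a[j] * x * x + b[j] * x + c_[j]
  })
}

res_local <- sapply(make_funcs_local(c(1, 2, 3), c(4, 5, 6), c(-1, -1, -1)),
                    function(f) f(1))
res_lapply <- sapply(make_funcs_lapply(c(1, 2, 3), c(4, 5, 6), c(-1, -1, -1)),
                     function(f) f(1))
print(res_local)
print(res_lapply)
```

Both variants yield the values 4, 6 and 8 for `x = 1`. The `lapply()` version is close in spirit to the functional F# translation shown earlier.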

This trick is used everywhere in **JavaScript**. For example, instead of writing

    {
        var a = 1;
        // code blocks
    }

we define a function and execute it immediately to make local variables:

    (function() {
        var a_is_hidden_from_outside = 1;
        // in other words, no new variable in the global space is introduced.
    }) ()

# The assignment operators <- and <<-

It seems that the block syntax **{}** can be translated as **(function() {}) ()** in R/JavaScript. But in R, things can be more subtle. See the third version of **make_funcs**:

    make_funcs3 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n)
        (function() {
          j <- i
          fs[[i]] <-
            function(x) {
              cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
              a[j]*x*x + b[j]*x + c_[j]
            }
          # report the length of the fs visible inside the wrapper
          cat(sprintf('ls length = %d\n', length(fs)))
        }) ()
      fs
    }

    a <- c(1,2,3)
    b <- c(4,5,6)
    c_ <- c(-1,-1,-1)
    fs <- make_funcs3(a, b, c_)
    fs[[1]](1)

We translate the **{}** after the for loop as **(function () {}) ()**, so the assignment to **fs[[i]]** is now inside the function wrapper. However, the code does not run correctly:

    > fs <- make_funcs3(a, b, c_)
    ls length = 1
    ls length = 2
    ls length = 3
    >
    > fs[[1]](1)
    Error in fs[[1]] : subscript out of bounds
    > length(fs)
    [1] 0

The list **fs** appears to grow while the for loop runs, but the **fs** inside the wrapper function is a **different variable from the one outside it**. Each call of the wrapper creates its own local copy of **fs**, and the reported length equals **i** only because assigning to position **i** of an empty list pads the earlier positions with NULL. The **fs** in **make_funcs3** itself is initialized but never receives any elements.
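A small sketch (not from the original post; `outer_list` and `grow_locally` are hypothetical names) of why the printed lengths 1, 2, 3 are consistent with the outer list staying empty. Assigning to position `i` of a list with `<-` inside a function reads the outer list, modifies a copy, and binds that copy to a new local variable.

```r
outer_list <- list()

grow_locally <- function(i) {
  # Reads the (empty) outer list, extends a copy up to position i,
  # and stores the copy in a new local variable called outer_list.
  outer_list[[i]] <- "value"
  length(outer_list)
}

local_lengths <- sapply(1:3, grow_locally)
print(local_lengths)         # lengths seen inside the function
print(length(outer_list))    # length of the list outside the function
```

Inside the function the lengths are 1, 2 and 3, while the outer list still has length 0 afterwards.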

Inside a function, the assignment operator **<-** always creates or modifies a local variable! It does not search the parent environments for an existing variable of the same name. To assign to the existing variable, which is what we wanted here, we have to use the **<<-** operator:
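The difference between the two operators can be seen in isolation with a counter. This is a minimal sketch with made-up names (`make_counter`, `bump_local`, `bump_super`), not code from the original post.

```r
make_counter <- function() {
  count <- 0
  bump_local <- function() {
    count <- count + 1     # creates a new local `count`; the outer one is unchanged
    count
  }
  bump_super <- function() {
    count <<- count + 1    # finds `count` in the enclosing environment and updates it
    count
  }
  list(bump_local = bump_local,
       bump_super = bump_super,
       current = function() count)
}

counter <- make_counter()

counter$bump_local()
counter$bump_local()
after_local <- counter$current()

counter$bump_super()
counter$bump_super()
after_super <- counter$current()

print(c(after_local = after_local, after_super = after_super))
```

After two calls to `bump_local` the shared counter is still 0; after two calls to `bump_super` it is 2. Note that when `<<-` finds no existing variable in any enclosing environment, it assigns in the global environment, so it is worth making sure the target has been created first.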

    make_funcs3 <- function(a, b, c_) {
      n <- length(a)
      fs <- list()
      for (i in 1:n)
        (function() {
          j <- i
          fs[[i]] <<-
            function(x) {
              cat(sprintf('eval %.1f*x^2+%.1f*x+%.1f where x = %.1f\n', a[j], b[j], c_[j], x))
              a[j]*x*x + b[j]*x + c_[j]
            }
          cat(sprintf('ls length = %d\n', length(fs)))
        }) ()
      fs
    }

And now this function should run as we expected.

# Summary

When using R for interactive data analysis and plotting, most of the time we do not have to deal with these subtle language features. We just copy and paste code snippets from the R help or from online sources and modify them to suit our own data. However, once we get into R programming, these issues do occur, and they will bite us if we assume that our experience in C++/Java carries over to R.
