Reference semantics in R

Posted on June 1, 2016 by gluc in R bloggers

[This article was first published on R – ipub, and kindly contributed to R-bloggers.]

Question

I recently received an email from Václav about reference semantics in data.tree. It read as follows:

Dear Christoph,

I am rather inexperienced when it comes to environments in R, so I apologize if my question is basic; however, my colleagues are no better placed than I am to answer it.

I have a question regarding the following behavior of your data.tree package. Is it correct that if I create a function which takes a data.tree structure as a parameter, the input value gets changed too?

In the following case I would assume that acme’s values should not get changed.

Thank you, Vaclav

The code he provided was similar to this:

library(data.tree)
data(acme)

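# set a field on the root Node
acme$val <- 5
acme$val
# [1] 5

DoAssign <- function(tr) {
  a <- tr        # copies the reference, not the tree
  a$val <- 33    # modifies the one tree that a, tr and acme all refer to
  return(a)
}

acme2 <- DoAssign(acme)

# the original tree has changed as well
acme$val
# [1] 33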

Answer

My answer was as follows:

Well observed, that is indeed the behavior of data.tree. From the manual:

Node and Reference Semantics

The entry point to the package is Node. Each tree is composed of a number of Nodes, referencing each other. One of the most important things to note about data.tree is that it exhibits reference semantics. In a nutshell, this means that you can modify your tree along the way, without having to reassign it to a variable after each modification. By and large, this is rather exceptional behavior in R, where value semantics is king most of the time.

Reference Semantics Explained

Reference semantics can be understood through the following analogy: if I give you a URL, I provide you with a reference to a web page. You, I and the owner of the web page can all access that page through the URL. And if the owner changes the content, you will see these changes the next time you open the URL.

Conversely, if I print out the web page and give you that printout, then I provide you with a disconnected copy of the web page. You may modify that copy (e.g. by highlighting passages with a marker), but I will not see these changes, nor will you see changes the owner makes to the original page. This is value semantics.
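
The analogy can be reproduced in base R, without data.tree. The following is only a minimal sketch: the names printout, page, set_val_list and set_val_env are made up for illustration. A list behaves like the printout (value semantics), whereas an environment behaves like the URL (reference semantics). Node is an R6 class, and R6 objects are environments, which is why a data.tree structure behaves like the second case.

```r
# Value semantics: a list is copied when it is modified inside a function
set_val_list <- function(x) {
  x$val <- 33   # changes the function's local copy only
  x
}

printout <- list(val = 5)
printout2 <- set_val_list(printout)
printout$val            # still 5: the caller's list is untouched
printout2$val           # 33: only the returned copy carries the change

# Reference semantics: an environment is shared, never copied on assignment
set_val_env <- function(e) {
  e$val <- 33   # changes the one environment that every reference points to
  e
}

page <- new.env()
page$val <- 5
page2 <- set_val_env(page)
page$val                # 33: the caller sees the change
identical(page, page2)  # TRUE: both names refer to the same environment
```

The only difference between the two functions is the type of the argument. That is exactly the situation in the question above: DoAssign looks like ordinary R code, but it receives a reference.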

Why data.tree uses reference semantics

The main reason why we chose to do it that way in data.tree is that we treat each Node as a unit. For performance reasons, we do not want to create a deep copy of the entire tree every time a Node is modified or a field is added to a Node.
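
A deep copy is still available when you ask for one explicitly. As a sketch of how the function from the question could leave its input alone, the hypothetical variant DoAssignOnCopy below (the name is made up for this example) uses data.tree's Clone function to copy the tree before modifying it. This pays the copying cost once, and only where it is actually wanted.

```r
library(data.tree)
data(acme)

acme$val <- 5

# Hypothetical variant of DoAssign that works on a deep copy of its input
DoAssignOnCopy <- function(tr) {
  a <- Clone(tr)   # explicit deep copy of the whole tree
  a$val <- 33      # the change is applied to the copy only
  a
}

acme2 <- DoAssignOnCopy(acme)

acme$val    # the original tree is expected to keep its own value
acme2$val   # the clone carries the new value
```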

Another reason is that it greatly simplifies the API of the package. For example, we can do:

library(data.tree)
data(acme)
# get a list of the Nodes that have a cost field
traversal <- Traverse(acme, filterFun = function(node) !is.null(node$cost))
# add a derived field to each of these Nodes
Do(traversal, function(node) node$cost2 <- node$cost * 1.2)
# the new field is now set in the original tree:
print(acme, "cost", "cost2")