For more than ten years, I have been teaching R both formally and informally. One thing that I find often trips up students is the use of R's accessors and mutators. (For those readers not from a formal computer science background, an accessor is a method for accessing data in an object, usually an attribute of that object.) A simple example is taking a subset of a vector:
letters[1:3]
[1] "a" "b" "c"
As you can see, the result is a character vector containing the first three letters of the letters vector.
Good programming languages have a standard pattern for accessors and mutators. For R, there are three: [, [[, and $. This confuses beginners coming from other programming languages. Java and Python have one: '.'. Why does R need three?
The reason derives from R's data-centric view of the world. R natively provides vectors, lists, data frames, matrices, etc. In truth, one can get by using only [ to extract information from these structures, but the others are handy in certain scenarios. So much so that after a while, they feel indispensable. I will explain each, and hopefully by the end of this article you will understand why each exists, what to remember and, more importantly, when each should be used.
Subset with [
When you want a subset of an object, use [. Remember that when you take a subset of an object, you get the same type of thing back. Thus, the subset of a vector will be a vector, the subset of a list will be a list, and the subset of a data.frame will be a data.frame.
There is one inconsistency, however. By default, R reduces the result to the lowest possible dimension, so if your subset contains only one result, you get just that one item, which may be of a different type. Thus, taking a subset of the iris data frame with only one column
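A quick check along these lines suggests how [ preserves the container type. This is only a minimal sketch: the list `x` and the result names are made up for illustration, while letters and iris ship with R.

```r
# `x` is a made-up list for illustration; letters and iris ship with R.
x <- list(a = 1, b = "two", c = 3:5)

v_sub  <- letters[1:3]     # subset of a character vector
l_sub  <- x[1:2]           # subset of a list
df_sub <- iris[1:5, 1:2]   # subset of a data frame (two columns kept)

class(v_sub)    # "character"
class(l_sub)    # "list"
class(df_sub)   # "data.frame"
```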
class( iris[ , "Petal.Length" ] )
returns a numeric vector and not a data frame. You can override this behavior with the little-publicized drop parameter, which, when set to FALSE, tells R not to reduce the result. Taking the subset of iris with drop = FALSE
iris[ , "Petal.Length", drop=FALSE ]
is a proper data frame.
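The two behaviors can be compared side by side. The sketch below assumes nothing beyond base R; the names `dropped` and `kept` are made up for illustration.

```r
# iris ships with R; `dropped` and `kept` are illustrative names.
dropped <- iris[, "Petal.Length"]                # default: drop = TRUE
kept    <- iris[, "Petal.Length", drop = FALSE]  # keep the data frame structure

class(dropped)  # "numeric"
class(kept)     # "data.frame"
dim(kept)       # 150 rows, 1 column
```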
Things to Remember:
- Most often, a subset is the same type as the original object.
- Both indices and names can be used to extract the subset. (In order to use names, the object must have a name-type attribute such as names, rownames, colnames, etc.)
- You can use negative integers to indicate exclusion.
- Unquoted variables are interpolated within the brackets.
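The points above can be sketched on a small named vector. The vector `scores` and the variable `wanted` are hypothetical, chosen only to make the example self-contained.

```r
# `scores` and `wanted` are made-up names for illustration.
scores <- c(alice = 90, bob = 72, carol = 85)

by_index <- scores[1:2]                   # positions 1 and 2
by_name  <- scores[c("alice", "carol")]   # works because `scores` has names
excluded <- scores[-2]                    # everything except position 2

wanted <- "bob"
by_var <- scores[wanted]                  # the variable's value is used as the name
```

In each case the result is still a named numeric vector, which is consistent with the first point in the list.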
Extract one item with [[
The double square brackets are used to extract one element from potentially many. Vectors yield a vector with a single value; data frames give a column vector; lists give one element:
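A minimal sketch of the three cases, using the built-in letters and iris objects plus a made-up list `x` (the result names are also just for illustration):

```r
# `x` is a made-up list for illustration; letters and iris ship with R.
x <- list(a = 1, b = "two", c = 3:5)

from_vector <- letters[[2]]             # "b", a length-one character vector
from_df     <- iris[["Petal.Length"]]   # the column as a plain numeric vector
from_list   <- x[["c"]]                 # the element itself: 3 4 5

class(x["c"])     # "list"    (single brackets keep the container)
class(x[["c"]])   # "integer" (double brackets return the contents)
```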
The mnemonic device here is that the double square brackets look as if you are asking for something deep within a container. You are not taking a slice but reaching in to get the one thing at the core.
Three important things to remember:
- You can return only one item.
- The result is not (necessarily) the same type of object as the container.
- The dimension will be the dimension of the one item, which is not necessarily 1.
And, as before:
- Names or indices can both be used.
- Variables are interpolated.
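To illustrate these points, here is a small sketch in which the extracted item has its own dimensions and the name or index comes from a variable. The list `things` and the variables `key` and `pos` are hypothetical.

```r
# `things`, `key`, and `pos` are made-up names for illustration.
things <- list(m = matrix(1:6, nrow = 2), df = head(iris, 3))

key <- "m"
pos <- 2

by_name_var  <- things[[key]]   # the 2 x 3 matrix stored under "m"
by_index_var <- things[[pos]]   # the 3-row data frame in position 2

dim(by_name_var)      # 2 3
class(by_index_var)   # "data.frame"
```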
Interact with $
Interestingly enough, the accessor that provides the least unique utility is also probably the one used most often. $ is a special case of [[ in which you access a single item by its actual name. The following are equivalent:
The appeal of this accessor is nothing more than brevity. One character, $, replaces six, [[""]]. This accessor is handiest when doing interactive programming but should be discouraged for more production-oriented code because of its limitations, namely the inability to interpolate names or use integer indices.
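For instance, a sketch on the built-in iris data frame (the variable names `via_dollar` and `via_bracket` are made up for illustration):

```r
# iris ships with R; both lines below pull out the same column.
via_dollar  <- iris$Petal.Length
via_bracket <- iris[["Petal.Length"]]

identical(via_dollar, via_bracket)  # TRUE
```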
Things to Remember:
- You cannot use integer indices.
- The name will not be interpolated.
- Returns only one item.
- If the name contains special characters, the name must be enclosed in backticks: ``
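These limitations can be seen in a short sketch. The data frame `df`, its column names, and the variable `wanted` are all hypothetical, chosen to show a name held in a variable and a name containing a space.

```r
# `df`, `wanted`, and the column names are made up for illustration.
df <- data.frame(x = 1:3)
df[["my col"]] <- c(10, 20, 30)   # a column name containing a space

wanted <- "x"
df$wanted       # NULL: $ looks for a column literally named "wanted"
df[[wanted]]    # 1 2 3: [[ uses the value stored in the variable

df$`my col`     # backticks are needed because of the space
```

Where the column name is only known at run time, [[ is the form that works, which is one reason to prefer it in non-interactive code.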
That is really all there is to it. [ - for subsets, [[ - for extracting items, and $ - for extracting by name.