R Vocabulary – Part 2

[This article was first published on Anindya Mozumdar, and kindly contributed to R-bloggers.]

This is the second part of the series of articles on R vocabulary. In this series, we explore most of the functions mentioned in Chapter 2 of the book Advanced R. The first part of the series can be read here.

The keyword function is used to define what is technically a closure in R. A closure has three components: its formals (arguments), the body of the function and the environment. A closure returns the value of the last expression evaluated in its body. A function can also return a value explicitly by calling return, which need not be at the end of the function.

f <- function(x, y) x + y + 1
formals(f)
## $x
## 
## 
## $y
f(3.2, 1.7)
## [1] 5.9
g <- function(x, y) {
  if (x > y) {
    return("greater")
  } else {
    return("less than or equal to")
  }
}
g(3.2, 1.7)
## [1] "greater"
g(1, 5)
## [1] "less than or equal to"
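
The example above only inspects the formals. As a small sketch, the other two components can be inspected with body and environment. The names h, make_adder and add2 are made up for this illustration.

```r
h <- function(x, y) x + y + 1
names(formals(h))      # "x" "y"
body(h)                # x + y + 1
environment(h)         # the environment in which h was defined

# A function created inside another function keeps a reference to the
# environment of that call, which is where the value of n lives.
make_adder <- function(n) function(x) x + n
add2 <- make_adder(2)
add2(5)                # 7
environment(add2)$n    # 2
```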

The function missing can be used to test whether a value was specified as an argument to a function. The function on.exit can be used to store an expression which needs to be executed when the function exits. This is useful to perform any kind of clean-up action or to restore global options when the function exits.

incrementx <- function(x, y) {
  
  on.exit(print("I am exiting"))
  
  if (missing(y)) {
    y <- 1
  }
  x + y
}
incrementx(2)
## [1] "I am exiting"
## [1] 3
incrementx(2, 3)
## [1] "I am exiting"
## [1] 5
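
As a sketch of the "restore global options" use case, the hypothetical helper format_short below temporarily changes the digits option and relies on on.exit to put the old value back. The helper name is an assumption made for this illustration.

```r
# Hypothetical helper: format a number using 3 significant digits
format_short <- function(x) {
  old <- options(digits = 3)          # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them however the function exits
  format(x)
}

digits_before <- getOption("digits")
format_short(pi)                      # "3.14"
identical(getOption("digits"), digits_before)  # TRUE, the option was restored
```

Using add = TRUE means the expression is appended to any exit handlers already registered, rather than replacing them.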

The function invisible is used to return a value which can be assigned to another variable, but which does not print if not assigned.

f <- function(n) {
  invisible(rnorm(n) * rnorm(n))
}
f(100)
x <- f(100)
str(x)
##  num [1:100] -0.81093 -0.81097 -0.07441 0.1371 -0.00579 ...

The logical operators are ! (not), & (and), &&, | (or), || and xor. The & and | operators work similarly to arithmetic operators and do an element-wise comparison on vectors. The && and || operators work on single logical values and are most appropriately used in if clauses. Older versions of R silently examined only the first element of each vector, whereas R 4.3.0 and later signal an error when an operand has length greater than one. all and any check whether all of the values or any of the values in a logical vector are true, respectively.

x <- sample(c(TRUE, FALSE), 5, replace = TRUE)
y <- sample(c(TRUE, FALSE), 5, replace = TRUE)
x
## [1]  TRUE FALSE  TRUE FALSE  TRUE
y
## [1] FALSE  TRUE FALSE  TRUE FALSE
x & y
## [1] FALSE FALSE FALSE FALSE FALSE
x[1] && y[1] # && needs length-one operands (longer vectors are an error since R 4.3.0)
## [1] FALSE
x | y
## [1] TRUE TRUE TRUE TRUE TRUE
x[1] || y[1]
## [1] TRUE
!x
## [1] FALSE  TRUE FALSE  TRUE FALSE
xor(x, y)
## [1] TRUE TRUE TRUE TRUE TRUE
all(x)
## [1] FALSE
any(x)
## [1] TRUE
all(c(TRUE, NA))
## [1] NA
all(c(TRUE, NA), na.rm = TRUE)
## [1] TRUE
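
One reason && is preferred in if clauses is that it short-circuits: the right-hand side is evaluated only when the left-hand side does not already decide the result. A minimal sketch, using a made-up function safe_ratio:

```r
# Hypothetical helper: divide the first element by the second, if possible
safe_ratio <- function(x) {
  # x[2] != 0 is only evaluated when the length check is TRUE
  if (length(x) == 2 && x[2] != 0) {
    x[1] / x[2]
  } else {
    NA_real_
  }
}

safe_ratio(c(6, 3))   # 2
safe_ratio(5)         # NA, the second test is never evaluated
safe_ratio(c(1, 0))   # NA
```

With & instead of &&, the call safe_ratio(5) would evaluate x[2] != 0, which is NA, and the if clause would fail.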

intersect, union, setdiff, setequal and is.element together form the set operation functions. As they are set operations, they discard any duplicate values.

x <- c(1, seq(1, 5, 1))
y <- seq(3, 10, 2)
x
## [1] 1 1 2 3 4 5
y
## [1] 3 5 7 9
union(x, y) # Note that the duplicate 1 is discarded
## [1] 1 2 3 4 5 7 9
intersect(x, y)
## [1] 3 5
setdiff(x, y)
## [1] 1 2 4
setequal(x, y)
## [1] FALSE
setequal(c(1, 2), c(2, 1))
## [1] TRUE
is.element(1, x)
## [1] TRUE
is.element(1, y)
## [1] FALSE
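
Two details are worth a short sketch: setdiff is not symmetric, and is.element gives the same answer as the %in% operator. The vectors are redefined here so that the snippet is self-contained.

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9)

setdiff(x, y)            # 1 2 4, elements of x which are not in y
setdiff(y, x)            # 7 9,   elements of y which are not in x

is.element(c(1, 7), x)   # TRUE FALSE
c(1, 7) %in% x           # TRUE FALSE
```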

which takes a logical vector or array, typically the result of a condition, and returns the indices where it is TRUE.

x <- 1:10
which(x > 5)
## [1]  6  7  8  9 10
x <- array(1:20, dim = c(2, 2, 5))
which(x > 18, arr.ind = TRUE)
##      dim1 dim2 dim3
## [1,]    1    2    5
## [2,]    2    2    5
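
A short sketch of how which treats missing values: indices where the condition is NA are dropped, which differs from subsetting with the logical vector directly.

```r
x <- c(3, NA, 8, 10)
x > 5             # FALSE NA TRUE TRUE
which(x > 5)      # 3 4
x[x > 5]          # NA 8 10, the NA in the condition is carried over
x[which(x > 5)]   # 8 10
```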

Next we have functions which primarily operate on vectors and matrices, and are sometimes applicable to data frames. length can also be used on the left hand side of the assignment operator to either truncate or lengthen a vector.

x <- rnorm(10)
length(x)
## [1] 10
dim(x)
## NULL
length(cars) # returns the number of columns of the data frame cars
## [1] 2
nrow(cars)
## [1] 50
ncol(cars)
## [1] 2
dim(cars)
## [1] 50  2
x <- 1:5
length(x) <- 3
x
## [1] 1 2 3
length(x) <- 5
x
## [1]  1  2  3 NA NA

cbind and rbind are used to combine R objects by columns or rows.

d1 <- data.frame(x = rnorm(5))
d2 <- data.frame(y = rnorm(5))
d3 <- data.frame(x = rnorm(3))
cbind(d1, d2)
##            x           y
## 1 -0.6448522 -0.07822180
## 2  0.1178314  0.77584288
## 3  0.2773237 -0.37796927
## 4  0.4127771  0.04541585
## 5 -0.0810581  0.69285839
rbind(d1, d3)
##             x
## 1 -0.64485221
## 2  0.11783136
## 3  0.27732368
## 4  0.41277710
## 5 -0.08105810
## 6 -0.28711446
## 7  0.77660840
## 8  0.08090678
rbind(d2, d3)
## Error in match.names(clabs, names(xi)): names do not match previous names
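
The error arises because rbind matches the columns of data frames by name. One possible fix, sketched below with small fixed data frames, is to rename the column before combining. setNames is used here for brevity.

```r
d2 <- data.frame(y = c(0.5, 1.5))
d3 <- data.frame(x = c(2.5, 3.5, 4.5))

# Give d2 the same column name as d3, then stack the rows
combined <- rbind(setNames(d2, "x"), d3)
names(combined)   # "x"
nrow(combined)    # 5
```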

The function names is used to retrieve the names of an object. The functions rownames and colnames are used to retrieve the row or column names of an object like a data frame or a matrix. They can also be used to assign names to an object.

cars_mdl <- lm(speed ~ dist, data = cars)
names(cars_mdl)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
colnames(cars)
## [1] "speed" "dist"
rownames(cars)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50"
d <- data.frame(x = rnorm(5))
rownames(d) <- c("A", "B", "C", "D", "E")
d
##            x
## A -2.4440030
## B  1.5771225
## C  0.2227441
## D  1.8124210
## E -1.9584089

t calculates the transpose of a matrix. diag can be used to retrieve the diagonal elements of a matrix, construct a diagonal matrix from a vector or even replace the diagonal elements of a matrix.

m <- matrix(1:4, 2, 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
t(m)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
diag(m)
## [1] 1 4
diag(m) <- c(7, 8)
m
##      [,1] [,2]
## [1,]    7    3
## [2,]    2    8
diag(c(1, 2, 3), 3, 3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3

data.matrix is used to convert all variables in a data frame to numbers and return it as a matrix. Factors are replaced with their numeric codes.

d <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = TRUE) # needed to get a factor since R 4.0.0
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: int  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2
data.matrix(d)
##      x y
## [1,] 1 1
## [2,] 2 2

Next we look at a set of functions whose output is typically a vector. rep and rep_len are used to replicate the elements of a vector. seq, seq_along and seq_len are used to create sequences. rev is used to reverse the elements of a vector.

rep(c(1, 2), each = 2)
## [1] 1 1 2 2
rep_len(3, length.out = 5)
## [1] 3 3 3 3 3
seq(1, 5)
## [1] 1 2 3 4 5
seq(1, 5, by = 2)
## [1] 1 3 5
seq_len(7)
## [1] 1 2 3 4 5 6 7
seq_along(letters)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26
rev(letters)
##  [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j"
## [18] "i" "h" "g" "f" "e" "d" "c" "b" "a"
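
A common reason to prefer seq_along and seq_len over the colon operator is the behaviour on empty input. A minimal sketch:

```r
x <- numeric(0)    # an empty vector
1:length(x)        # 1 0, which would run a loop body twice
seq_along(x)       # integer(0), so a loop over it runs zero times
seq_len(0)         # integer(0)
```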

We have used the function sample in quite a few of the examples in this series of articles. It is used to sample a vector for a specified number of elements, with or without replacement.

sample(letters, 3)
## [1] "x" "v" "z"
sample(1:3, 10, replace = TRUE)
##  [1] 2 1 3 2 2 3 3 3 3 3
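
The results of sample change from run to run. If reproducible output is needed, the random number generator can be seeded with set.seed before sampling. The seed value 42 below is arbitrary.

```r
set.seed(42)
a <- sample(1:10, 3)
set.seed(42)
b <- sample(1:10, 3)
identical(a, b)    # TRUE, the same seed gives the same sample
```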

The is.<> and as.<> functions can be used to test whether a vector belongs to a particular type and coerce the vector to a particular type respectively.

x <- c(TRUE, FALSE, TRUE)
is.numeric(x)
## [1] FALSE
is.logical(x)
## [1] TRUE
as.numeric(x)
## [1] 1 0 1
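
Coercion does not always succeed. As a sketch, values which cannot be converted become NA (with a warning, suppressed here), and as.integer truncates rather than rounds.

```r
x <- c("1", "2.5", "abc")
is.character(x)                        # TRUE
y <- suppressWarnings(as.numeric(x))   # "abc" cannot be converted
y                                      # 1.0 2.5 NA
is.na(y)                               # FALSE FALSE TRUE
as.integer(2.9)                        # 2
```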

We now look at a few functions which operate primarily on lists and data frames. We have already looked at list to create lists. The function unlist can be used to simplify a list into a vector. In the example below, note how the first call results in a numeric vector while the second call results in a character vector.

l <- list(x = 1, y = 2)
unlist(l)
## x y 
## 1 2
l <- list(x = 1, y = 2, z = "a")
unlist(l)
##   x   y   z 
## "1" "2" "a"

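data.frame is used to create a new data frame. Note that before R 4.0.0, character variables were automatically converted to factors under default options. From R 4.0.0 onwards they stay as character vectors unless stringsAsFactors = TRUE is specified, as in the example below. as.data.frame is used to coerce an object to a data frame, if possible.

d <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = TRUE)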
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: num  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2
l <- list(x = c(1, 2), y = c(3, 4), z = "a")
as.data.frame(l)
##   x y z
## 1 1 3 a
## 2 2 4 a

split is useful to divide a data frame by groups of a particular variable. A function is typically applied to the resulting list to calculate the results by each group. In the example below, we first split the data frame d into three different groups and then calculate the mean y for each group.

d <- data.frame(x = sample(letters[1:3], 10, replace = TRUE),
                y = rnorm(10))
s <- split(d, d$x)
s
## $a
##   x            y
## 3 a -0.000494805
## 4 a  0.262358663
## 6 a -1.479823764
## 8 a -1.421484123
## 
## $b
##    x           y
## 1  b -2.76062240
## 2  b -0.76668204
## 5  b -0.11053905
## 7  b -0.75468644
## 10 b  0.01145111
## 
## $c
##   x         y
## 9 c 0.2082831
sapply(s, function(df) mean(df$y))
##          a          b          c 
## -0.6598610 -0.8762158  0.2082831
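
Since the example above uses random data, here is the same split-then-apply pattern on a small fixed data frame, where the group means can be checked by hand. split also accepts a vector as its first argument, which avoids the anonymous function.

```r
d <- data.frame(x = c("a", "b", "a", "c", "b"),
                y = c(1, 2, 3, 4, 6))

groups <- split(d$y, d$x)     # a list with one numeric vector per group
means <- sapply(groups, mean)
means                         # a = 2, b = 4, c = 4
```

For this simple case, tapply(d$y, d$x, mean) computes the same group means in one step.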

expand.grid is useful to create a data frame using all combinations of the vectors provided as arguments.

x <- c("a", "b")
y <- c("p", "q", "r")
z <- c("m", "n")
expand.grid(x, y, z)
##    Var1 Var2 Var3
## 1     a    p    m
## 2     b    p    m
## 3     a    q    m
## 4     b    q    m
## 5     a    r    m
## 6     b    r    m
## 7     a    p    n
## 8     b    p    n
## 9     a    q    n
## 10    b    q    n
## 11    a    r    n
## 12    b    r    n
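
The default column names Var1, Var2 and so on can be avoided by naming the arguments. A small sketch, where the column names size and colour are made up for the illustration:

```r
g <- expand.grid(size = c("S", "M"),
                 colour = c("red", "green", "blue"),
                 stringsAsFactors = FALSE)   # keep the columns as character
names(g)       # "size" "colour"
nrow(g)        # 6, the product of the lengths of the inputs
g$size[1:3]    # "S" "M" "S", the first argument varies fastest
```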

We will not be looking at details of the control flow operations in this article. These include if, for, while, next, break, switch and ifelse. These are primarily used to implement loops and execute different code based on different conditions.

The apply functions are explained in great detail in the chapter on ‘Functionals’ in the same book, and we will not look at them here. It is also recommended that you look at the apply functions in the plyr package, which provides a consistent interface between different types of objects (lists, arrays and data frames).
