# R Vocabulary – Part 2

This article was first published on **Anindya Mozumdar** and kindly contributed to R-bloggers.


This is the second part of the series of articles on R vocabulary. In this series, we explore most of the functions mentioned in Chapter 2 of the book Advanced R. The first part of the series can be read here.

The keyword *function* is used to define what is technically a *closure* in R. A closure has three components: its *formals* (arguments), its body and its *environment*. A closure returns the value of the last expression evaluated in its body. A function can also return a value explicitly using *return*, which need not appear at the end of the function.

```r
f <- function(x, y) x + y + 1
formals(f)
## $x
##
##
## $y

f(3.2, 1.7)
## [1] 5.9

g <- function(x, y) {
  if (x > y) {
    return("greater")
  } else {
    return("less than or equal to")
  }
}
g(3.2, 1.7)
## [1] "greater"

g(1, 5)
## [1] "less than or equal to"
```
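To make the three components of a closure concrete, here is a minimal sketch that inspects each of them. The function `area` is a hypothetical example and not part of the original article.

```r
# `area` is a hypothetical function used only for illustration.
area <- function(width, height = 1) {
  width * height
}

names(formals(area))  # the argument names: "width" and "height"
formals(area)$height  # the default value of height
body(area)            # the code that runs when area() is called
environment(area)     # the environment in which area was defined
typeof(area)          # "closure"
```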

The function *missing* can be used to test whether a value was specified as an argument to a function. The function *on.exit* registers an expression to be executed when the function exits. This is useful for performing clean-up actions or restoring global options when the function exits.

```r
incrementx <- function(x, y) {
  on.exit(print("I am exiting"))
  if (missing(y)) {
    y <- 1
  }
  x + y
}
incrementx(2)
## [1] "I am exiting"
## [1] 3

incrementx(2, 3)
## [1] "I am exiting"
## [1] 5
```
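The remark about restoring global options can be sketched as follows. The helper `print_rounded` is a hypothetical name chosen for this illustration; it temporarily changes the `digits` option and relies on *on.exit* to put the old value back, even if printing fails.

```r
# `print_rounded` is a hypothetical helper, not part of base R.
print_rounded <- function(x, digits = 3) {
  old <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  print(x)
  invisible(x)
}

before <- getOption("digits")
print_rounded(pi)
after <- getOption("digits")
identical(before, after)  # the global option is unchanged afterwards
```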

The function *invisible* is used to return a value that is not printed when the function is called on its own, but which can still be assigned to a variable.

```r
f <- function(n) {
  invisible(rnorm(n) * rnorm(n))
}
f(100)  # nothing is printed
x <- f(100)
str(x)
##  num [1:100] -0.81093 -0.81097 -0.07441 0.1371 -0.00579 ...
```
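If you want to check whether a call returns its value invisibly, one option is the base function *withVisible*. The sketch below uses a hypothetical function `quiet_square`; wrapping a call in parentheses is a common way to force the value to print.

```r
# `quiet_square` is a hypothetical function used only for illustration.
quiet_square <- function(x) invisible(x^2)

quiet_square(4)    # prints nothing
(quiet_square(4))  # parentheses force the value to be printed
withVisible(quiet_square(4))$visible  # FALSE for an invisible result
```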

The logical operators are ! (not), & (and), &&, | (or), || and xor. The & and | operators work like arithmetic operators and perform an element-wise comparison on vectors. The && and || operators work on single logical values and stop evaluating as soon as the result is known, which makes them most appropriate in *if* clauses. Older versions of R silently used only the first element when given longer vectors; since R 4.3.0 this is an error. *all* and *any* check whether all of the values, or any of the values, in a logical vector are true.

```r
x <- sample(c(TRUE, FALSE), 5, replace = TRUE)
y <- sample(c(TRUE, FALSE), 5, replace = TRUE)
x
## [1]  TRUE FALSE  TRUE FALSE  TRUE

y
## [1] FALSE  TRUE FALSE  TRUE FALSE

x & y
## [1] FALSE FALSE FALSE FALSE FALSE
```

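```r
# && and || expect single values (an error for longer vectors since R 4.3.0),
# so the first elements are selected explicitly here.
x[1] && y[1]
## [1] FALSE

x | y
## [1] TRUE TRUE TRUE TRUE TRUE

x[1] || y[1]
## [1] TRUE
```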

```r
!x
## [1] FALSE  TRUE FALSE  TRUE FALSE

xor(x, y)
## [1] TRUE TRUE TRUE TRUE TRUE

all(x)
## [1] FALSE

any(x)
## [1] TRUE

all(c(TRUE, NA))
## [1] NA

all(c(TRUE, NA), na.rm = TRUE)
## [1] TRUE
```
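Because && stops at the first FALSE, it can guard a later test that would otherwise misbehave. The following is a small sketch using a hypothetical helper `is_positive_scalar`; the comparison `x > 0` is reached only when the earlier checks succeed.

```r
# `is_positive_scalar` is a hypothetical helper used only for illustration.
is_positive_scalar <- function(x) {
  # Each test is evaluated only if the ones before it are TRUE.
  is.numeric(x) && length(x) == 1 && x > 0
}

is_positive_scalar(3)        # TRUE
is_positive_scalar("a")      # FALSE: stops at is.numeric()
is_positive_scalar(c(1, 2))  # FALSE: stops at the length check
```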

*intersect*, *union*, *setdiff*, *setequal* and *is.element* together form the set operation functions. As they are set operations, they discard any duplicate values.

```r
x <- c(1, seq(1, 5, 1))
y <- seq(3, 10, 2)
x
## [1] 1 1 2 3 4 5

y
## [1] 3 5 7 9

union(x, y) # Note that the duplicate 1 is discarded
## [1] 1 2 3 4 5 7 9

intersect(x, y)
## [1] 3 5

setdiff(x, y)
## [1] 1 2 4

setequal(x, y)
## [1] FALSE

setequal(c(1, 2), c(2, 1))
## [1] TRUE

is.element(1, x)
## [1] TRUE

is.element(1, y)
## [1] FALSE
```
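*is.element* is also vectorised over its first argument, and the operator `%in%` gives the same answer in infix form. A short sketch, reusing the values of `x` and `y` from the example above:

```r
x <- c(1, 1, 2, 3, 4, 5)
y <- c(3, 5, 7, 9)

is.element(x, y)  # one TRUE/FALSE per element of x
x %in% y          # the same test written as an operator
identical(is.element(x, y), x %in% y)

x[x %in% y]       # the elements of x that also occur in y
```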

*which* takes a condition and returns the indices where the condition is true.

```r
x <- 1:10
which(x > 5)
## [1]  6  7  8  9 10

x <- array(1:20, dim = c(2, 2, 5))
which(x > 18, arr.ind = TRUE)
##      dim1 dim2 dim3
## [1,]    1    2    5
## [2,]    2    2    5
```
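Two further behaviours of *which* are worth a quick sketch: it keeps the names of the elements it selects, and it treats NA as not true. The vector `temps` is made-up data for illustration.

```r
# `temps` is made-up data used only for illustration.
temps <- c(mon = 18, tue = 25, wed = 31, thu = 22)

hot <- which(temps > 20)
hot         # positions 2, 3 and 4, labelled with their names
temps[hot]  # the indices can be used directly for subsetting

which(c(TRUE, NA, FALSE, TRUE))  # NA is skipped: positions 1 and 4
```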

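Next we have functions which primarily operate on vectors and matrices, and are sometimes applicable to data frames. *length*, *dim*, *nrow* and *ncol* report the size of an object. *length* can also be used on the left hand side of the assignment operator to either truncate or lengthen a vector; when a vector is lengthened, the new elements are filled with NA.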

```r
x <- rnorm(10)
length(x)
## [1] 10

dim(x)
## NULL

length(cars) # returns the number of columns of the data frame cars
## [1] 2

nrow(cars)
## [1] 50

ncol(cars)
## [1] 2

dim(cars)
## [1] 50  2

x <- 1:5
length(x) <- 3
x
## [1] 1 2 3

length(x) <- 5
x
## [1]  1  2  3 NA NA
```
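One point that is easy to miss is that *length* means different things for a matrix and a data frame. The sketch below uses two small made-up objects, `m` and `df`, to show the contrast.

```r
# `m` and `df` are made-up objects used only for illustration.
m <- matrix(1:6, nrow = 2)
df <- data.frame(a = 1:3, b = 4:6)

length(m)   # 6: a matrix counts all of its elements
length(df)  # 2: a data frame counts its columns

dim(m)      # 2 rows, 3 columns
dim(df)     # 3 rows, 2 columns
```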

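*cbind* and *rbind* are used to combine R objects by columns or rows. When combining data frames by rows, the column names must match, which is why the last call below fails.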

```r
d1 <- data.frame(x = rnorm(5))
d2 <- data.frame(y = rnorm(5))
d3 <- data.frame(x = rnorm(3))
cbind(d1, d2)
##            x           y
## 1 -0.6448522 -0.07822180
## 2  0.1178314  0.77584288
## 3  0.2773237 -0.37796927
## 4  0.4127771  0.04541585
## 5 -0.0810581  0.69285839

rbind(d1, d3)
##             x
## 1 -0.64485221
## 2  0.11783136
## 3  0.27732368
## 4  0.41277710
## 5 -0.08105810
## 6 -0.28711446
## 7  0.77660840
## 8  0.08090678

rbind(d2, d3)
## Error in match.names(clabs, names(xi)): names do not match previous names
```
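One way around the error is to give both data frames the same column names before binding them. This is only a sketch with fixed, made-up values; whether renaming is appropriate depends on whether the two columns really hold the same variable.

```r
# Fixed values are used here instead of rnorm() so the result is predictable.
d2 <- data.frame(y = c(0.1, 0.2))
d3 <- data.frame(x = c(0.3, 0.4, 0.5))

# setNames() returns a copy of d2 with the column names of d3.
combined <- rbind(setNames(d2, names(d3)), d3)
combined
nrow(combined)  # 5 rows: 2 from d2 and 3 from d3
```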

The function *names* is used to retrieve the names of an object. The functions *rownames* and *colnames* are used to retrieve the row or column names of an object like a data frame or a matrix. They can also be used to assign names to an object.

```r
cars_mdl <- lm(speed ~ dist, data = cars)
names(cars_mdl)
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"

colnames(cars)
## [1] "speed" "dist"

rownames(cars)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50"

d <- data.frame(x = rnorm(5))
rownames(d) <- c("A", "B", "C", "D", "E")
d
##            x
## A -2.4440030
## B  1.5771225
## C  0.2227441
## D  1.8124210
## E -1.9584089
```
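Once names have been assigned, they can be used for indexing in place of numeric positions. A brief sketch with a made-up vector `scores` and a small matrix:

```r
# `scores` and `m` are made-up objects used only for illustration.
scores <- c(90, 85, 77)
names(scores) <- c("ann", "bob", "cat")
scores["bob"]  # select an element by name

m <- matrix(1:4, 2, 2)
rownames(m) <- c("top", "bottom")
colnames(m) <- c("left", "right")
m["bottom", "left"]  # the element in row 2, column 1
```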

*t* calculates the transpose of a matrix. *diag* can be used to retrieve the diagonal elements of a matrix, construct a diagonal matrix from a vector or even replace the diagonal elements of a matrix.

```r
m <- matrix(1:4, 2, 2)
m
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

t(m)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

diag(m)
## [1] 1 4

diag(m) <- c(7, 8)
m
##      [,1] [,2]
## [1,]    7    3
## [2,]    2    8

diag(c(1, 2, 3), 3, 3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3
```
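Two related facts can be checked quickly: transposing swaps the dimensions of a non-square matrix, and calling *diag* with a single number builds an identity matrix of that size. A small sketch:

```r
m <- matrix(1:6, nrow = 2)

dim(m)     # 2 rows, 3 columns
dim(t(m))  # 3 rows, 2 columns
identical(t(t(m)), m)  # transposing twice gives back the original

diag(3)    # a 3 x 3 identity matrix
```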

*data.matrix* is used to convert all variables in a data frame to numbers and return it as a matrix. Factors are replaced with their numeric codes.

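```r
# stringsAsFactors = TRUE reproduces the default behaviour of R before 4.0.0
d <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = TRUE)
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: int  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2

data.matrix(d)
##      x y
## [1,] 1 1
## [2,] 2 2
```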

Next we look at a set of functions whose output is typically a vector. *rep* and *rep_len* are used to replicate the elements of a vector. *seq*, *seq_along* and *seq_len* are used to create sequences. *rev* is used to reverse the elements of a vector.

```r
rep(c(1, 2), each = 2)
## [1] 1 1 2 2

rep_len(3, length.out = 5)
## [1] 3 3 3 3 3

seq(1, 5)
## [1] 1 2 3 4 5

seq(1, 5, by = 2)
## [1] 1 3 5

seq_len(7)
## [1] 1 2 3 4 5 6 7

seq_along(letters)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26

rev(letters)
##  [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j"
## [18] "i" "h" "g" "f" "e" "d" "c" "b" "a"
```
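A common reason to prefer *seq_along* over `1:length(x)` is the empty vector. The sketch below shows the difference, and also contrasts the `times` and `each` arguments of *rep*.

```r
x <- character(0)  # an empty vector

1:length(x)   # 1 0: a for loop over this would run twice
seq_along(x)  # integer(0): a for loop over this would not run at all

rep(c(1, 2), times = 2)  # 1 2 1 2: repeats the whole vector
rep(c(1, 2), each = 2)   # 1 1 2 2: repeats each element in turn
```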

We have used the function *sample* in quite a few of the examples in this series of articles. It is used to sample a vector for a specified number of elements, with or without replacement.

```r
sample(letters, 3)
## [1] "x" "v" "z"

sample(1:3, 10, replace = TRUE)
##  [1] 2 1 3 2 2 3 3 3 3 3
```
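Since *sample* is random, the output above will differ from run to run. If you need reproducible draws, you can fix the random seed with *set.seed* first. The seed value 42 below is arbitrary, and the letters actually drawn depend on your version of R, so they are not shown.

```r
set.seed(42)  # arbitrary seed, chosen only for illustration
a <- sample(letters, 3)

set.seed(42)  # resetting the seed repeats the same draw
b <- sample(letters, 3)

identical(a, b)  # TRUE
```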

The *is.<>* and *as.<>* functions can be used to test whether a vector belongs to a particular type and coerce the vector to a particular type respectively.

```r
x <- c(TRUE, FALSE, TRUE)
is.numeric(x)
## [1] FALSE

is.logical(x)
## [1] TRUE

as.numeric(x)
## [1] 1 0 1
```
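Coercion does not always succeed. As a rough sketch of what to expect, values that cannot be converted become NA (with a warning in the numeric case), and converting to integer drops the fractional part.

```r
# "abc" cannot be read as a number, so it becomes NA (with a warning).
r <- suppressWarnings(as.numeric(c("1", "2.5", "abc")))
r
is.na(r)

as.integer(3.9)               # 3: the fractional part is dropped
as.logical(c("TRUE", "yes"))  # TRUE NA: "yes" is not a recognised value
```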

We now look at a few functions which operate primarily on lists and data frames. We have already looked at *list* to create lists. The function *unlist* can be used to simplify a list into a vector. In the example below, note how the first call results in a numeric vector while the second call results in a character vector.

```r
l <- list(x = 1, y = 2)
unlist(l)
## x y
## 1 2

l <- list(x = 1, y = 2, z = "a")
unlist(l)
##   x   y   z
## "1" "2" "a"
```
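*unlist* also flattens nested lists, joining the names of the outer and inner elements with a dot. A small sketch with a made-up list:

```r
# `nested` is a made-up list used only for illustration.
nested <- list(a = 1, b = list(c = 2, d = 3))

flat <- unlist(nested)
flat
names(flat)  # "a" "b.c" "b.d"
```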

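*data.frame* is used to create a new data frame. Note that in versions of R before 4.0.0, character variables were automatically converted to factors under default options, which is what the output below reflects. Since R 4.0.0 they are kept as character vectors unless you pass stringsAsFactors = TRUE. *as.data.frame* is used to coerce an object to a data frame, if possible.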

```r
# stringsAsFactors = TRUE reproduces the default behaviour of R before 4.0.0
d <- data.frame(x = c(1, 2), y = c("a", "b"), stringsAsFactors = TRUE)
str(d)
## 'data.frame':    2 obs. of  2 variables:
##  $ x: num  1 2
##  $ y: Factor w/ 2 levels "a","b": 1 2

l <- list(x = c(1, 2), y = c(3, 4), z = "a")
as.data.frame(l)
##   x y z
## 1 1 3 a
## 2 2 4 a
```
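Because the default for character columns depends on the version of R, it can be safer to state the choice explicitly. The sketch below builds the same data frame both ways; the names `d_chr` and `d_fct` are just illustrative.

```r
# Setting stringsAsFactors explicitly gives the same result in any R version.
d_chr <- data.frame(y = c("a", "b"), stringsAsFactors = FALSE)
d_fct <- data.frame(y = c("a", "b"), stringsAsFactors = TRUE)

class(d_chr$y)  # "character"
class(d_fct$y)  # "factor"
```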

*split* is useful to divide a data frame by groups of a particular variable. A function is typically applied to the resulting list to calculate the results by each group. In the example below, we first split the data frame *d* into three different groups and then calculate the mean *y* for each group.

```r
d <- data.frame(x = sample(letters[1:3], 10, replace = TRUE), y = rnorm(10))
s <- split(d, d$x)
s
## $a
##   x            y
## 3 a -0.000494805
## 4 a  0.262358663
## 6 a -1.479823764
## 8 a -1.421484123
##
## $b
##    x           y
## 1  b -2.76062240
## 2  b -0.76668204
## 5  b -0.11053905
## 7  b -0.75468644
## 10 b  0.01145111
##
## $c
##   x         y
## 9 c 0.2082831

sapply(s, function(df) mean(df$y))
##          a          b          c
## -0.6598610 -0.8762158  0.2082831
```
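*split* works on plain vectors as well as data frames, which shortens the group-wise mean a little. Here is a sketch with fixed, made-up values so that the result can be checked by hand.

```r
# Fixed, made-up values so the group means are easy to verify.
d <- data.frame(x = c("a", "b", "a", "c", "b"), y = c(1, 2, 3, 4, 6))

groups <- split(d$y, d$x)  # a list of y values, one element per group
res <- sapply(groups, mean)
res  # a = 2, b = 4, c = 4
```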

*expand.grid* is useful to create a data frame using all combinations of the vectors provided as arguments.

```r
x <- c("a", "b")
y <- c("p", "q", "r")
z <- c("m", "n")
expand.grid(x, y, z)
##    Var1 Var2 Var3
## 1     a    p    m
## 2     b    p    m
## 3     a    q    m
## 4     b    q    m
## 5     a    r    m
## 6     b    r    m
## 7     a    p    n
## 8     b    p    n
## 9     a    q    n
## 10    b    q    n
## 11    a    r    n
## 12    b    r    n
```
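If you name the arguments, *expand.grid* uses those names for the columns instead of Var1, Var2 and so on. The number of rows is the product of the lengths of the inputs. The variable names below are made up for illustration.

```r
# `size` and `colour` are made-up variables used only for illustration.
g <- expand.grid(
  size = c("S", "M"),
  colour = c("red", "blue", "green"),
  stringsAsFactors = FALSE
)

names(g)  # "size" "colour"
nrow(g)   # 2 * 3 = 6 combinations
```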

We will not be looking at details of the control flow operations in this article. These include *if*, *for*, *while*, *next*, *break*, *switch* and *ifelse*. These are primarily used to implement loops and execute different code based on different conditions.

The *apply* functions are explained in great detail in the chapter on ‘Functionals’ in the same book, and we will not look at them here. It is also recommended that you look at the apply functions in the *plyr* package, which provides a consistent interface between different types of objects (lists, arrays and data frames).
