**Anindya Mozumdar**, and kindly contributed to R-bloggers].

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is the third part of the series of articles on R vocabulary. In this series, we explore most of the functions mentioned in Chapter 2 of the book Advanced R. The first part of the series can be read here and the second part of the series can be read here.

We start this article by looking at some functions which work on dates. The function *strptime* converts character vectors to objects of class *POSIXlt*. A *POSIXlt* object is essentially a list whose components each represent one aspect of a calendar date and time; the documentation for *POSIXlt* lists all of them. A couple of them are demonstrated in the example below. Note that the *year* component counts years since 1900, which is why 2017 is stored as 117.

```
mydates_c <- c("01-Jan-2017", "05-May-2012", "07-Aug-2022")
mydates <- strptime(mydates_c, format = "%d-%b-%Y")
class(mydates)
```

`## [1] "POSIXlt" "POSIXt"`

`mydates[1]`

`## [1] "2017-01-01 IST"`

`mydates[[1]]$year`

`## [1] 117`

`mydates[[1]]$sec`

`## [1] 0`
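Several *POSIXlt* components are offset or zero-based, which is easy to trip over. The minimal sketch below recovers ordinary calendar values. It assumes a date written in the locale-independent *%Y-%m-%d* format and pins the time zone to UTC so the result does not depend on the machine; the variable *d* is a hypothetical name.

```r
# Parse a single date; tz = "UTC" keeps the result independent of the local time zone
d <- strptime("2022-08-07", format = "%Y-%m-%d", tz = "UTC")

d$year + 1900  # year is stored as years since 1900 -> 2022
d$mon + 1      # mon is 0-based (0 = January)      -> 8
d$mday         # day of the month                  -> 7
d$wday         # day of the week, 0 = Sunday       -> 0
d$yday         # day of the year, 0-based          -> 218
```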

*strptime* can convert character strings in a wide variety of formats to *POSIXlt* objects. In the example, the format string *%d-%b-%Y* represents the day of the month, the abbreviated month name (interpreted in the current locale) and the 4-digit year respectively. The full list of accepted conversion specifications can be found in the documentation for the function. *strftime* does the reverse: it formats date-time objects as character strings, as demonstrated in the example below. The functions *ISOdate* and *ISOdatetime* are convenient wrappers over *strptime* that build a date-time from numeric year, month, day and time components and return a *POSIXct* object. Note that the default time zone for *ISOdate* is GMT and its default time is 12:00:00. Finally, the function *date* returns a character string of the current system date and time.

`strftime(mydates, format = "%Y-%m-%d")`

`## [1] "2017-01-01" "2012-05-05" "2022-08-07"`

`strftime(mydates, format = "%d %B %y (%A)")`

```
## [1] "01 January 17 (Sunday)" "05 May 12 (Saturday)"
## [3] "07 August 22 (Sunday)"
```

`ISOdate(2019, 2, 1)`

`## [1] "2019-02-01 12:00:00 GMT"`

`ISOdatetime(2019, 2, 1, 13, 23, 17)`

`## [1] "2019-02-01 13:23:17 IST"`

`date()`

`## [1] "Thu Apr 18 19:19:12 2019"`
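A small sketch of a round trip between numbers, date-times and strings. It assumes UTC throughout, passing *tz* explicitly so that the output does not depend on the local time zone (the outputs above show IST). The names *noon*, *t1*, *s* and *t2* are illustrative only.

```r
# ISOdate() fills in 12:00:00 GMT; both wrappers return POSIXct, not POSIXlt
noon <- ISOdate(2019, 2, 1)
class(noon)                           # "POSIXct" "POSIXt"
format(noon, "%H:%M:%S", tz = "GMT")  # "12:00:00"

# Round trip: numbers -> date-time -> character -> date-time, all in UTC
t1 <- ISOdatetime(2019, 2, 1, 13, 23, 17, tz = "UTC")
s <- strftime(t1, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
s                                     # "2019-02-01 13:23:17"
t2 <- as.POSIXct(strptime(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
t1 == t2                              # TRUE
```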

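The function *difftime* computes time intervals in a specified unit: *"secs"*, *"mins"*, *"hours"*, *"days"* or *"weeks"*, the largest available unit being *"weeks"*. With the default *units = "auto"*, R picks the largest unit (excluding weeks) in which all the absolute differences are greater than one, which is why the first example below reports days.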

`difftime(mydates, strptime("01-Oct-2000", "%d-%b-%Y"))`

```
## Time differences in days
## [1] 5936 4234 7980
```

`difftime(mydates, strptime("01-Oct-2000", "%d-%b-%Y"), units = "weeks")`

```
## Time differences in weeks
## [1] 848.0000 604.8571 1140.0000
```
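A *difftime* object remembers its unit, which can be inspected, converted on the fly or changed in place. A minimal sketch with two hypothetical UTC timestamps *a* and *b* that are 14.25 days apart:

```r
a <- as.POSIXct("2019-02-01 00:00:00", tz = "UTC")
b <- as.POSIXct("2019-02-15 06:00:00", tz = "UTC")

d <- difftime(b, a)              # units = "auto" picks days here
units(d)                         # "days"
as.numeric(d)                    # 14.25
as.numeric(d, units = "hours")   # 342
units(d) <- "weeks"              # change the unit in place
d                                # about 2.04 weeks
```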

The functions *julian*, *months*, *quarters* and *weekdays* can be used to extract parts from date-time objects. *julian* returns the number of days since an origin, by default 1970-01-01 GMT; the fractional parts in the output below arise because midnight IST is 18:30 GMT on the previous day. The lubridate package contains many functions that make it easier to handle dates and times.

`julian(mydates)`

```
## Time differences in days
## [1] 17166.77 15464.77 19210.77
## attr(,"origin")
## [1] "1970-01-01 GMT"
```

`months(mydates)`

`## [1] "January" "May" "August"`

`quarters(mydates)`

`## [1] "Q1" "Q2" "Q3"`

`weekdays(mydates)`

`## [1] "Sunday" "Saturday" "Sunday"`
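Using the *Date* class instead of *POSIXlt* avoids time-zone effects such as the fractional days above. This sketch reuses the article's three dates and the *origin* argument of *julian*. The checks avoid *months* and *weekdays*, whose output depends on the locale, and read the month number with *format* instead.

```r
d <- as.Date(c("2017-01-01", "2012-05-05", "2022-08-07"))

as.numeric(julian(d))                                  # 17167 15465 19211
as.numeric(julian(d, origin = as.Date("2000-01-01")))  # days since 2000-01-01
quarters(d)                                            # "Q1" "Q2" "Q3"
as.integer(format(d, "%m"))                            # month numbers: 1 5 8
```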

The next set of functions is useful for manipulating strings. *grep* and *grepl* search for a pattern in a character vector: *grep* returns the indices of the matching elements (or, with *value = TRUE*, the matching elements themselves), while *grepl* returns a logical vector. The pattern is treated as a regular expression unless *fixed = TRUE*, in which case it is matched literally. The function *agrep* performs approximate matching using the generalized Levenshtein edit distance.

```
set.seed(123)
bnames <- sample(babynames::babynames$name, 50)
grep("An", bnames, fixed = TRUE, value = TRUE)
```

`## [1] "Angel"`

`grep("An", bnames, fixed = TRUE, value = FALSE)`

`## [1] 29`

`grepl("An", bnames, fixed = TRUE)`

```
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE
```

`grep("^N", bnames, value = TRUE) # regular expression - starts with N`

`## [1] "Navil"`

`agrep("Angie", bnames, fixed = TRUE, value = TRUE)`

`## [1] "Angel"`
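The output above depends on a random sample from the babynames package. Here is a self-contained sketch using a small hypothetical vector *fruits*. It also shows the *ignore.case* argument, which the article does not cover.

```r
fruits <- c("apple", "Banana", "cherry", "banana split", "grape")

grep("an", fruits)                      # indices: 2 4
grep("an", fruits, value = TRUE)        # "Banana" "banana split"
grepl("e$", fruits)                     # TRUE FALSE FALSE FALSE TRUE
grep("^b", fruits, ignore.case = TRUE)  # case-insensitive regex: 2 4
grep(".", fruits, fixed = TRUE)         # literal ".", no matches: integer(0)

# Approximate matching, allowing at most one edit
agrep("banan", fruits, max.distance = 1, value = TRUE)
```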

The function *gsub* replaces every match of a pattern with a replacement string (*sub* replaces only the first match). In the example below, *Angel* is transformed to *Pangel*. *strsplit* splits each element of a character vector into substrings, using either a fixed string or a regular expression as the separator, and returns a list.

`gsub("An", "Pan", bnames[25:35], fixed = TRUE)`

```
## [1] "Jacques" "Breanne" "Rae" "Aldin" "Pangel" "Curlie" "Isaiyah"
## [8] "Idania" "Jaquawn" "Jillyan" "Ernest"
```

`strsplit(c("A:B:C", "1:2:3"), ":", fixed = TRUE)`

```
## [[1]]
## [1] "A" "B" "C"
##
## [[2]]
## [1] "1" "2" "3"
```
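A short sketch contrasting *sub* with *gsub*, reusing capture groups in the replacement, and splitting on a regular expression. The date string and the other inputs are hypothetical examples.

```r
x <- "2019-02-01"

sub("-", "/", x)    # first match only: "2019/02-01"
gsub("-", "/", x)   # every match:      "2019/02/01"

# Capture groups in the pattern can be reused as \\1, \\2, ... in the replacement
gsub("([0-9]{4})-([0-9]{2})-([0-9]{2})", "\\3.\\2.\\1", x)  # "01.02.2019"

# strsplit() with a regular expression: split on runs of digits
strsplit("a1b22c333", "[0-9]+")  # list(c("a", "b", "c"))
```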

*chartr* translates characters in a character vector, replacing each character of its first argument with the corresponding character of its second. *tolower* and *toupper* convert to lower and upper case respectively. *nchar* counts the number of characters. *substr* extracts part of a string using start and stop positions. It can also be used on the left-hand side of the assignment operator to replace that part.

`chartr(":", "_", c("A:B:C", "1:2:3"))`

`## [1] "A_B_C" "1_2_3"`

`tolower("MAD DOG")`

`## [1] "mad dog"`

`toupper("radmuzom")`

`## [1] "RADMUZOM"`

`nchar(c("Anindya", "Mozumdar"))`

`## [1] 7 8`

`substr(c("Hello", "World"), 2, 3)`

`## [1] "el" "or"`

```
x <- "Hello"
substr(x, 1, 1) <- "C"
x
```

`## [1] "Cello"`
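Two details are worth seeing in action. *chartr* maps characters one-to-one, and assigning with *substr* never changes the length of the string. The sketch also uses *substring*, a vectorised relative of *substr* that the article does not cover.

```r
# chartr() maps characters one-to-one: a -> x, b -> y, c -> z
chartr("abc", "xyz", "aabbcc")   # "xxyyzz"

# substr<- keeps the string length fixed
x <- "Hello"
substr(x, 1, 2) <- "J"    # replacement shorter than the range: only "H" changes
x                         # "Jello"
substr(x, 1, 1) <- "Yel"  # replacement longer than the range: truncated to "Y"
x                         # "Yello"

# substring() is vectorised over start and stop positions
substring("Hello", 1:3, 3:5)  # "Hel" "ell" "llo"
```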

*paste* concatenates strings with a separator, which defaults to a single space. *paste0* is a convenient variant that uses no separator; it is equivalent to *paste(..., sep = "")*. Both also accept a *collapse* argument that joins the elements of the result into a single string. The function *trimws* removes leading and/or trailing whitespace from a character vector; its *which* argument can be "both" (the default), "left" or "right".

`paste("Anindya", "Mozumdar")`

`## [1] "Anindya Mozumdar"`

`paste("Anindya", "Mozumdar", sep = ",")`

`## [1] "Anindya,Mozumdar"`

`paste0("Anindya", "Mozumdar")`

`## [1] "AnindyaMozumdar"`

```
x <- c(" Too", "many ", " spaces ")
trimws(x)
```

`## [1] "Too" "many" "spaces"`

`trimws(x, "right")`

`## [1] " Too" "many" " spaces"`
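*paste* and *paste0* are vectorised and recycle shorter arguments, while *collapse* reduces the result to a single string. A brief sketch with illustrative inputs:

```r
paste("x", 1:3)                     # recycled: "x 1" "x 2" "x 3"
paste0("x", 1:3)                    # "x1" "x2" "x3"
paste0("x", 1:3, collapse = "+")    # one string: "x1+x2+x3"
paste(c("a", "b"), collapse = "")   # "ab"
trimws("  both  ", which = "left")  # only leading spaces removed: "both  "
```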

The stringr package provides a consistent set of string manipulation functions.

The next set of functions relates to factors. Factors are R's basic data structure for categorical data. Internally, a factor is an integer vector with a *levels* attribute (a character vector) and a *class* attribute set to "factor". Factors can be created using the function *factor*.

```
x <- factor(sample(letters[1:3], 10, replace = TRUE))
str(x)
```

`## Factor w/ 3 levels "a","b","c": 1 2 3 1 2 1 1 3 3 2`

`attributes(x)`

```
## $levels
## [1] "a" "b" "c"
##
## $class
## [1] "factor"
```
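Because a factor stores integer codes, converting a factor whose labels look like numbers needs care. A short sketch with a hypothetical factor *f*:

```r
f <- factor(c("10", "5", "10", "20"))
levels(f)                  # sorted as strings: "10" "20" "5"
as.integer(f)              # the underlying codes: 1 3 1 2
as.numeric(f)              # also the codes, NOT the numbers
as.numeric(levels(f))[f]   # index the levels by the codes: 10 5 10 20
unclass(f)                 # the integer vector plus its levels attribute
```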

The *levels* argument defines the allowed levels of a factor, and the *ordered* argument specifies whether the levels should be regarded as ordered. Any value that is not listed in *levels* is converted to *NA*.

```
x <- factor(sample(letters[1:3], 10, replace = TRUE),
            levels = c("a", "b", "c"), ordered = TRUE)
x
```

```
## [1] b a b a c b c c c b
## Levels: a < b < c
```

```
y <- factor(sample(letters[1:3], 10, replace = TRUE),
            levels = c("a", "b"), ordered = TRUE)
y
```

```
## [1] b a b a b b b a
## Levels: a < b
```
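Ordered factors can be compared with the usual operators, and values outside *levels* become *NA*. A minimal sketch with a hypothetical factor *sizes*:

```r
sizes <- factor(c("small", "large", "medium", "huge"),
                levels = c("small", "medium", "large"), ordered = TRUE)
is.na(sizes)              # "huge" is not a level: FALSE FALSE FALSE TRUE
sizes < "large"           # TRUE FALSE TRUE NA
max(sizes, na.rm = TRUE)  # large (ordered factors support min and max)
```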

The function *nlevels* returns the number of levels of a factor, and *levels* returns the levels as a character vector. *levels* can also be used on the left-hand side of the assignment operator to rename the levels.

`nlevels(x)`

`## [1] 3`

`levels(x)`

`## [1] "a" "b" "c"`

```
levels(x) <- c("p", "q", "r")
x
```

```
## [1] q p q p r q r r r q
## Levels: p < q < r
```
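Assigning duplicate names through *levels* merges levels, and *droplevels*, a base R function the article does not cover, removes levels that are no longer used. A sketch with hypothetical factors *f* and *g*:

```r
f <- factor(c("a", "b", "c", "a"))
levels(f) <- c("x", "x", "y")  # "a" and "b" both become "x", merging them
f                              # x x y x with levels x and y
nlevels(f)                     # 2

g <- factor(c("a", "b", "c"))[1:2]  # subsetting keeps the unused level "c"
nlevels(g)                          # 3
nlevels(droplevels(g))              # 2
```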

The function *reorder* reorders the levels of a factor based on the values of a second variable, applying a summary function to the values of that variable within each level. In the example below, we create an ordered factor *x* with three levels a, b and c. The second argument *n* is a vector of 10 random normal values. We then reorder the levels of *x* by the sum of the values in *n* that correspond to each level of *x*; the sums are stored in the *scores* attribute of the result.

```
set.seed(123)
x <- factor(sample(letters[1:3], 10, replace = TRUE),
            levels = c("a", "b", "c"), ordered = TRUE)
x
```

```
## [1] a c b c c a b c b b
## Levels: a < b < c
```

```
n <- rnorm(10)
n
```

```
## [1] 1.7150650 0.4609162 -1.2650612 -0.6868529 -0.4456620 1.2240818
## [7] 0.3598138 0.4007715 0.1106827 -0.5558411
```

```
y <- reorder(x, n, FUN = sum)
y
```

```
## [1] a c b c c a b c b b
## attr(,"scores")
## a b c
## 2.9391468 -1.3504058 -0.2708272
## Levels: b < c < a
```
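The new level order is simply the order of the per-level scores. The sketch below checks this against *tapply*, using a hypothetical unordered factor *g* and values *v*, with *mean* in place of *sum*.

```r
g <- factor(c("a", "b", "c", "a", "b", "c"))
v <- c(5, 1, 3, 7, 2, 4)

scores <- tapply(v, g, mean)       # per-level means: a = 6, b = 1.5, c = 3.5
r <- reorder(g, v, FUN = mean)
levels(r)                          # "b" "c" "a"
levels(r) == names(sort(scores))   # TRUE TRUE TRUE
```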

The function *relevel* reorders the levels of an unordered factor so that a specified reference level comes first and the remaining levels follow in their original order. It cannot be applied to an ordered factor.

```
set.seed(123)
x <- factor(sample(letters[1:3], 10, replace = TRUE),
            levels = c("a", "b", "c"))
x
```

```
## [1] a c b c c a b c b b
## Levels: a b c
```

```
x <- relevel(x, "b")
x
```

```
## [1] a c b c c a b c b b
## Levels: b a c
```
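A common reason to use *relevel* is to choose the baseline category. With R's default treatment contrasts, the first level of an unordered factor acts as the reference level in model formulas. The sketch uses a hypothetical factor *dose* and also confirms that *relevel* refuses an ordered factor.

```r
dose <- factor(c("low", "high", "mid", "low"), levels = c("low", "mid", "high"))
levels(relevel(dose, ref = "high"))   # "high" "low" "mid"

# relevel() stops with an error for ordered factors; catch it to inspect
o <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)
res <- tryCatch(relevel(o, ref = "high"), error = function(e) "error")
res                                   # "error"
```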

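*cut* converts a numeric variable into a factor by dividing its range into intervals and coding each value according to the interval in which it falls. By default, the intervals are open on the left and closed on the right, e.g. (0.2,0.5]; set *right = FALSE* to reverse this. The *labels* argument replaces the default interval names. *include.lowest = TRUE* also closes the lowest interval on the left; in the last example below it makes no difference because the lowest break is -Inf.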

```
x <- rnorm(10)
x
```

```
## [1] 1.7150650 0.4609162 -1.2650612 -0.6868529 -0.4456620 1.2240818
## [7] 0.3598138 0.4007715 0.1106827 -0.5558411
```

`cut(x, breaks = c(-Inf, 0.2, 0.5, 0.8, Inf))`

```
## [1] (0.8, Inf] (0.2,0.5] (-Inf,0.2] (-Inf,0.2] (-Inf,0.2] (0.8, Inf]
## [7] (0.2,0.5] (0.2,0.5] (-Inf,0.2] (-Inf,0.2]
## Levels: (-Inf,0.2] (0.2,0.5] (0.5,0.8] (0.8, Inf]
```

`cut(x, breaks = c(-Inf, 0.2, 0.5, 0.8, Inf), labels = c("A", "B", "C", "D"))`

```
## [1] D B A A A D B B A A
## Levels: A B C D
```

```
cut(x, breaks = c(-Inf, 0.2, 0.5, 0.8, Inf), labels = c("A", "B", "C", "D"),
    include.lowest = TRUE)
```

```
## [1] D B A A A D B B A A
## Levels: A B C D
```
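With finite breaks, *right* and *include.lowest* do change which values fall inside the intervals. A small sketch with hypothetical values *x* and breaks *brk*, comparing the integer codes of the three variants:

```r
x <- c(0, 5, 10)
brk <- c(0, 5, 10)

cut(x, brk)                         # (0,5] (5,10]: 0 is outside, so NA
cut(x, brk, include.lowest = TRUE)  # first interval becomes [0,5], so 0 is kept
cut(x, brk, right = FALSE)          # [0,5) [5,10): now 10 is outside, so NA

as.integer(cut(x, brk))                         # NA 1 2
as.integer(cut(x, brk, include.lowest = TRUE))  # 1 1 2
as.integer(cut(x, brk, right = FALSE))          # 1 2 NA
```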

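*findInterval* returns, for each element of a numeric vector, the index of the interval in which it lies, where the intervals are defined by a non-decreasing vector of breakpoints. In the example below, we take 10 random integers from -100 to 100 and find which interval each one lies in. The intervals are determined by the second argument and are [-Inf, 0), [0, 5), [5, 10) and so on, i.e. closed on the left. With *left.open = TRUE* they become (-Inf, 0], (0, 5] and so on, which is why 80 moves from interval 5 to interval 4.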

```
x <- sample(-100:100, 10)
x
```

`## [1] 93 80 37 57 -96 -7 47 -59 -39 -56`

`findInterval(x, c(-Inf, 0, 5, 10, 80, Inf))`

`## [1] 5 5 4 4 1 1 4 1 1 1`

`findInterval(x, c(-Inf, 0, 5, 10, 80, Inf), left.open = TRUE)`

`## [1] 5 4 4 4 1 1 4 1 1 1`
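Index 0 means "below the first breakpoint", and an index equal to the number of breakpoints means "at or above the last one". A sketch with hypothetical *breaks* and *x*, also showing *rightmost.closed*, which closes the last finite interval:

```r
breaks <- c(0, 10, 20)
x <- c(-5, 0, 5, 10, 20, 25)

findInterval(x, breaks)                           # 0 1 1 2 3 3
findInterval(x, breaks, left.open = TRUE)         # 0 0 1 1 2 3
findInterval(x, breaks, rightmost.closed = TRUE)  # 0 1 1 2 2 3
```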

*interaction* creates a factor representing the interaction of two or more factors; its levels are all combinations of the input levels. By default, the *.* character separates the component names in the new levels; this can be changed with the *sep* argument.

```
f1 <- factor(sample(letters[1:2], 10, replace = TRUE),
             levels = c("a", "b"))
f2 <- factor(sample(letters[25:26], 10, replace = TRUE),
             levels = c("y", "z"))
interaction(f1, f2)
```

```
## [1] a.y a.y a.z a.y a.z a.y a.y a.z a.z b.y
## Levels: a.y b.y a.z b.z
```

`interaction(f1, f2, sep = "|")`

```
## [1] a|y a|y a|z a|y a|z a|y a|y a|z a|z b|y
## Levels: a|y b|y a|z b|z
```
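By default the first factor varies fastest in the level order. *drop = TRUE* removes combinations that never occur, and *lex.order = TRUE* switches to lexicographic order. A sketch using hypothetical factors *g1* and *g2*, named so that the article's *f1* and *f2* are left untouched:

```r
g1 <- factor(c("a", "b", "a"))
g2 <- factor(c("y", "y", "z"))

levels(interaction(g1, g2))                    # "a.y" "b.y" "a.z" "b.z"
levels(interaction(g1, g2, drop = TRUE))       # unused "b.z" dropped
levels(interaction(g1, g2, lex.order = TRUE))  # "a.y" "a.z" "b.y" "b.z"
```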

The next set of functions described in this chapter of the book pertains to statistics; we will cover them in a separate article. In the rest of this article, we continue looking at functions that relate to working with R itself.

The function *ls* returns the names of the objects in a specified environment as a character vector. In the example below, called at the top level, it lists all the variables and functions we have defined so far. Inside a function, it returns the names of the function's local variables, including its arguments.

`ls()`

```
## [1] "bnames" "f1" "f2" "mydates" "mydates_c" "n"
## [7] "x" "y"
```

```
f <- function(x, y) {
  print(ls())
  (x + y) ^ (x - y)
}
f(2, 3)
```

`## [1] "x" "y"`

`## [1] 0.2`
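*ls* can also list a specific environment, hide or show names starting with a dot, and filter names with a regular expression. This sketch builds a hypothetical environment *e* with *new.env* and *assign*, two base R functions not covered in this article.

```r
e <- new.env()
assign("alpha", 1, envir = e)
assign("beta", 2, envir = e)
assign(".hidden", 3, envir = e)

ls(e)                    # "alpha" "beta" (names starting with "." are hidden)
ls(e, all.names = TRUE)  # also includes ".hidden"
ls(e, pattern = "^b")    # "beta"
```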

The *exists* function checks whether an object with a given name can be found in an environment (by default also searching its enclosing environments). *rm* removes objects. Note that after the call to *rm*, the factor *f1*, which was previously listed by *ls*, no longer exists.

`exists("f1")`

`## [1] TRUE`

```
rm("f1")
ls()
```

```
## [1] "bnames" "f" "f2" "mydates" "mydates_c" "n"
## [7] "x" "y"
```
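The *inherits* argument controls whether *exists* looks beyond the given environment. *rm(list = ...)* removes several objects by name. A sketch with a hypothetical environment *e* and object names:

```r
e <- new.env()
assign("tmp1", 1, envir = e)
assign("tmp2", 2, envir = e)
assign("keep", 3, envir = e)

exists("tmp1", envir = e, inherits = FALSE)  # TRUE
exists("sum")                                # TRUE: found on the search path
exists("sum", envir = e, inherits = FALSE)   # FALSE: not defined in e itself

rm(list = ls(e, pattern = "^tmp"), envir = e)  # remove several objects by name
ls(e)                                          # "keep"
```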

*getwd* returns the current working directory, while *setwd* changes it. The function *quit*, or its alias *q*, terminates the current R session. *source* reads and evaluates R expressions from a named file, URL or connection; for example, it can define functions stored in an external file so that you do not have to copy and paste them into your session. *install.packages* installs an R package. *library* and *require* load and attach packages. *library* throws an error if it cannot find the requested package, whereas *require*, which is primarily intended for use inside functions, gives a warning and returns *FALSE*. To remove a package from the search list, use *detach*; setting *unload = TRUE* also attempts to unload the package's namespace.

```
library(ggplot2)
detach("package:ggplot2", unload = TRUE)
```
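A few of these functions in a self-contained sketch. The file is a temporary one, and *square* and *notARealPackage123* are hypothetical names chosen for illustration. *setwd* invisibly returns the previous directory, which makes it easy to restore.

```r
# setwd() invisibly returns the previous directory, so it can be restored
old <- setwd(tempdir())
setwd(old)

# source() a function definition stored in a (temporary) file
path <- tempfile(fileext = ".R")
writeLines("square <- function(x) x^2", path)
source(path)   # defines square() in the global environment
square(4)      # 16
unlink(path)

# require() returns FALSE (with a warning) instead of stopping
ok <- suppressWarnings(require("notARealPackage123", character.only = TRUE,
                               quietly = TRUE))
ok             # FALSE
```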

*apropos* returns the names of objects on the search list that match a given string or regular expression, and *find* reports where on the search list a particular object can be found. *RSiteSearch* searches R help pages and other documentation for words or phrases and shows the results in a web browser.

`apropos("xy")`

```
## [1] "plot.xy" "sortedXyData" "xy.coords" "xyinch"
## [5] "xyTable" "xyz.coords"
```

`find("xyTable")`

`## [1] "package:grDevices"`
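The argument to *apropos* is a regular expression, and *mode* restricts the results to, for example, functions. A sketch assuming the utils package is attached, as it is in a default R session:

```r
fns <- apropos("^read\\.", mode = "function")  # names starting with "read."
"read.csv" %in% fns                            # TRUE
find("read.csv")                               # "package:utils"
```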

*citation* tells you how to cite R and R packages. *demo()* lists the topics for which demonstration scripts are available; call *demo* with a particular topic to run that demonstration. *example* runs the examples in a particular help topic. *vignette* lists the available vignettes or opens a specific one.

`citation(package = "ggplot2")`

```
##
## To cite ggplot2 in publications, please use:
##
## H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
## Springer-Verlag New York, 2016.
##
## A BibTeX entry for LaTeX users is
##
## @Book{,
## author = {Hadley Wickham},
## title = {ggplot2: Elegant Graphics for Data Analysis},
## publisher = {Springer-Verlag New York},
## year = {2016},
## isbn = {978-3-319-24277-4},
## url = {https://ggplot2.tidyverse.org},
## }
```

There is also a set of functions primarily used to handle exceptions and debug R code. We won't describe them here; we recommend reading the "Exceptions and debugging" chapter in the "Advanced R" book.

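Next we look at some of the input/output functions. *print* is a generic function which prints its argument and returns it invisibly, so the function *f* below prints its argument once and the returned value can still be assigned to *y*. *cat* concatenates the representations of the objects passed to it, separated by a space by default, and writes them to the output.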

```
f <- function(x) {
  print(x)
}
y <- f(2)
```

`## [1] 2`

`y`

`## [1] 2`

`cat(c(3.3, 7.1), c("a", "b", "c"))`

`## 3.3 7.1 a b c`
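*withVisible* exposes the invisible return of *print*. *capture.output* shows that *cat* neither quotes strings nor adds a newline, with *sep* controlling the separator. The names *v* and *out* are illustrative.

```r
# print() returns its argument invisibly; withVisible() exposes that flag
v <- withVisible(print(2))   # prints [1] 2
v$visible                    # FALSE
v$value                      # 2

# cat() does not quote strings; sep controls the separator
out <- capture.output(cat("a", "b", 1:2, sep = "-"))
out                          # "a-b-1-2"
```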

*message* and *warning* generate diagnostic and warning messages respectively. Note the string "Warning:" automatically added to the output of *warning*.

`message("I am a message")`

`## I am a message`

`warning("I am a warning")`

`## Warning: I am a warning`
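Messages are written to the standard error stream, and both messages and warnings can be captured or silenced. This sketch only touches on that; the condition system itself belongs to the exceptions chapter mentioned earlier. The names *msg* and *w* are illustrative.

```r
# message() writes to stderr; capture.output(type = "message") collects it
msg <- capture.output(message("I am a message"), type = "message")
msg                                   # "I am a message"

# Both can be silenced
suppressMessages(message("not shown"))
suppressWarnings(warning("not shown either"))

# A warning can also be turned into a value with tryCatch()
w <- tryCatch(warning("I am a warning"), warning = function(w) conditionMessage(w))
w                                     # "I am a warning"
```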

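*dput* writes a text representation of an R object, to the console by default or to a file via its *file* argument. It is especially useful for sharing a small R object on an online forum when asking for help, and output written to a file can be read back with *dget*. *dput* also returns its argument invisibly, which is why *z* below is the data frame itself.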

```
df <- data.frame(x = 1, y = 2)
z <- dput(df)
```

```
## structure(list(x = 1, y = 2), class = "data.frame", row.names = c(NA,
## -1L))
```

`z`

```
## x y
## 1 1 2
```
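A round trip through a temporary file shows that *dget* recreates an identical object. The sketch also uses *deparse*, which returns the same kind of representation as a character string. The names *df2*, *df3* and *path* are hypothetical.

```r
df2 <- data.frame(x = 1:2, y = c("a", "b"))
path <- tempfile(fileext = ".R")

dput(df2, file = path)    # write the text representation to a file
df3 <- dget(path)         # parse and evaluate it again
identical(df2, df3)       # TRUE
deparse(c(a = 1, b = 2))  # the same idea as a string: "c(a = 1, b = 2)"
unlink(path)
```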

*format* formats an R object for pretty printing. *sprintf* is a wrapper for the C function of the same name; it returns a character vector containing a formatted combination of text and variable values.

```
x <- rnorm(5) * 1000
format(x, nsmall = 1)
```

`## [1] " 426.4642" "-295.0715" " 895.1257" " 878.1335" " 821.5811"`

`format(x, scientific = TRUE)`

```
## [1] " 4.264642e+02" "-2.950715e+02" " 8.951257e+02" " 8.781335e+02"
## [5] " 8.215811e+02"
```

`sprintf("N: %6.2g", x)`

`## [1] "N: 4.3e+02" "N: -3e+02" "N:  9e+02" "N: 8.8e+02" "N: 8.2e+02"`
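A few more *sprintf* conversion specifications, plus *formatC* for a thousands separator. The inputs are arbitrary illustrative values, and *sprintf* is vectorised over its arguments.

```r
sprintf("%5.1f%%", 12.345)              # " 12.3%"  (width 5, 1 decimal, literal %)
sprintf("%-8s|", "ab")                  # "ab" left-justified in 8 columns, then "|"
sprintf("%03d", 7L)                     # "007"     (zero-padded integer)
sprintf("%s has %d", c("a", "b"), 1:2)  # vectorised: "a has 1" "b has 2"
formatC(1234567.891, format = "f", digits = 2, big.mark = ",")  # "1,234,567.89"
```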

*sink* diverts R output to a file or connection, and *capture.output* captures it as a character vector (or writes it to a file). There are a number of functions for reading and writing external data sources, which we won't describe in detail here; the packages readr, haven and foreign can be used to read data in a wide variety of formats. The functions with the *file.* prefix (*file.path*, *file.copy*, *file.create*, *file.remove*, *file.rename*, *file.exists* and *file.info*) provide an interface to the file system from within R code, although *file.path* only builds a path string. *dir.create* creates a directory in the file system.
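A sketch of output redirection and a few file-system helpers. Everything happens inside *tempdir()*, so nothing outside R's temporary directory is touched. The names *demo_dir*, *log_file*, *f* and *g* are hypothetical.

```r
# capture.output() returns printed output as a character vector
printed <- capture.output(print(1:3))
printed                                         # "[1] 1 2 3"

# sink() diverts output to a file until sink() is called again
log_file <- tempfile()
sink(log_file)
print("to the file")
sink()
logged <- readLines(log_file)                   # "[1] \"to the file\""

# File-system helpers inside a temporary directory
demo_dir <- file.path(tempdir(), "vocab_demo")  # file.path() only builds the path
dir.create(demo_dir, showWarnings = FALSE)
f <- file.path(demo_dir, "notes.txt")
file.create(f)
g <- file.path(demo_dir, "renamed.txt")
file.rename(f, g)
status <- file.exists(c(f, g))                  # FALSE TRUE
file.remove(g)
unlink(c(demo_dir, log_file), recursive = TRUE)
```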
