(This article was first published on **Open Source Automation**, and kindly contributed to R-bloggers)

I wanted to write a post about a couple of handy functions in R that don’t always get the recognition they deserve. This article will talk about a few functions that form part of R’s core functional programming capabilities. R has thousands of functions, so this is just a short list, and I’ll probably write other articles like this in the future to discuss some different R functions.

**Reduce**

Let’s start with the **Reduce** function (note the capital “R”). **Reduce** takes a list or vector as input, and reduces it down to a single element. It works by applying a function to the first two elements of the vector or list, and then applying the same function to that result with the third element. This new result gets passed with the fourth element into the function and so on until a single object remains. If the input is a vector, the result will be a single number or character. On the other hand, inputting a *list* can have interesting results. A list of data frames can be reduced down to a single data frame, a list of vectors can be collapsed into a matrix, and so on.

A simple, though not entirely useful, example of how this works is like so:

test <- 1:10
result <- Reduce(sum, test)

Here, *result* will equal 55, which is the sum of the vector *test*, i.e. the sum of the integers 1 through 10. **Reduce** arrives at this by first applying the **sum** function to 1 and 2 (the first two elements in *test*). This equals 3, which then gets summed with the next element in the vector, 3. This total of 6 gets added to 4, which equals 10, and so on. The process can be seen below.

**1 + 2 = 3**

3 + 3 = 6

6 + 4 = 10

10 + 5 = 15

15 + 6 = 21

21 + 7 = 28

28 + 8 = 36

36 + 9 = 45

**45 + 10 = 55**
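The running totals above can be reproduced directly, since **Reduce** has an *accumulate* argument that returns every intermediate result rather than only the final one. Here is a minimal sketch using the same *test* vector (the name *running_totals* is just for illustration):

```r
test <- 1:10

# accumulate = TRUE keeps each intermediate result of the reduction
running_totals <- Reduce(sum, test, accumulate = TRUE)
print(running_totals)

# Without accumulate, only the last value is returned
result <- Reduce(sum, test)
print(result)
```

The last element of *running_totals* is the same value that the plain **Reduce** call returns.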

Now, how about something a little more useful? What if you had a list of vectors and you wanted to combine them into a matrix?

test <- list(1:3, 4:6, 7:9, 10:12, 13:15, 16:18)
matrix_result <- Reduce(rbind, test)

In this case, we have a list of six three-element vectors. **Reduce** applies **rbind** to the first two vectors, 1:3 and 4:6 initially. This creates a 2 x 3 matrix, where the first row is 1:3, and the second row is 4:6.

**1 2 3
4 5 6**

Then, the above result is combined (via **rbind**) to the next vector in the list, 7:9.

**1 2 3
4 5 6
7 8 9**

This process continues, as you can see below:

**1 2 3
4 5 6
7 8 9
10 11 12**

Next:

**1 2 3
4 5 6
7 8 9
10 11 12
13 14 15**

Finally:

**1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18**

Thus, the final result is still a single object, in this case a 6 x 3 matrix, because **rbind** stacked all of the vectors in the list, *test*, into one matrix.

Similarly, you could run this example using **cbind** instead of **rbind** and that would collapse the vectors column-wise, rather than row-wise.
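To make the row-wise versus column-wise difference concrete, here is a small sketch that runs both versions on the same list and compares their dimensions (the names *by_row* and *by_col* are just for illustration):

```r
test <- list(1:3, 4:6, 7:9, 10:12, 13:15, 16:18)

by_row <- Reduce(rbind, test)  # each vector becomes a row
by_col <- Reduce(cbind, test)  # each vector becomes a column

dim(by_row)  # 6 rows, 3 columns
dim(by_col)  # 3 rows, 6 columns
```

The two results hold the same values, with one being the transpose of the other.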

Another example where **Reduce** comes in handy might be if you want to combine a collection of data frames into a single one.

# A named list of data frames, one per state
state_data <- list(
  FL = data.frame(state = c("FL","FL","FL"), city = c("Miami","Jacksonville","Saint Augustine")),
  NY = data.frame(state = c("NY","NY","NY"), city = c("NYC","Buffalo","Rochester")),
  MD = data.frame(state = c("MD","MD","MD"), city = c("Baltimore","Annapolis","Ocean City"))
)
# Stack the data frames row-wise into a single data frame
combined <- data.frame(Reduce(rbind, state_data))
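As a quick sanity check on this pattern, the sketch below uses two small, made-up data frames (the names *small_list* and *small_combined*, and the rows themselves, are hypothetical) to confirm that **Reduce** with **rbind** returns a data frame containing all of the rows:

```r
# Hypothetical two-row data frames, kept small for illustration
small_list <- list(
  FL = data.frame(state = c("FL", "FL"), city = c("Miami", "Tampa"),
                  stringsAsFactors = FALSE),
  NY = data.frame(state = c("NY", "NY"), city = c("NYC", "Buffalo"),
                  stringsAsFactors = FALSE)
)

# rbind on data frames returns a data frame, so Reduce does too
small_combined <- Reduce(rbind, small_list)

is.data.frame(small_combined)  # TRUE
nrow(small_combined)           # 4: two rows from each data frame
```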

**Filter**

The **Filter** function does basically what it sounds like: it applies a filter to a vector, list, or data frame (which is actually a type of list). It takes two main inputs: a function that performs the test, and the object to be filtered.

Here’s a simple example:

test <- 1:10
less_than_5 <- Filter(function(x) x < 5, test)

This, once again, creates a vector of the first 10 positive integers. The **Filter** function applies *function(x) x < 5* to each element, *x*, in the vector, *test*. In other words, it checks each element, *x*, for the Boolean expression, *x < 5*. If an element is not less than 5, it gets filtered out.
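One way to see what **Filter** is working with is to look at the TRUE / FALSE values the test function produces for each element. A rough sketch (the name *keep* is just for illustration):

```r
test <- 1:10

# The test function returns TRUE or FALSE for each element
keep <- sapply(test, function(x) x < 5)
print(keep)

# Filter returns only the elements for which the test is TRUE
less_than_5 <- Filter(function(x) x < 5, test)
print(less_than_5)  # 1 2 3 4
```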

So you might be thinking…can’t this be done like this?

less_than_5 <- test[test < 5]

…and the answer is…yes. It can be done that way. **Filter** is more useful as a function in cases involving data frames or lists. Suppose, for instance, you want to remove all constant columns from a data frame. This is something that may be done when preprocessing data prior to modeling, as a constant attribute isn’t particularly useful.

This can be done in one line using **Filter**:

df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c(1,1,1), d = c(3,4,5))
without_constants <- Filter(function(x) length(unique(x)) > 1, df)

Alternatively, using dplyr’s *n_distinct* function, which counts the number of distinct elements in a vector, you could do this:

library(dplyr)
df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c(1,1,1), d = c(3,4,5))
without_constants <- Filter(function(x) n_distinct(x) > 1, df)

In the example, we create a data frame with four columns — two of them are constant. **Filter** tests whether there is more than one unique value in each column. If there is only one unique value, then we know the column is constant, and it gets filtered out. Each element *x* is a vector, or column, in the data frame.
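To check which columns survive, it can help to count the distinct values per column first. This sketch uses the base R version of the example (the name *distinct_counts* is just for illustration):

```r
df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c(1,1,1), d = c(3,4,5))

# Number of distinct values in each column
distinct_counts <- sapply(df, function(x) length(unique(x)))
print(distinct_counts)  # a = 1, b = 3, c = 1, d = 3

# Keep only the columns with more than one distinct value
without_constants <- Filter(function(x) length(unique(x)) > 1, df)
names(without_constants)  # "b" "d"
```

Because a data frame is a list of columns, the result is still a data frame, just with fewer columns.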

If you wanted to just drop all columns that are all NAs, you could make a minor tweak like this:

df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c(1,1,1), d = c(NA, NA, NA))
without_nas <- Filter(function(x) !all(is.na(x)), df)

**Filter** can also be used on a regular list. Suppose you have a list of vectors, where some of the vectors are character vectors and others are numeric. If you want to filter out all of the non-numeric vectors, you could call **Filter**:

sample_list <- list(a = c(1,2,3), b = c("is","a","character"), c = c(4,5,6), d = c("is","another","character"))
only_numeric <- Filter(function(x) is.numeric(x), sample_list)
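Since *is.numeric* is already a function of one argument, it can be passed to **Filter** directly. And if you want the opposite selection, base R's **Negate** flips a test function. A short sketch of both (the name *not_numeric* is just for illustration):

```r
sample_list <- list(a = c(1,2,3), b = c("is","a","character"),
                    c = c(4,5,6), d = c("is","another","character"))

# Passing is.numeric directly keeps the numeric vectors
only_numeric <- Filter(is.numeric, sample_list)
names(only_numeric)  # "a" "c"

# Negate() flips the test, so this keeps the non-numeric vectors
not_numeric <- Filter(Negate(is.numeric), sample_list)
names(not_numeric)  # "b" "d"
```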

**rapply**

The **rapply** function is part of the apply family of functions in R. It has a few different uses, but one of my favorite applications for it is to apply a function to columns of a data frame that belong to a specific class, or have a particular data type.

Let’s say you want to get the sum of all of the numeric columns.

df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c("r","is","awesome"), d = c(3,4,5), e = c("some","other","character"))
summed_columns <- rapply(df, sum, classes = "numeric")

Similar to *sapply* or *lapply*, **rapply** takes a list or data frame as input, along with a function to be applied. However, it can also take a “classes” argument, which lets us specify which classes of element the function should be applied to. Elements of any other class are skipped, which is why the character columns don’t cause **sum** to fail.
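Here is a small sketch of what that call returns, writing the argument under its full name, *classes*, as listed in `?rapply`. For comparison, it also computes the same sums by combining **Filter** with *sapply* (the name *via_filter* is just for illustration):

```r
df <- data.frame(a = c(2,2,2), b = c(1,2,3), c = c("r","is","awesome"),
                 d = c(3,4,5), e = c("some","other","character"))

# Apply sum only to the columns whose class is "numeric"
summed_columns <- rapply(df, sum, classes = "numeric")
print(summed_columns)  # a = 6, b = 6, d = 12

# One alternative: keep the numeric columns first, then sum each one
via_filter <- sapply(Filter(is.numeric, df), sum)
print(via_filter)
```

Both approaches give the same sums here, and which one reads better is largely a matter of taste.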

**rapply** can also be used to recursively apply functions to nested lists (see the examples in its documentation, `?rapply`).
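As a rough illustration of the recursive behavior, the sketch below uses a made-up nested list (the names *nested* and *nested_sums* are hypothetical) and sums only the numeric pieces, however deeply they sit:

```r
# A hypothetical nested list: b is itself a list
nested <- list(
  a = c(1, 2, 3),
  b = list(c = c(4, 5), d = "text")
)

# rapply walks into b and applies sum to every numeric element it finds
nested_sums <- rapply(nested, sum, classes = "numeric")
print(nested_sums)  # a = 6, b.c = 9
```

The character element *d* is skipped, and the name *b.c* in the result reflects where the value came from in the nested structure.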

**rep**

The last function I want to mention for this post is the **rep** function. This can be used to repeat a value as many times as you want. So if you want to create a vector of 1000 5’s, it could be done like this:

rep(5, 1000)

Here are a couple of other examples:

rep("a", 500)
rep("repeat this", 100)

If you pass a vector with more than one element to **rep**, the entire vector gets repeated the number of times you specify.

rep(c(1,2,3), 100)

The above code will create a vector with 300 elements — the number of elements in c(1,2,3) times 100, repeating 1, 2, 3 over and over.
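**rep** also has a few named arguments that control *how* the repetition happens. This sketch uses a shorter repeat count so the results are easy to read (the variable names are just for illustration):

```r
x <- c(1, 2, 3)

# times repeats the whole vector
whole_vector <- rep(x, times = 2)        # 1 2 3 1 2 3

# each repeats every element before moving to the next
each_element <- rep(x, each = 2)         # 1 1 2 2 3 3

# length.out keeps repeating until the result has the given length
fixed_length <- rep(x, length.out = 7)   # 1 2 3 1 2 3 1
```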

That’s it for now! Check out other R posts of mine here: http://theautomatic.net/category/r/

The post Underrated R Functions appeared first on Open Source Automation.
