mapply and by functions in R

[This article was first published on R – StudyTrails, and kindly contributed to R-bloggers.]

In the previous tutorial we looked at the apply group of functions. In this one we look at the mapply and by functions.

mapply
It's a bit difficult to explain the mapply function in words, so we jump straight into an example and provide a definition later on.

> mapply(function(x,y){x^y},x=c(2,3),y=c(3,4))
[1]  8 81

Requires explanation, doesn't it? Here's how it goes: the first argument is the function FUN, which takes two parameters, x and y. The values of x come from the second argument (x=c(2,3)) and the values of y come from the third argument (y=c(3,4)). Both x and y have two values, so the function is called twice. The first time it is called with the first values of x and y (x=2 and y=3, which gives 2^3 = 8). The second time it is called with the second values (x=3 and y=4, which gives 3^4 = 81).

Definition of mapply function

As promised, here is the formal definition: mapply calls a function FUN over several vectors or lists in parallel, one index at a time. In other words, the function is first called with the elements at index 1 of every vector or list, then with the elements at index 2, and so on.
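
To make the definition concrete, here is a minimal sketch that writes the same index-by-index calls as an explicit loop and compares the result with mapply. The names pow, xs, ys and result are made up for this illustration and are not part of the original example.

```r
# Illustrative names only: pow, xs, ys and result are not from the article.
pow <- function(x, y) x^y
xs <- c(2, 3)
ys <- c(3, 4)

# Call pow once per index, pairing xs[i] with ys[i]
result <- numeric(length(xs))
for (i in seq_along(xs)) {
  result[i] <- pow(xs[i], ys[i])
}

print(result)               # 8 81
print(mapply(pow, xs, ys))  # same values
```

The loop and the mapply call pair up the elements in the same way; mapply simply saves the bookkeeping.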

If the arguments have different lengths, the shorter ones are recycled to the length of the longest. (However, the lengths must be either all zero or all non-zero.)

# the values in y are recycled. 
# i.e. for both the values in x the same value (4) of y is used.
> mapply(function(x,y){x^y},x=c(2,3),y=c(4))
[1] 16 81
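
Recycling also applies when the shorter argument has more than one value. The sketch below, which uses the made-up names pow, recycled and explicit, compares a recycled call with one where the repetition is written out by hand using rep.

```r
# Illustrative names only: pow, recycled and explicit are not from the article.
pow <- function(x, y) x^y

# y has 2 values and x has 4, so y is reused as 1, 2, 1, 2
recycled <- mapply(pow, x = 1:4, y = c(1, 2))

# The same call with the recycling spelled out
explicit <- mapply(pow, x = 1:4, y = rep(c(1, 2), times = 2))

print(recycled)  # 1 4 3 16
print(explicit)  # 1 4 3 16
```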

You can't mix a zero-length argument with non-zero-length ones, though:

> mapply(function(x,y){x^y},x=c(2,3,6),y=c())
Error in mapply(function(x, y) { : 
  zero-length inputs cannot be mixed with those of non-zero length

It's not necessary to name the arguments:

> mapply(function(x,y){x^y},c(2,3),c(3,4))
[1]  8 81

We can name the elements of the vectors. The names from the first argument are used for the result:

> mapply(function(x,y){x^y},c(a=2,b=3),c(A=3,B=4))
 a  b 
 8 81

unless you specifically ask R not to use names:

> mapply(function(x,y){x^y},c(a=2,b=3),c(A=3,B=4),USE.NAMES=FALSE)
[1]  8 81

If the function needs further arguments that stay the same for every call of FUN, pass them as a list in the “MoreArgs” argument:

> mapply(function(x,y,z,k){(x+k)^(y+z)},c(a=2,b=3),c(A=3,B=4),MoreArgs=list(1,2))
   a    b 
 256 3125 

Here z and k are 1 and 2 respectively, because the unnamed elements of MoreArgs are matched by position to the remaining parameters. So the first call evaluates (2+2)^(3+1) = 256 and the second evaluates (3+2)^(4+1) = 3125.
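
Relying on position can be fragile, so one option is to name the elements of the MoreArgs list. The sketch below assumes the same function as above, stored under the made-up name f, and shows that with names the order inside the list no longer matters.

```r
# Illustrative names only: f, named and swapped are not from the article.
f <- function(x, y, z, k) (x + k)^(y + z)

# Named elements are matched to the parameters z and k by name
named <- mapply(f, c(a = 2, b = 3), c(A = 3, B = 4),
                MoreArgs = list(z = 1, k = 2))

# Swapping the order inside the list gives the same result
swapped <- mapply(f, c(a = 2, b = 3), c(A = 3, B = 4),
                  MoreArgs = list(k = 2, z = 1))

print(named)    # a = 256, b = 3125
print(swapped)  # a = 256, b = 3125
```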

As with sapply, the SIMPLIFY argument (TRUE by default) controls whether the result is reduced to a vector, matrix or array.
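
A small sketch of what SIMPLIFY changes, using a made-up function stats that returns two values per call. With the default, the equal-length results are bound into a matrix; with SIMPLIFY=FALSE they stay in a list.

```r
# Illustrative names only: stats, m and l are not from the article.
stats <- function(x, y) c(sum = x + y, product = x * y)

# Default SIMPLIFY = TRUE: a 2 x 3 matrix, one column per call
m <- mapply(stats, 1:3, 4:6)
print(m)

# SIMPLIFY = FALSE: a list with one element per call
l <- mapply(stats, 1:3, 4:6, SIMPLIFY = FALSE)
print(l)
```

The row names of the matrix come from the names of the vector that stats returns.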

by

The by function is related to the apply family, but it works on a data frame (or matrix): it splits the rows into groups defined by one or more factors and applies a function to each group. We first create a data frame for this example.

# the data frame df contains two columns a and b
> df=data.frame(a=c(1:15),b=c(1,1,2,2,2,2,3,4,4,4,5,5,6,7,7))

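We use the by function to split the data frame by the values of b and apply sum to each group. Note that sum receives every column of the group's rows, so it adds up the values of both a and b where b=1, then both a and b where b=2, and so on, not just the values of a.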

> by(df,factor(df$b),sum)

The by function takes three arguments here. The first is the data frame. The second is the factor (or list of factors) that defines the groups; its length should equal the number of rows of the data frame. The third is the function to apply to each group. For example, the group b=1 holds a=1,2 and b=1,1, so its result is 1+2+1+1 = 5. This is what it produces:

factor(df$b): 1
[1] 5
------------------------------------------------------------ 
factor(df$b): 2
[1] 26
------------------------------------------------------------ 
factor(df$b): 3
[1] 10
------------------------------------------------------------ 
factor(df$b): 4
[1] 39
------------------------------------------------------------ 
factor(df$b): 5
[1] 33
------------------------------------------------------------ 
factor(df$b): 6
[1] 19
------------------------------------------------------------ 
factor(df$b): 7
[1] 43
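
If the goal is the sum of column a alone within each group, one possible approach is to pass a function that picks out that column. The sketch below rebuilds the same data frame; the names sums_a, sums_t and group are made up for this illustration, and tapply is shown only as a comparison.

```r
# Same data frame as in the article
df <- data.frame(a = 1:15, b = c(1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7, 7))

# Sum only column a within each group of b
sums_a <- by(df, factor(df$b), function(group) sum(group$a))
print(sums_a)

# tapply computes the same group sums and returns a plain named vector
sums_t <- tapply(df$a, df$b, sum)
print(sums_t)
```

For b=1 this gives 1+2 = 3 rather than 5, because the values of b are no longer included in the sum.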

The call also works when the data frame has more columns. Again sum adds every value in each group's rows: for b=1 the result is (1+2) + (1+2) + (1+1) = 8.

> df=data.frame(a=c(1:15),k=c(1:15),b=c(1,1,2,2,2,2,3,4,4,4,5,5,6,7,7))
> by(df,factor(df$b),sum)
factor(df$b): 1
[1] 8
------------------------------------------------------------ 
factor(df$b): 2
[1] 44
------------------------------------------------------------ 
..... [truncated]
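
To see how each column contributes to these totals, a function such as colSums can be applied per group instead of sum. This is a sketch on the same three-column data frame; the name per_col is made up for this illustration.

```r
# Same three-column data frame as in the article
df <- data.frame(a = 1:15, k = 1:15,
                 b = c(1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7, 7))

# One named vector of column sums (a, k, b) per group of b
per_col <- by(df, factor(df$b), colSums)

# First group (b = 1): a = 3, k = 3, b = 2, which add up to the 8 shown above
print(per_col[[1]])
```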
