The R apply function – a tutorial with examples

July 2, 2011
By

(This article was first published on Musings of a forgetful functor, and kindly contributed to R-bloggers)

Today I had one of those special moments that is uniquely associated with R. One of my colleagues was trying to solve what I term an 'Excel problem'. That is, one where the problem magically disappears once a programming language is employed. Put simply, the problem was to take a range, and randomly shift the elements of the list in order. For example, 12345 could become 34512 or 51234.

The list in question had forty-thousand elements, and this process needed to be repeated numerous times as part of a simulation. Try doing this in Excel and you will go insane: the shift function is doable but resource intensive. After ten minutes of waiting for your VBA script to run you will be begging for mercy or access to a supercomputer. However, in R the same can be achieved with the function:
translate<-function(x){
if (length(x)!=1){
r<-sample(1:(length(x)),1)
x<-append(x[r:length(x)],x[1:r-1])
}
return(x)
}
My colleague ran this function against his results several thousand times and had the pleasure of seeing his results spit out in less than thirty seconds: problem solved. Ain't R grand.

More R magic courtesy of the apply function
The translate function above is not rocket science, but it does demonstrate how powerful a few lines of R can be. This is best exemplified by the incredible functionality offered by the apply function. However, I have noticed that this tool is often under-utilised by less experienced R users.

The usage from the R Documenation is as follows:
apply(X, MARGIN, FUN, ...)

where:
  • X is an array or matrix;
  • MARGIN is a variable that determines whether the function is applied over rows (MARGIN=1), columns (MARGIN=2), or both (MARGIN=c(1,2));
  • FUN is the function to be applied.
In essence, the apply function allows us to make entry-by-entry changes to data frames and matrices. If MARGIN=1, the function accepts each row of X as a vector argument, and returns a vector of the results. Similarly, if MARGIN=2 the function acts on the  columns of X. Most impressively,  when MARGIN=c(1,2) the function is applied to every entry of X. As for the FUN argument, this can be anything from a standard R function, such as sum or mean, to a custom function like translate above.

An illustrative example
Consider the code below:
# Create the matrix
m<-matrix(c(seq(from=-98,to=100,by=2)),nrow=10,ncol=10)

# Return the product of each of the rows
apply(m,1,prod)

# Return the sum of each of the columns
apply(m,2,sum)

# Return a new matrix whose entries are those of 'm' modulo 10
apply(m,c(1,2),function(x) x%%10) 

In the last example, we apply a custom function to every entry of the matrix. Without this functionality, we would be at something of a disadvantage using R versus that old stalwart of the analyst: Excel. But with the apply function we can edit every entry of a data frame with a single line command. No autofilling, no wasted CPU cycles.

In the next edition of this blog, I will return to looking at R's plotting capabilities with a focus on the ggplot2 package. In the meantime, enjoy using the apply function and all it has to offer.

To leave a comment for the author, please follow the link and comment on his blog: Musings of a forgetful functor.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.