apply, lapply, sapply, tapply functions in R

March 18, 2016

(This article was first published on Data Perspective, and kindly contributed to R-bloggers)

As part of the Data Science with R series, this is the third tutorial, following the ones on basic data types and control structures in R.

One issue with the for loop in R is its memory consumption and its slowness when executing a repetitive task. When iterating over large data, a for loop is often not advised. R provides several alternatives that can be applied to vectors for looping operations. In this section, we deal with the apply function and its variants:

?apply
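
To make the contrast concrete, here is a minimal sketch that computes the same result with an explicit for loop and with sapply(). The list `values` and the variable names are made up for illustration and are not part of the tutorial's datasets.

```r
# Hypothetical example data: a list of three numeric vectors
values <- list(a = 1:10, b = 11:20, c = 21:30)

# Explicit for loop, writing into a preallocated result vector
loop_means <- numeric(length(values))
names(loop_means) <- names(values)
for (i in seq_along(values)) {
  loop_means[i] <- mean(values[[i]])
}

# The same computation expressed with sapply()
apply_means <- sapply(values, mean)

print(loop_means)
print(apply_means)
```

Both approaches give the same named vector of means; the sapply() version simply states the intent in one line.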

Datasets for apply family tutorial

 To understand the apply functions in R, we use data from the 1974 Motor Trend
US magazine, which comprises fuel consumption and 10 aspects of automobile design and
performance for 32 automobiles (1973–74 models).

data("mtcars")
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


Reynolds (1994) describes a small part of a study of the long-term temperature dynamics
of the beaver Castor canadensis in north-central Wisconsin. Body temperature was measured by
telemetry every 10 minutes for four females, but only data from one period of less than a
day for each of two animals is used here.


data(beavers)
head(t(beaver1)[1:4,1:10])
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]    [,9]   [,10]
day   346.00 346.00 346.00 346.00 346.00 346.00 346.00 346.00  346.00  346.00
time  840.00 850.00 900.00 910.00 920.00 930.00 940.00 950.00 1000.00 1010.00
temp   36.33  36.34  36.35  36.42  36.55  36.69  36.71  36.75   36.81   36.88
activ   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00    0.00

apply():
apply() is the base function of the family. We will learn how to use the apply family of functions by trying out the code. apply() takes 3 arguments:

  • the data matrix (a data frame is coerced to a matrix)
  • the margin: 1 for a row-wise operation, 2 for a column-wise operation
  • the function to be applied to the data.
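
As a quick illustration of the three arguments, the sketch below uses a small made-up matrix `m` (not one of the tutorial datasets) so that the row-wise and column-wise results can be checked by hand.

```r
# A small illustrative matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_sums <- apply(m, 1, sum)  # margin 1: one result per row    -> 9 12
col_sums <- apply(m, 2, sum)  # margin 2: one result per column -> 3 7 11

print(row_sums)
print(col_sums)
```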
 
When 1 is passed as the second parameter, the function is applied row-wise. In the
example below, max is applied to each row to compute the row-wise maximum. Since we
have four attributes (rows), we get 4 results.

apply(t(beaver1),1,max)
    day    time    temp   activ
 347.00 2350.00   37.53    1.00


When 2 is passed as the second parameter, the function is applied column-wise.
In the example below, mean is applied to each column, so we get one result
per column.

apply(mtcars,2,mean)
       mpg        cyl       disp         hp       drat         wt       qsec         vs         am       gear       carb
 20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750   0.437500   0.406250   3.687500   2.812500

We can also pass a custom function instead of a built-in one. In the example
below, we take each column element modulo 10. The custom function receives
each column as a vector, and the vectorized %% operator is applied to every
element of it.

head(apply(mtcars,2,function(x) x%%10))
                  mpg cyl disp hp drat    wt qsec vs am gear carb
Mazda RX4         1.0   6    0  0 3.90 2.620 6.46  0  1    4    4
Mazda RX4 Wag     1.0   6    0  0 3.90 2.875 7.02  0  1    4    4
Datsun 710        2.8   4    8  3 3.85 2.320 8.61  1  1    4    1
Hornet 4 Drive    1.4   6    8  0 3.08 3.215 9.44  1  0    3    1
Hornet Sportabout 8.7   8    0  5 3.15 3.440 7.02  0  0    3    2
Valiant           8.1   6    5  5 2.76 3.460 0.22  1  0    3    1
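
A custom function does not have to return a value per element. As a small sketch, the anonymous function below reduces each column to a single number (the spread between its largest and smallest value); the name `col_spread` is just an illustrative choice.

```r
data("mtcars")

# Each call receives one whole column of mtcars as a numeric vector
# and returns a single value, so the result has one entry per column
col_spread <- apply(mtcars, 2, function(x) max(x) - min(x))

print(col_spread)
```

Because the function returns one value per column, the result is a named vector rather than a matrix.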


lapply():
The lapply function operates on list objects. It returns a list of the same length as the input, each element of which is the result of applying FUN to the corresponding element of the input list.

# create a list with 2 elements
l = list(a=1:10,b=11:20)
# the mean of the values in each element
lapply(l, mean)
$a
[1] 5.5

$b
[1] 15.5

class(lapply(l, mean))
[1] "list"
# the sum of the values in each element
lapply(l, sum)
$a
[1] 55

$b
[1] 155
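
lapply() is also convenient when the function returns more than one value per element, since a list can hold results of any length. The sketch below assumes the same two-element list and uses an anonymous function chosen only for illustration.

```r
l <- list(a = 1:10, b = 11:20)

# Keep only the even values of each element; each result is itself
# a vector, so a list is the natural container for the output
evens <- lapply(l, function(x) x[x %% 2 == 0])

str(evens)
```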



sapply():
sapply is a wrapper function around lapply; the difference is that it simplifies the result to a vector or matrix where possible, instead of returning a list object.

 
# create a list with 2 elements
l = list(a=1:10,b=11:20)
# mean of values using sapply
sapply(l, mean)
   a    b
 5.5 15.5
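
The shape of what sapply() returns depends on what the function returns. A minimal sketch, again assuming the two-element list from above: equal-length results are bound into a matrix, while results of differing lengths cannot be simplified and come back as a list.

```r
l <- list(a = 1:10, b = 11:20)

# range() returns two values per element, so sapply() builds a matrix
# with one column per list element
r <- sapply(l, range)
print(r)
#       a  b
# [1,]  1 11
# [2,] 10 20

# Results of different lengths cannot be simplified; a list is returned
big <- sapply(l, function(x) x[x > 15])
print(class(big))
```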

tapply():
tapply() is a very powerful function that lets you break a vector into pieces and then apply some function to each of the pieces. In the code below, the mpg values in the mtcars data are first grouped by number of cylinders, and then the mean() function is applied to each group.

str(mtcars$cyl)
num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
levels(as.factor(mtcars$cyl))
[1] "4" "6" "8"

In the dataset we have three distinct cylinder counts, and now we want to see the
average mpg for each of them.

tapply(mtcars$mpg,mtcars$cyl,mean)
       4        6        8
26.66364 19.74286 15.10000

In the output above we see that the average mpg for a 4-cylinder engine
is 26.664, for a 6-cylinder engine 19.74, and for an 8-cylinder engine 15.10.
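
To see what "breaking a vector into pieces" means, the sketch below uses two small made-up vectors (`scores` and `group` are illustrative names) and compares tapply() with doing the split and the apply step separately.

```r
# Hypothetical data: six values belonging to two groups
scores <- c(10, 20, 30, 40, 50, 60)
group  <- c("x", "y", "x", "y", "x", "y")

# tapply() splits scores by group and sums each piece
by_group <- tapply(scores, group, sum)
print(by_group)
#   x   y
#  90 120

# The same result in two explicit steps: split(), then sapply()
manual <- sapply(split(scores, group), sum)
print(manual)
```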


by():
by() works similarly to GROUP BY in SQL: it splits a data frame by one or more factors and applies a function to each resulting subset. In the example below, we apply the colMeans() function to all the observations in the iris dataset, grouped by Species.

data(iris)
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

by(iris[,1:4],iris$Species,colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------------------------------
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------------------------------
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026
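
The function passed to by() receives each group as a data frame, so it can use several columns at once. As a small sketch (the anonymous functions and variable names are illustrative choices), the code below counts the rows per species and computes one column's mean per species.

```r
data(iris)

# Each call receives the rows of one species as a data frame
counts <- by(iris[, 1:4], iris$Species, function(d) nrow(d))
print(counts)

# Mean sepal length per species, computed from the per-group data frame
sepal_means <- by(iris, iris$Species, function(d) mean(d$Sepal.Length))
print(sepal_means)
```

The per-species means of Sepal.Length agree with the first column of the colMeans() output above.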
