A very typical task in data analysis is calculating summary statistics for each variable in a data frame. The standard lapply or sapply functions work very nicely for this, but they apply only a single function. The problem is that I often want to calculate *several* different statistics of the data. For example, assume that we want to calculate the minimum, maximum and mean value of each variable in a data frame.

The simplest solution for this is to write a function that does all the calculations and returns a vector. The sample code is:

    multi.fun <- function(x) {
      c(min = min(x), mean = mean(x), max = max(x))
    }

It gives the following result for the cars data set:

    > sapply(cars, multi.fun)
         speed   dist
    min    4.0   2.00
    mean  15.4  42.98
    max   25.0 120.00
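As a side note, sapply silently changes the shape of its result if the function ever returns something unexpected. A minimal sketch of a stricter variant, assuming you want every column to yield exactly three named numeric values, uses vapply with a template. The name `template` is only illustrative.

```r
# Same summary function as above.
multi.fun <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}

# Expected shape of each per-column result: three named numbers.
template <- c(min = 0, mean = 0, max = 0)

# vapply stops with an error if a column yields anything else.
stats <- vapply(cars, multi.fun, FUN.VALUE = template)
print(stats)
```

The result is the same matrix that sapply produces here; the difference only shows up when a column cannot be summarized in the expected form.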

However, when I work in interactive mode I would prefer to have a function that would accept *multiple* functions as arguments. I came up with the following solution to this problem:

    multi.sapply <- function(...) {
      # capture the unevaluated arguments
      arglist <- match.call(expand.dots = FALSE)$...
      # default labels are the deparsed expressions; explicit names override them
      var.names <- sapply(arglist, deparse)
      has.name <- (names(arglist) != "")
      var.names[has.name] <- names(arglist)[has.name]
      # evaluate the arguments in the caller's environment
      arglist <- lapply(arglist, eval.parent, n = 2)
      # the first argument is the data, the rest are functions
      x <- arglist[[1]]
      arglist[[1]] <- NULL
      result <- sapply(arglist, function (FUN, x) sapply(x, FUN), x)
      colnames(result) <- var.names[-1]
      return(result)
    }
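The key step is capturing the arguments before they are evaluated, so that their text can serve as column labels. The sketch below isolates just that step. `show.dots` is a hypothetical helper made up for illustration and is not part of the function above.

```r
# Hypothetical helper: return the arguments passed through ... as text,
# without evaluating them.
show.dots <- function(...) {
  arglist <- match.call(expand.dots = FALSE)$...
  sapply(arglist, deparse)
}

labels <- show.dots(cars, min, VaR10 = function(x) quantile(x, 0.1))
print(labels)        # deparsed expressions
print(names(labels)) # empty strings for unnamed arguments, "VaR10" for the named one
```

Unnamed arguments get an empty name, which is why the function above compares names(arglist) with "" to decide which labels to override.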

My multi.sapply function takes the data (a vector, list or data frame) as its first argument; the remaining arguments are the functions to be applied to each of its elements. Applying it to the cars data yields:

    > multi.sapply(cars, min, mean, max)
          min  mean max
    speed   4 15.40  25
    dist    2 42.98 120

If a function argument is given a name, that name is used as the column name instead of the deparsed expression. The following example shows this by summarizing several statistics of the EuStockMarkets data set:

    > log.returns <- data.frame(diff(log(EuStockMarkets)))
    > multi.sapply(log.returns, sd, min,
    +              VaR10 = function(x) quantile(x, 0.1))
                  sd         min        VaR10
    DAX  0.010300837 -0.09627702 -0.010862458
    SMI  0.009250036 -0.08382500 -0.009696908
    CAC  0.011030875 -0.07575318 -0.012354424
    FTSE 0.007957728 -0.04139903 -0.009139666
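For use inside other functions, where capturing unevaluated arguments can be fragile, a plainer alternative is to pass the functions as a single named list. This is only a sketch under that assumption. `multi.sapply.list` is a hypothetical name, and the list names take over the role of the deparsed labels.

```r
# Hypothetical alternative: the functions arrive as one named list.
multi.sapply.list <- function(x, funs) {
  # The inner sapply applies one function to every element of x;
  # the outer sapply binds the results into one column per function.
  sapply(funs, function(FUN) sapply(x, FUN))
}

log.returns <- data.frame(diff(log(EuStockMarkets)))
res <- multi.sapply.list(log.returns,
                         list(sd = sd, min = min,
                              VaR10 = function(x) quantile(x, 0.1)))
print(res)
```

The trade-off is that every function has to be named explicitly, which is less convenient in interactive work but avoids relying on match.call and eval.parent.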


To leave a comment for the author, please follow the link and comment on their blog: R snippets.
