# ave and the [ function in R

September 10, 2013

(This article was first published on mages' blog, and kindly contributed to R-bloggers)

The `ave` function in R is one of those little helper functions I feel I should be using more. Investigating its source code showed me another twist about R and the `"["` function. But first, let’s look at `ave`.

The top of `ave`’s help page reads:

> **Group Averages Over Level Combinations of Factors**
>
> Subsets of `x[]` are averaged, where each subset consist of those observations with the same factor levels.
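To see what that description means with the default `FUN = mean`, here is a minimal sketch on a made-up vector `x` and grouping `g` (both hypothetical, not taken from the help page). Each element is replaced by the mean of its group, so the result has the same length as the input.

```r
# Hypothetical toy data: four observations in two groups
x <- c(1, 2, 3, 4)
g <- c("a", "a", "b", "b")

# Default FUN = mean: each element becomes its group's mean
ave(x, g)
# [1] 1.5 1.5 3.5 3.5
```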

As an example, I look at revenue data by product and shop.

```r
revenue <- c(30, 20, 23, 17)
product <- factor(c("bread", "cake", "bread", "cake"))
shop <- gl(2, 2, labels = c("shop_1", "shop_2"))
```

To answer the question “Which shop sells proportionally more bread?” I need to divide the revenue vector by the sum of revenue per shop, which can be calculated easily by `ave`:

```r
(shop_revenue <- ave(revenue, shop, FUN = sum))
# [1] 50 50 40 40
(revenue_split_in_shop <- revenue / shop_revenue)
# [1] 0.600 0.400 0.575 0.425
# Bread share: 0.600 in shop 1 vs 0.575 in shop 2,
# so shop 1 sells proportionally more bread
```
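Because `ave` writes whatever `FUN` returns back into each group’s positions, `FUN` can also return a vector as long as its group. A minimal sketch, assuming the same `revenue` and `shop` data as above, computes the shares in a single call; the anonymous function is my own addition, not part of the original example.

```r
# Data from the example above
revenue <- c(30, 20, 23, 17)
shop <- gl(2, 2, labels = c("shop_1", "shop_2"))

# Divide each revenue by its shop's total, group by group
share <- ave(revenue, shop, FUN = function(v) v / sum(v))
share
# [1] 0.600 0.400 0.575 0.425
```

This gives the same numbers as the two-step calculation with `FUN = sum` followed by a division.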

In other words, `ave` has to split the revenue vector by shop and apply the `sum` function to each piece. Well, that’s exactly what it does. Here is the source code of `ave`:

```r
#  Copyright (C) 1995-2012 The R Core Team
ave <- function (x, ..., FUN = mean)
{
    if(missing(...))
        x[] <- FUN(x)
    else {
        g <- interaction(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}
```
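To make the `else` branch concrete, here is a sketch that runs its steps by hand on the revenue data. The intermediate names `g`, `parts`, `sums` and `x` are illustrative only; the calls mirror the source above.

```r
revenue <- c(30, 20, 23, 17)
shop <- gl(2, 2, labels = c("shop_1", "shop_2"))

# 1. Build the grouping factor (with a single factor this is just shop)
g <- interaction(shop)

# 2. Split the values into one vector per group:
#    c(30, 20) for shop_1 and c(23, 17) for shop_2
parts <- split(revenue, g)

# 3. Apply FUN to each group: 50 and 40
sums <- lapply(parts, sum)

# 4. Write the results back into each group's positions with `split<-`;
#    every length-one result is recycled over its group
x <- revenue
split(x, g) <- sums
x
# [1] 50 50 40 40
```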

However, and this is what intrigued me, if I don’t provide a grouping variable (`missing(...)`), it applies the function `FUN` to `x` itself and writes the output to `x[]`. That is also why the description on the help page speaks of `x[]`. So what does it do? Here is the example again, without a grouping variable:

```r
ave(revenue, FUN = sum)
# [1] 90 90 90 90
```

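I get the sum of revenue repeated as many times as the vector has elements, not just once as with `sum(revenue)`. The trick is that the output of `FUN(x)` is written into `x[]`, which is itself a function call, `"["(x)`; the assignment is carried out by the replacement function `"[<-"`. Because the empty index selects every element, the single value returned by `sum` is recycled across the whole vector, and `x` keeps its length and attributes.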
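The difference between assigning to `x` and assigning to `x[]` can be seen directly. A minimal sketch, using a named copy of the revenue data (the names `a` to `d` are made up for illustration):

```r
# Hypothetical named copy of the revenue data
x <- c(a = 30, b = 20, c = 23, d = 17)

# Plain assignment replaces the object: a single unnamed number remains
y <- x
y <- sum(y)
y
# [1] 90

# Assignment into the empty index keeps length and names;
# the single value is recycled over all elements
z <- x
z[] <- sum(z)
names(z)
# [1] "a" "b" "c" "d"
unname(z)
# [1] 90 90 90 90
```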

I think it is the following sentence in the help page for `"["` (see `?"["`) that explains it:

> Subsetting (except by an empty index) will drop all attributes except `names`, `dim` and `dimnames`.
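A small sketch of that rule, using a made-up attribute `unit` (hypothetical, chosen only for the demonstration): indexing with `1:4` keeps the names but drops `unit`, while the empty index returns the vector with all its attributes, and assigning into `x[]` preserves them as well.

```r
# Hypothetical vector with names and an extra attribute
x <- c(a = 30, b = 20, c = 23, d = 17)
attr(x, "unit") <- "EUR"

# Subsetting with an index keeps names but drops "unit"
attr(x[1:4], "unit")
# NULL
names(x[1:4])
# [1] "a" "b" "c" "d"

# The empty index keeps every attribute
attr(x[], "unit")
# [1] "EUR"

# Assignment into the empty index keeps them too
x[] <- sum(x)
attr(x, "unit")
# [1] "EUR"
```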

So there we are. I now feel less inclined to use `ave` more, as it is just shorthand for the usual `split`/`lapply` routine, but I learned something new about the subtleties of R.
