# Grouped means (or anything else…)

June 25, 2012

(This article was first published on Insights of a PhD student » R, and kindly contributed to R-bloggers)

An easy one today, but something that stumped me for a while* the first time I tried it out.

How do you get a group mean (or other summary statistic) from R? Let's say you have a Y variable with repeated measurements for each combination of levels of however many factors.

You could subset the data by each combination of the X variables. Something like

```r
# Y is a vector, so the logical index takes no trailing comma
trt1alt1 <- mean(data$Y[data$trt == 1 & data$alt == 1])
trt1alt2 <- mean(data$Y[data$trt == 1 & data$alt == 2])
trt1alt3 <- mean(data$Y[data$trt == 1 & data$alt == 3])
trt2alt1 <- mean(data$Y[data$trt == 2 & data$alt == 1])
trt2alt2 <- mean(data$Y[data$trt == 2 & data$alt == 2])
trt2alt3 <- mean(data$Y[data$trt == 2 & data$alt == 3])
# ...
```

would do the trick. But that's daft. For one thing, it takes a long time to type or edit, especially if you have a lot of groups.

The better way is to use `aggregate`.

```r
aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN = mean)
```

This outputs a data frame with one row per combination of the levels of the grouping variables, plus the mean of Y for that combination (in a column named `x`). No fuss, no bother. If you need to pass other arguments to `mean`, such as its `na.rm` argument, that's possible too:

```r
aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN = mean, na.rm = TRUE)
```
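To see what comes back, here is a minimal sketch on a made-up data frame. The values and the name `dat` are invented for illustration; only the column names `trt`, `alt` and `Y` follow the examples above. It also shows the formula interface to `aggregate`, which should give the same means while keeping the name `Y` for the result column.

```r
# Toy data, invented for illustration: 2 treatments x 3 alternatives, 2 reps each
dat <- data.frame(
  trt = rep(1:2, each = 6),
  alt = rep(rep(1:3, each = 2), times = 2),
  Y   = c(1, 3, 2, 4, 5, NA, 10, 12, 11, 13, 14, 16)
)

# One row per trt/alt combination; the summary lands in a column called x
means <- aggregate(dat$Y, by = list(trt = dat$trt, alt = dat$alt),
                   FUN = mean, na.rm = TRUE)
print(means)

# Formula interface: same means, but the result column keeps the name Y.
# na.action = na.pass keeps the NA row so that na.rm = TRUE deals with it.
means_f <- aggregate(Y ~ trt + alt, data = dat, FUN = mean,
                     na.rm = TRUE, na.action = na.pass)
print(means_f)
```

Note that the rows are ordered with the first grouping variable varying fastest, so `trt` cycles through its levels within each level of `alt`.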

`aggregate` can also apply other functions, custom-built or otherwise. There are other options too, such as the data.table package or `ddply` from the plyr package. Some of the apply family of functions, such as `tapply`, can also handle the simple single-factor case.
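As a rough sketch of those two base R options, the example below passes a custom function to `aggregate` and then uses `tapply` for a single grouping factor. The data frame `dat` and the helper `se` are hypothetical names made up for this illustration, and the sketch sticks to base R rather than the add-on packages.

```r
# Toy data, invented for illustration: 2 treatments x 2 alternatives, 2 reps each
dat <- data.frame(
  trt = rep(1:2, each = 4),
  alt = rep(1:2, times = 4),
  Y   = c(2, 4, 6, 8, 1, 3, 5, 7)
)

# A custom summary (hypothetical helper): standard error of the mean
se <- function(v) sd(v) / sqrt(length(v))
ses <- aggregate(dat$Y, by = list(trt = dat$trt, alt = dat$alt), FUN = se)
print(ses)

# Single grouping factor: tapply returns an array of means named by level
trt_means <- tapply(dat$Y, dat$trt, mean)
print(trt_means)
```

`tapply` returns a named array rather than a data frame, which is handy for quick lookups but less convenient than the `aggregate` output if you want to merge the summaries back into another table.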

* I say a while… I mean an hour or so, so not long at all.
