(This article was first published on **Insights of a PhD student » R**, and kindly contributed to R-bloggers)

An easy one today, but something that stumped me for a while* the first time I tried it out.

How do you get a group mean (or other summary statistic) from R? Let's say you have a Y variable holding repeated measurements for each combination of however many grouping factors (the X variables).

You could subset the data by each combination of the X variables. Something like

trt1alt1 <- mean(data$Y[data$trt == 1 & data$alt == 1])
trt1alt2 <- mean(data$Y[data$trt == 1 & data$alt == 2])
trt1alt3 <- mean(data$Y[data$trt == 1 & data$alt == 3])
trt2alt1 <- mean(data$Y[data$trt == 2 & data$alt == 1])
trt2alt2 <- mean(data$Y[data$trt == 2 & data$alt == 2])
trt2alt3 <- mean(data$Y[data$trt == 2 & data$alt == 3])
...

would do the trick. But that's daft. For one thing, it takes a long time to type or edit, especially if you have a lot of groups.

The better way is to use aggregate.

aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN=mean)

This outputs a table with the levels of the variables and the mean of Y for each combination (in a column named x). No fuss, no bother. If you need to pass other arguments to mean, such as its na.rm argument, that's possible too…

aggregate(data$Y, by = list(trt = data$trt, alt = data$alt), FUN=mean, na.rm=TRUE)
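To see what the two calls return, here is a minimal sketch on a made-up data frame. The name `dat` and all of its values are invented for illustration (they are not the author's data); only the column names `Y`, `trt` and `alt` follow the post.

```r
# Hypothetical toy data: 2 treatments x 2 alternatives, 2 replicates each.
# All values are invented for illustration; one Y is deliberately missing.
dat <- data.frame(
  trt = c(1, 1, 1, 1, 2, 2, 2, 2),
  alt = c(1, 1, 2, 2, 1, 1, 2, 2),
  Y   = c(10, 12, 20, NA, 30, 34, 40, 44)
)

# Plain group means: the group that contains the NA comes back as NA
means <- aggregate(dat$Y, by = list(trt = dat$trt, alt = dat$alt), FUN = mean)

# Arguments given after FUN are passed on to mean(), so the NA is dropped
means_narm <- aggregate(dat$Y, by = list(trt = dat$trt, alt = dat$alt),
                        FUN = mean, na.rm = TRUE)

print(means)
print(means_narm)
```

Both results have one row per combination of trt and alt, so they can be indexed like any other data frame, for example `means_narm$x[means_narm$trt == 1 & means_narm$alt == 2]`.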

aggregate can also apply other functions, custom built or otherwise. There are other options too, such as the data.table package or ddply from the plyr package. Some of the apply functions, such as tapply, can also do the simple single-level stuff.
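As a rough sketch of the base R options, the block below passes a custom function to aggregate and then gets single-factor group means with tapply. The data frame `dat` and the helper `se` (standard error of the mean) are hypothetical names made up for this example, and the values are invented.

```r
# Hypothetical toy data, invented for illustration
dat <- data.frame(
  trt = c(1, 1, 1, 2, 2, 2),
  Y   = c(10, 12, 14, 30, 34, 38)
)

# A custom summary function (hypothetical helper): standard error of the mean
se <- function(v) sd(v) / sqrt(length(v))

# Any function that takes a vector and returns a summary can be passed as FUN
group_se <- aggregate(dat$Y, by = list(trt = dat$trt), FUN = se)

# With a single grouping variable, tapply() gives the means labelled by group
group_means <- tapply(dat$Y, dat$trt, mean)

print(group_se)
print(group_means)
```

tapply is handy for a quick look at one factor, while aggregate returns a data frame that is easier to merge or plot later.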

* I say a while… I mean an hour or so… so not long at all.
