Summary Statistics With Aggregate()

June 16, 2016

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

The aggregate() function splits data frames and time series into subsets, then computes summary statistics for each subset. The structure of the aggregate() function is aggregate(x, by, FUN).
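As a rough illustration of the aggregate(x, by, FUN) form, here is a minimal sketch on a made-up data frame. The "scores" data and its column names are hypothetical and do not come from the exercises. It assumes that missing values should be skipped, which is done by passing na.rm = TRUE through to FUN.

```r
# Hypothetical toy data (not part of the exercises)
scores <- data.frame(
  group  = c("a", "a", "b", "b", "b"),
  height = c(1, 3, 2, NA, 6),
  weight = c(10, 20, 30, 40, 50)
)

# x:   the columns to summarise
# by:  a list of grouping vectors; the list names become column headers
# FUN: the summary function; extra arguments (na.rm) are passed on to it
result <- aggregate(
  x     = scores[, c("height", "weight")],
  by    = list(group = scores$group),
  FUN   = mean,
  na.rm = TRUE
)
print(result)
```

Naming the element of the "by" list (here, "group") is what sets the header of the first column; an unnamed list gives a default header such as "Group.1".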

Answers to the exercises are available here.

Exercise 1
Aggregate the “airquality” data by “airquality$Month”, returning means on each of the numeric variables. Also, remove “NA” values.

Exercise 2
Aggregate the “airquality” data by the variable “Day”, remove “NA” values, and return means on each of the numeric variables.

Exercise 3
Aggregate “airquality$Solar.R” by “Month”, returning means of “Solar.R”. The header of column 1 should be “Month”. Remove “not available” values.

Exercise 4
Repeat the aggregation from Exercise 3, using the standard deviation function in place of the mean.

Exercise 5
The structure of the aggregate() formula interface is aggregate(formula, data, FUN).

The structure of the formula is y ~ x. The “y” variables are numeric data. The “x” variables, usually factors, are grouping variables that split the “y” variables into subsets.

aggregate.formula allows for one-to-one, one-to-many, many-to-one, and many-to-many aggregation.
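To make the four combinations concrete, here is a minimal sketch using R's built-in "mtcars" dataset, chosen only for illustration so that the exercises are left untouched. It assumes the usual reading of the terms: the first word counts the "y" variables and the second counts the grouping variables. Several "y" variables are combined with cbind(), and several grouping variables are joined with "+".

```r
# One-to-one: one y variable, one grouping variable
one_to_one <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# One-to-many: one y variable, several grouping variables
one_to_many <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)

# Many-to-one: several y variables (via cbind), one grouping variable
many_to_one <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# Many-to-many: several y variables, several grouping variables
many_to_many <- aggregate(cbind(mpg, hp) ~ cyl + gear, data = mtcars, FUN = mean)

print(one_to_one)
print(head(many_to_many))
```

In each result the grouping columns come first, followed by one column per "y" variable.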

Therefore, use aggregate.formula for a one-to-one aggregation of “airquality”: the mean of “Ozone” by the grouping variable “Day”.

Exercise 6
Use aggregate.formula for a many-to-one aggregation of “airquality”: the means of “Solar.R” and “Ozone” by the grouping variable “Month”.

Exercise 7
Dot notation can replace the “y” or “x” variables in aggregate.formula. Therefore, use “.” dot notation to find the means of the numeric variables in “airquality”, with the grouping variable of “Month”.
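As a small sketch of dot notation on either side of the formula, the example below uses R's built-in "ToothGrowth" dataset (columns "len", "supp" and "dose") rather than "airquality", purely for illustration. The dot stands for "all other columns in data".

```r
# Dot on the left: summarise every other column, grouped by supp
by_supp <- aggregate(. ~ supp, data = ToothGrowth, FUN = mean)

# Dot on the right: summarise len, grouped by every other column
len_by_all <- aggregate(len ~ ., data = ToothGrowth, FUN = mean)

print(by_supp)
print(len_by_all)
```

One point to keep in mind: the formula method's default na.action is na.omit, so a row with a missing value in any variable used by the formula is dropped before FUN is applied. With dot notation that means every column counts.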

Exercise 8
Use dot notation to find the means of the “airquality” variables, with the grouping variables of “Day” and “Month”. Display only the first 6 resulting observations.

Exercise 9
Use dot notation to find the means of “Temp”, with the remaining “airquality” variables as grouping variables.

Exercise 10
aggregate.ts is the time series method for aggregate().
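For orientation, here is a minimal sketch of the time series method on a made-up monthly series. The "monthly" object is hypothetical and unrelated to the exercise data. The key argument is nfrequency, the number of observations per unit of time in the result: 1 collapses a monthly series to yearly values and 4 collapses it to quarterly values.

```r
# Hypothetical monthly series: the values 1..24 over two years
monthly <- ts(1:24, start = c(2000, 1), frequency = 12)

# Collapse to one value per year (nfrequency = 1), using the mean
yearly_mean <- aggregate(monthly, nfrequency = 1, FUN = mean)

# Collapse to one value per quarter (nfrequency = 4), using the sum
quarterly_sum <- aggregate(monthly, nfrequency = 4, FUN = sum)

print(yearly_mean)
print(quarterly_sum)
```

The result is itself a time series, so it keeps its start date and the new frequency.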

Using R's built-in time series dataset, “AirPassengers”, compute the standard deviation for each year.

Image by Averater (Own work) [CC BY-SA 3.0], via Wikimedia Commons.
