Summary Statistics With Aggregate()

[This article was first published on R-exercises, and kindly contributed to R-bloggers.]

The aggregate() function splits data frames and time series into subsets, then computes summary statistics for each subset. The basic structure of the call is aggregate(x, by, FUN), where “x” is the data, “by” is a list of grouping elements, and “FUN” is the summary function.
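As a rough illustration of the aggregate(x, by, FUN) structure, here is a minimal sketch on a made-up data frame. The names "scores", "group", and "value" are hypothetical and are not part of the exercises. Naming the element of the "by" list sets the header of the grouping column, and extra arguments such as na.rm are passed on to FUN.

```r
# Hypothetical toy data, invented for this illustration only
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, NA)
)

# x:   the column(s) to summarise
# by:  a named list of grouping vectors (the name becomes the column header)
# FUN: the summary function; na.rm = TRUE is passed through to mean()
group_means <- aggregate(
  x = scores["value"],
  by = list(group = scores$group),
  FUN = mean,
  na.rm = TRUE
)

print(group_means)
```

The result has one row per group, with the grouping column first and the summarised column after it.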

Answers to the exercises are available here.

Exercise 1
Aggregate the “airquality” data by “airquality$Month”, returning the mean of each numeric variable. Also, remove “NA” values.

Exercise 2
Aggregate the “airquality” data by the variable “Day”, remove “NA” values, and return the mean of each numeric variable.

Exercise 3
Aggregate “airquality$Solar.R” by “Month”, returning means of “Solar.R”. The header of column 1 should be “Month”. Remove “not available” (NA) values.

Exercise 4
Apply the standard deviation function to the data aggregation from Exercise 3.

Exercise 5
The structure of the aggregate() formula interface is aggregate(formula, data, FUN).

The structure of the formula is y ~ x. The “y” variables are numeric data. The “x” variables, usually factors, are grouping variables that split the “y” variables into subsets.

aggregate.formula allows for one-to-one, one-to-many, many-to-one, and many-to-many aggregation.
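The four formula shapes can be sketched on a small made-up data frame. The names "trial", "height", "weight", "site", and "year" are hypothetical and chosen only for this illustration. Several "y" variables are combined with cbind(), and several "x" variables are joined with +.

```r
# Hypothetical toy data, invented for this illustration only
trial <- data.frame(
  height = c(10, 12, 20, 22),
  weight = c(1, 3, 5, 7),
  site   = c("n", "n", "s", "s"),
  year   = c(1, 2, 1, 2)
)

# one-to-one: one response, one grouping variable
one_to_one <- aggregate(height ~ site, data = trial, FUN = mean)

# many-to-one: several responses, one grouping variable
many_to_one <- aggregate(cbind(height, weight) ~ site, data = trial, FUN = mean)

# one-to-many: one response, several grouping variables
one_to_many <- aggregate(height ~ site + year, data = trial, FUN = mean)

# many-to-many: several responses, several grouping variables
many_to_many <- aggregate(cbind(height, weight) ~ site + year,
                          data = trial, FUN = mean)

print(one_to_one)
print(many_to_one)
```

Note that the formula method uses na.action = na.omit by default, so rows with missing values are dropped before FUN is applied.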

Use aggregate.formula for a one-to-one aggregation of “airquality”: the mean of “Ozone” grouped by the variable “Day”.

Exercise 6
Use aggregate.formula for a many-to-one aggregation of “airquality”: the means of “Solar.R” and “Ozone” grouped by the variable “Month”.

Exercise 7
Dot notation can replace the “y” or “x” variables in aggregate.formula. Therefore, use “.” dot notation to find the means of the numeric variables in “airquality”, with the grouping variable of “Month”.
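To show what the dot stands for on each side of the formula, here is a small sketch on a made-up data frame (the names "trial", "height", "weight", and "site" are hypothetical). On the left, the dot means every variable not named on the right. On the right, it means every variable not named on the left.

```r
# Hypothetical toy data, invented for this illustration only
trial <- data.frame(
  height = c(10, 12, 20, 22),
  weight = c(1, 3, 5, 7),
  site   = c("n", "n", "s", "s")
)

# "." on the left: height and weight are both summarised by site
all_by_site <- aggregate(. ~ site, data = trial, FUN = mean)

# "." on the right: weight and site are both used as grouping variables
height_by_rest <- aggregate(height ~ ., data = trial, FUN = mean)

print(all_by_site)
print(height_by_rest)
```

In this toy data every weight and site combination is unique, so the second call returns one row per original observation.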

Exercise 8
Use dot notation to find the means of the “airquality” variables, with the grouping variables of “Day” and “Month”. Display only the first 6 resulting observations.

Exercise 9
Use dot notation to find the means of “Temp”, with the remaining “airquality” variables as grouping variables.

Exercise 10
aggregate.ts is the time series method for aggregate().
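As a minimal sketch of the time series method, assume a made-up quarterly series called "sales" (a hypothetical name, not part of the exercises). The nfrequency argument gives the number of observations per unit of time in the result, so nfrequency = 1 collapses each year of quarterly data into a single value.

```r
# Hypothetical quarterly series: two years of data starting in 2000 Q1
sales <- ts(c(1, 2, 3, 4, 5, 6, 7, 8), start = c(2000, 1), frequency = 4)

# Collapse the four quarters of each year into one annual value
annual_total <- aggregate(sales, nfrequency = 1, FUN = sum)
annual_mean  <- aggregate(sales, nfrequency = 1, FUN = mean)

print(annual_total)
print(annual_mean)
```

The result is itself a time series, here with a frequency of 1.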

Using R's built-in time series dataset, “AirPassengers”, compute the average annual standard deviation.

Image by Averater (Own work) [CC BY-SA 3.0], via Wikimedia Commons.
