(This article was first published on **Fabio Marroni's Blog » R**, and kindly contributed to R-bloggers)

I imagine that the same result can be achieved by a proper use of quantile, but I like to have an easy way to obtain summary statistics every n entries of my dataset, be it a vector or a data.frame.

The function takes three parameters: the R object to summarize (x), the number of entries each summary should cover (step, defaulting to 1000), and the function to apply (fun, defaulting to "mean").

Then, it's all about using aggregate: every entry is assigned to a group of step entries, and aggregate applies fun to each group.
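To make the role of aggregate concrete, here is a minimal sketch on a toy vector. The data, the step of 3, and the way the block index is built (`ceiling(seq_along(x) / step)`) are assumptions chosen for illustration, not the exact expression used in the function.

```r
# Toy data: 7 values summarized in blocks of 3 (both numbers are arbitrary)
x <- c(2, 4, 6, 8, 10, 12, 14)
step <- 3

# Block index: entries 1-3 get 1, entries 4-6 get 2, entry 7 gets 3
group <- ceiling(seq_along(x) / step)

# aggregate() applies the function to the values that share an index
block_means <- aggregate(x, by = list(block = group), FUN = mean)
print(block_means)
```

The last block holds only one value here, because 7 is not a multiple of 3; its summary is computed on whatever entries are left.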

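    summarize.by <- function(x, step = 1000, fun = "mean") {
      # Number of entries: rows for a data.frame, elements for a vector
      n <- if (is.data.frame(x)) nrow(x) else length(x)
      # Consecutive blocks of `step` entries: 1,1,...,2,2,...
      # (each = step keeps every block except the last at exactly `step`
      # entries, even when n is not a multiple of step)
      group <- rep(seq_len(ceiling(n / step)), each = step)[seq_len(n)]
      x <- data.frame(group, x)
      x <- aggregate(x, by = list(x$group), FUN = fun)
      # Drop the two grouping columns (Group.1 and group)
      x <- x[, -c(1, 2)]
      return(x)
    }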

Example application and result for a data.frame:

    dummy <- data.frame(matrix(runif(100000, 0, 1), ncol = 10))
    summarize.by(dummy)
              X1        X2        X3        X4        X5        X6        X7
    1  0.5081756 0.5206011 0.4972622 0.5060707 0.4907807 0.5063138 0.4982252
    2  0.5014300 0.5093051 0.5015310 0.4718058 0.4931249 0.4882382 0.5084970
    3  0.4994759 0.4979546 0.4964157 0.5138695 0.5018427 0.5228862 0.4980824
    4  0.4970300 0.4953163 0.4954068 0.5157935 0.4770471 0.5000562 0.4960250
    5  0.5118221 0.4967686 0.5114420 0.4945936 0.5016019 0.5003544 0.5016693
    6  0.5026323 0.4995367 0.5003587 0.4970245 0.4992188 0.4993896 0.4873300
    7  0.4911944 0.5081578 0.4858666 0.4974576 0.4864710 0.5022401 0.5058064
    8  0.5050684 0.5021456 0.4970707 0.4829222 0.4980984 0.4901941 0.5053296
    9  0.4910359 0.4883865 0.4915000 0.4984415 0.4941274 0.4933778 0.4964306
    10 0.4832396 0.4986647 0.5017873 0.5008766 0.4952849 0.5036030 0.5084799
              X8        X9       X10
    1  0.5052379 0.4906292 0.4916262
    2  0.5074966 0.5117570 0.5183119
    3  0.4988349 0.5029704 0.5077726
    4  0.4889516 0.5066026 0.5078195
    5  0.5068717 0.4988389 0.5018225
    6  0.5010366 0.4870614 0.4827767
    7  0.5148197 0.5083662 0.5037901
    8  0.4979452 0.5273463 0.4944513
    9  0.5130718 0.5061075 0.5058208
    10 0.4896030 0.4911127 0.4956848

And for a vector:

    dummy <- runif(10000, 0, 1)
    summarize.by(dummy)
     [1] 0.4914789 0.4908839 0.4951939 0.4928015 0.4911908 0.4994735 0.4947729
     [8] 0.5058204 0.5026956 0.5018375
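Both examples rely on the default step and fun. The sketch below shows what non-default values might look like. The toy objects `v` and `d` are hypothetical, and the function is restated in compact form only so that the snippet runs on its own; it assumes the groups are meant to be consecutive blocks of step entries.

```r
# Compact restatement of summarize.by so that this snippet is self-contained
summarize.by <- function(x, step = 1000, fun = "mean") {
  n <- if (is.data.frame(x)) nrow(x) else length(x)
  group <- rep(seq_len(ceiling(n / step)), each = step)[seq_len(n)]
  x <- data.frame(group, x)
  x <- aggregate(x, by = list(x$group), FUN = fun)
  x[, -c(1, 2)]
}

# Deterministic toy data (hypothetical, for illustration only)
v <- 1:12
d <- data.frame(a = 1:12, b = 12:1)

# Median of every 4 entries of a vector; fun given as a character string
v_median <- summarize.by(v, step = 4, fun = "median")

# Maximum of every 6 rows of a data.frame, column by column;
# fun given as a function object
d_max <- summarize.by(d, step = 6, fun = max)

print(v_median)
print(d_max)
```

Because aggregate resolves its FUN argument with match.fun, fun can be passed either as a name in quotes or as the function itself.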
