Chunk Averaging of GLM

July 7, 2019

(This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)

Chunk Average (CA) is an interesting concept proposed by Matloff in the chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a glm for each chunk. Under the i.i.d assumption, the CA estimator with the chunked data is asymptotically equivalent to the estimator with the full data. The possibility of converting the full model estimation with an excessively large dataset to the chunked estimation with small pieces is particularly attractive in real-world model developments where the model convergence could be challenging given the data size.

The ca_glm() function below is my attempt to implement the Chunk Averaging of GLM. As shown, CA estimations by various chunks are consistent with the estimation with the full data.

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)