Calculate Leave-One-Out Prediction for GLM

December 13, 2015
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

In the model development, the “leave-one-out” prediction is a way of cross-validation, calculated as below:
1. First of all, after a model is developed, each observation used in the model development is removed in turn and then the model is refitted with the remaining observations
2. The out-of-sample prediction for the refitted model is calculated with the removed observation one by one to assemble the LOO, e.g. leave-one-out predicted values for the whole model development sample.
The loo_predict() function below is a general routine to calculate the LOO prediction for any GLM object, which can be further employed to investigate the model stability and predictability.

> pkgs <- c('doParallel', 'foreach')
> lapply(pkgs, require, character.only = T)
[[1]]
[1] TRUE

[[2]]
[1] TRUE

> registerDoParallel(cores = 8)
>
> data(AutoCollision, package = "insuranceData")
> # A GAMMA GLM #
> model1 <- glm(Severity ~ Age + Vehicle_Use, data = AutoCollision, family = Gamma(link = "log"))
> # A POISSON GLM #
> model2 <- glm(Claim_Count ~ Age + Vehicle_Use, data = AutoCollision, family = poisson(link = "log"))
>
> loo_predict <- function(obj) {
+   yhat <- foreach(i = 1:nrow(obj$data), .combine = rbind) %dopar% {
+     predict(update(obj, data = obj$data[-i, ]), obj$data[i,], type = "response")
+   }
+   return(data.frame(result = yhat[, 1], row.names = NULL))
+ }
> # TEST CASE 1
> test1 <- loo_predict(model1)
> test1$result
 [1] 303.7393 328.7292 422.6610 375.5023 240.9785 227.6365 288.4404 446.5589
 [9] 213.9368 244.7808 278.7786 443.2256 213.9262 243.2495 266.9166 409.2565
[17] 175.0334 172.0683 197.2911 326.5685 187.2529 215.9931 249.9765 349.3873
[25] 190.1174 218.6321 243.7073 359.9631 192.3655 215.5986 233.1570 348.2781
> # TEST CASE 2
> test2 <- loo_predict(model2)
> test2$result
 [1]  11.15897  37.67273  28.76127  11.54825  50.26364 152.35489 122.23782
 [8]  44.57048 129.58158 465.84173 260.48114 107.23832 167.40672 510.41127
[15] 316.50765 121.75804 172.56928 546.25390 341.03826 134.04303 359.30141
[22] 977.29107 641.69934 251.32547 248.79229 684.86851 574.13994 238.42209
[29] 148.77733 504.12221 422.75047 167.61203

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)