# Calculate Leave-One-Out Prediction for GLM

December 13, 2015
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

In the model development, the “leave-one-out” prediction is a way of cross-validation, calculated as below:
1. First of all, after a model is developed, each observation used in the model development is removed in turn and then the model is refitted with the remaining observations
2. The out-of-sample prediction for the refitted model is calculated with the removed observation one by one to assemble the LOO, e.g. leave-one-out predicted values for the whole model development sample.
The loo_predict() function below is a general routine to calculate the LOO prediction for any GLM object, which can be further employed to investigate the model stability and predictability.

```> pkgs <- c('doParallel', 'foreach')
> lapply(pkgs, require, character.only = T)
[]
 TRUE

[]
 TRUE

> registerDoParallel(cores = 8)
>
> data(AutoCollision, package = "insuranceData")
> # A GAMMA GLM #
> model1 <- glm(Severity ~ Age + Vehicle_Use, data = AutoCollision, family = Gamma(link = "log"))
> # A POISSON GLM #
> model2 <- glm(Claim_Count ~ Age + Vehicle_Use, data = AutoCollision, family = poisson(link = "log"))
>
> loo_predict <- function(obj) {
+   yhat <- foreach(i = 1:nrow(obj\$data), .combine = rbind) %dopar% {
+     predict(update(obj, data = obj\$data[-i, ]), obj\$data[i,], type = "response")
+   }
+   return(data.frame(result = yhat[, 1], row.names = NULL))
+ }
> # TEST CASE 1
> test1 <- loo_predict(model1)
> test1\$result
 303.7393 328.7292 422.6610 375.5023 240.9785 227.6365 288.4404 446.5589
 213.9368 244.7808 278.7786 443.2256 213.9262 243.2495 266.9166 409.2565
 175.0334 172.0683 197.2911 326.5685 187.2529 215.9931 249.9765 349.3873
 190.1174 218.6321 243.7073 359.9631 192.3655 215.5986 233.1570 348.2781
> # TEST CASE 2
> test2 <- loo_predict(model2)
> test2\$result
  11.15897  37.67273  28.76127  11.54825  50.26364 152.35489 122.23782
  44.57048 129.58158 465.84173 260.48114 107.23832 167.40672 510.41127
 316.50765 121.75804 172.56928 546.25390 341.03826 134.04303 359.30141
 977.29107 641.69934 251.32547 248.79229 684.86851 574.13994 238.42209
 148.77733 504.12221 422.75047 167.61203

```  R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...