The leave-one-out cross-validation statistic is given by
$$\text{CV} = \frac{1}{N} \sum_{i=1}^N e_{[i]}^2,$$
where $e_{[i]} = y_i - \hat{y}_{[i]}$, $y_1,\dots,y_N$ are the observations, and $\hat{y}_{[i]}$ is the predicted value obtained when the model is estimated with the $i$th case deleted. This is also sometimes known as the PRESS (Prediction Residual Sum of Squares) statistic.
It turns out that for linear models, we do not actually have to estimate the model $N$ times, once for each omitted case. Instead, CV can be computed after estimating the model once on the complete data set.
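Suppose we have a linear regression model $\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{e}$. Then $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y}$ and $\boldsymbol{H} = \boldsymbol{X}(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'$ is the “hat-matrix”. It has this name because it is used to compute $\hat{\boldsymbol{Y}} = \boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{H}\boldsymbol{Y}$. If the diagonal values of $\boldsymbol{H}$ are denoted by $h_1,\dots,h_N$, then the leave-one-out cross-validation statistic can be computed using
$$\text{CV} = \frac{1}{N}\sum_{i=1}^N \left[\frac{e_i}{1-h_i}\right]^2,$$
where $e_i = y_i - \hat{y}_i$ and $\hat{y}_i$ is the predicted value obtained when the model is estimated with all data included. This is a remarkable result, and is given without proof in Section 5.5 of my forecasting textbook.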
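For readers who like to see a result numerically before seeing it proved, here is a minimal Python sketch that compares the brute-force statistic (refitting the model $N$ times) with the shortcut based on $e_i/(1-h_i)$. The data set, the function names and the small Gaussian-elimination solver are all my own illustrative assumptions; only the standard library is used, and this is not how any particular package implements the calculation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def xtx(X):
    """Compute X'X."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]


def xty(X, y):
    """Compute X'Y."""
    p = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]


def fit(X, y):
    """Least squares coefficients from the normal equations."""
    return solve(xtx(X), xty(X, y))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def cv_brute_force(X, y):
    """Refit the model N times, once with each case deleted."""
    total = 0.0
    for i in range(len(y)):
        beta_i = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - dot(X[i], beta_i)) ** 2
    return total / len(y)


def cv_shortcut(X, y):
    """Fit the model once and use CV = mean of (e_i / (1 - h_i))^2."""
    A = xtx(X)
    beta = fit(X, y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h_i = dot(x_i, solve(A, x_i))  # h_i = x_i' (X'X)^{-1} x_i
        e_i = y_i - dot(x_i, beta)     # ordinary residual
        total += (e_i / (1 - h_i)) ** 2
    return total / len(y)


# Hypothetical data: an intercept column plus two predictors.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 3.0],
     [1.0, 5.0, 8.0], [1.0, 6.0, 6.0], [1.0, 7.0, 9.0], [1.0, 8.0, 7.0]]
y = [2.1, 2.9, 6.2, 5.8, 10.1, 9.7, 12.8, 12.1]

cv_slow = cv_brute_force(X, y)
cv_fast = cv_shortcut(X, y)
print(cv_slow, cv_fast)
```

The two printed values should agree up to floating-point rounding. Solving the normal equations directly is fine for a toy example like this one; for real data a QR-based fit would be the more numerically stable choice.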
I am teaching my second year forecasting class about this result tomorrow, and I thought my students might like to see the proof. What follows is the simplest proof I know (adapted from Seber and Lee, 2003).
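Let $\boldsymbol{X}_{[i]}$ and $\boldsymbol{Y}_{[i]}$ be similar to $\boldsymbol{X}$ and $\boldsymbol{Y}$ but with the $i$th row deleted in each case. Let $\boldsymbol{x}_i'$ be the $i$th row of $\boldsymbol{X}$ and let
$$\hat{\boldsymbol{\beta}}_{[i]} = (\boldsymbol{X}_{[i]}'\boldsymbol{X}_{[i]})^{-1}\boldsymbol{X}_{[i]}'\boldsymbol{Y}_{[i]}$$
be the estimate of $\boldsymbol{\beta}$ without the $i$th case. Then $e_{[i]} = y_i - \boldsymbol{x}_i'\hat{\boldsymbol{\beta}}_{[i]}$.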
Now and . So by the Sherman–Morrison–Woodbury formula,
Also note that . Therefore
and the result follows.
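The key step in the proof is the Sherman–Morrison–Woodbury update, which says that deleting case $i$ changes the inverse of $\boldsymbol{X}'\boldsymbol{X}$ by adding $(\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{x}_i\boldsymbol{x}_i'(\boldsymbol{X}'\boldsymbol{X})^{-1}/(1-h_i)$. The following Python sketch checks that identity on a tiny example with an intercept and a single predictor, so that every matrix is $2\times 2$ and can be inverted in closed form. The data and the helper names `inv2` and `gram` are assumptions of mine, chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gram(rows):
    """X'X for a design matrix whose rows have two entries."""
    return [[sum(r[j] * r[k] for r in rows) for k in range(2)] for j in range(2)]


# Hypothetical data: intercept plus one predictor.
xs = [1.0, 2.0, 4.0, 5.0, 7.0]
X = [(1.0, x) for x in xs]
i = 2  # index of the case to delete (0-based)

A_inv = inv2(gram(X))
x_i = X[i]

# u = (X'X)^{-1} x_i and h_i = x_i' (X'X)^{-1} x_i
u = [sum(A_inv[j][k] * x_i[k] for k in range(2)) for j in range(2)]
h_i = sum(x_i[j] * u[j] for j in range(2))

# Right-hand side of the identity. Because (X'X)^{-1} is symmetric,
# (X'X)^{-1} x_i x_i' (X'X)^{-1} equals the outer product u u'.
rhs = [[A_inv[j][k] + u[j] * u[k] / (1 - h_i) for k in range(2)]
       for j in range(2)]

# Left-hand side: delete row i and invert X_[i]'X_[i] directly.
lhs = inv2(gram(X[:i] + X[i + 1:]))

max_diff = max(abs(lhs[j][k] - rhs[j][k]) for j in range(2) for k in range(2))
print(h_i, max_diff)
```

The largest entrywise difference between the two sides should be zero up to rounding error. The check also makes one condition of the proof visible: the update requires $h_i < 1$, which holds whenever the model can still be estimated after case $i$ is removed.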
This result is implemented in the CV() function from the forecast package for R.