Prediction intervals too narrow

October 21, 2014

(This article was first published on Hyndsight » R, and kindly contributed to R-bloggers)

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon, and it arises because the intervals do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The shortfall is due to the sources of uncertainty that the intervals leave out.

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty. It would be possible to account for sources 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.
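As a rough illustration of what such a simulation could look like for source 2 only, the sketch below takes a random walk with drift as the working model (an assumption made here for simplicity). It draws a drift value from its approximate sampling distribution, then simulates future errors around it. The variable names, the simulated history and the normality assumption are all illustrative, and model uncertainty (source 3) is not addressed.

```r
set.seed(1)

# Simulated history (illustrative data only)
y <- cumsum(rnorm(50, mean = -2.5, sd = 4))

d     <- diff(y)        # one-step changes
n_d   <- length(d)      # number of changes used to estimate the drift
c_hat <- mean(d)        # estimated drift
sigma <- sd(d)          # estimated error standard deviation
h     <- 40             # forecast horizon
nsim  <- 5000           # number of simulated futures

# Each replicate draws a plausible drift (source 2), then adds
# h future random errors (source 1) to the last observation.
sim_end <- replicate(nsim, {
  c_star <- rnorm(1, mean = c_hat, sd = sigma / sqrt(n_d))
  y[length(y)] + sum(c_star + rnorm(h, mean = 0, sd = sigma))
})

# Approximate 95% prediction interval for the value h steps ahead
quantile(sim_end, c(0.025, 0.975))
```

Dropping the `c_star` draw and using `c_hat` directly would give the narrower interval that reflects the random error term alone.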

Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed-form solutions apart from some simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

    \[y_t = y_{t-1} + c + e_t,\]

where e_t is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of c, and then allow for it in the forecasts.
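For reference, the calculation can be sketched as follows. The notation is introduced here and is not from the original post: T is the number of observations, \sigma^2 is the variance of e_t, h is the forecast horizon, and the errors are assumed uncorrelated. The drift is estimated by the average of the T-1 observed changes, the forecast error then splits into a drift-estimation part and a future-error part, and the two parts are uncorrelated.

```latex
\[\hat{c} = \frac{1}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = \frac{y_T - y_1}{T-1},
  \qquad \mathrm{Var}(\hat{c}) = \frac{\sigma^2}{T-1},\]

\[y_{T+h} - \hat{y}_{T+h|T} = h(c - \hat{c}) + \sum_{i=1}^{h} e_{T+i},\]

\[\mathrm{Var}\left(y_{T+h} - \hat{y}_{T+h|T}\right)
  = h\sigma^2 + \frac{h^2\sigma^2}{T-1}
  = h\sigma^2\left(1 + \frac{h}{T-1}\right).\]
```

Ignoring the uncertainty in the drift estimate amounts to keeping only the first term, h\sigma^2, so the gap between the two variances grows with the square of the horizon.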

This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in c is ignored, but if the rwf function is used, the uncertainty in c is included in the prediction intervals. The difference can be seen in the following simulated example.

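library(forecast)
set.seed(2014)  # arbitrary seed, added only so the example is reproducible
# Simulate a random walk with drift
x <- ts(cumsum(rnorm(50, -2.5, 4)))
# rwf(): intervals include the uncertainty in the drift estimate
RWD.x <- rwf(x, h=40, drift=TRUE, level=95)
# Arima(): intervals ignore the uncertainty in the drift estimate
ARIMA.x <- Arima(x, order=c(0,1,0), include.drift=TRUE)
plot(forecast(ARIMA.x, h=40, level=95))
# Overlay the rwf() interval limits as dashed lines
lines(RWD.x$lower, lty=2)
lines(RWD.x$upper, lty=2)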
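The effect on coverage can also be checked with a small Monte Carlo experiment in base R. This is a sketch under stated assumptions, not the forecast package's internal code. It assumes Gaussian errors and uses the textbook result that the h-step forecast variance for a random walk with drift is h*sigma^2*(1 + h/(T-1)) when the uncertainty in the drift estimate is included, against h*sigma^2 when it is ignored. The variable names and simulation settings are illustrative.

```r
set.seed(123)

n    <- 50     # length of each training series
h    <- 40     # forecast horizon
nsim <- 2000   # number of simulated series
z    <- qnorm(0.975)

hit_naive <- hit_adj <- logical(nsim)

for (i in seq_len(nsim)) {
  # Simulate n training points plus h future points
  y     <- cumsum(rnorm(n + h, mean = -2.5, sd = 4))
  train <- y[1:n]

  d     <- diff(train)
  c_hat <- mean(d)   # estimated drift
  sigma <- sd(d)     # estimated error standard deviation

  point    <- train[n] + h * c_hat
  se_naive <- sigma * sqrt(h)                      # drift uncertainty ignored
  se_adj   <- sigma * sqrt(h * (1 + h / (n - 1)))  # drift uncertainty included

  future       <- y[n + h]
  hit_naive[i] <- abs(future - point) <= z * se_naive
  hit_adj[i]   <- abs(future - point) <= z * se_adj
}

# Empirical coverage of nominal 95% intervals at horizon h
c(naive = mean(hit_naive), adjusted = mean(hit_adj))
```

Because the adjusted interval always contains the naive one, its empirical coverage can never be lower. How far the naive coverage falls below the nominal 95% depends on the ratio of h to the length of the training series.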

