# Formal ways to compare forecasting models: Rolling windows

November 8, 2017
By Gabriel Vasconcelos

## Overview

When working with time-series forecasting we often have to choose between a few potential models, and the best way to do so is to test each model in pseudo-out-of-sample estimations. In other words, we simulate a forecasting situation by dropping some data from the estimation sample to see how each model performs.

Naturally, if you run only one (or just a few) forecasting tests your results will have no robustness, and in the next forecast the results may change drastically. Another possibility is to estimate the model in, let’s say, half of the sample and use the estimated model to forecast the other half. This is better than a single forecast, but it does not account for possible changes in the structure of the data over time because you estimate the model only once.

The most accurate way to compare models is using rolling windows. Suppose you have, for example, 200 observations of a time series. First you estimate the model with the first 100 observations to forecast observation 101. Then you include observation 101 in the estimation sample and estimate the model again to forecast observation 102. The process is repeated until you have a forecast for all 100 out-of-sample observations. This procedure is called an expanding window. If you also drop the first observation in each iteration, so that the window size stays the same, you have a fixed rolling window estimation. In the end you will have 100 forecasts for each model, from which you can calculate the RMSE, the MAE and formal tests such as Diebold & Mariano.
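As a minimal sketch of the two window schemes, here is the index logic on a simulated AR(1) series with a simple OLS autoregression as the forecasting model (the series, window size and model here are illustrative assumptions, not the inflation data used later):

```r
set.seed(1)
y = arima.sim(model = list(ar = 0.5), n = 201)  # 201 obs -> 200 usable rows
Z = data.frame(embed(as.numeric(y), 2))         # X1 = y_t, X2 = y_{t-1}
w_size = 100
n_windows = nrow(Z) - w_size                    # 100 out-of-sample points

fixed_fc = expanding_fc = numeric(n_windows)
for (i in 1:n_windows) {
  Z_fixed = Z[i:(w_size + i - 1), ]  # fixed window: slides, always 100 obs
  Z_exp   = Z[1:(w_size + i - 1), ]  # expanding window: grows each iteration
  Z_out   = Z[w_size + i, ]          # the observation to forecast
  fixed_fc[i]     = predict(lm(X1 ~ X2, data = Z_fixed), Z_out)
  expanding_fc[i] = predict(lm(X1 ~ X2, data = Z_exp), Z_out)
}
length(fixed_fc)  # 100 forecasts from each scheme
```

The only difference between the two schemes is the first index of the estimation sample: `i` for the fixed window, `1` for the expanding one.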

In general, the fixed rolling window is better than the expanding window because of the following example. Suppose we have two models:

$\displaystyle y_t = b_1x_{1,t-1}+\varepsilon_t$

$\displaystyle y_t = b_1x_{1,t-1}+b_2x_{2,t-1}+\varepsilon_t$

Let’s assume that the true value of $b_2$ is zero. If we use expanding windows, asymptotic theory tells us that $\hat{b}_2$ will go to zero and both models will become the same. If that is the case, we may be unable to distinguish which model is more accurate at forecasting $y_t$. However, the first model is better than the second in small samples and just as good in large samples, and we should be able to identify this feature. Fixed rolling windows keep the sample size constant, so they are free from this problem conditional on the window size. In this case, the Diebold & Mariano test becomes the Giacomini & White test.
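A quick simulated illustration of this point (toy data, not the inflation series used below): with a true $b_2 = 0$, the OLS estimate of $b_2$ sits noticeably away from zero in a small sample but shrinks as the estimation sample grows, which is exactly what makes the two models indistinguishable under an expanding window.

```r
set.seed(123)
n  = 5000
x1 = rnorm(n)
x2 = rnorm(n)                 # irrelevant regressor: true b2 = 0
y  = 0.8 * x1 + rnorm(n)

# OLS estimate of b2 on a small sample vs. the full sample
b2_small = coef(lm(y ~ x1 + x2, subset = 1:100))[["x2"]]
b2_full  = coef(lm(y ~ x1 + x2))[["x2"]]
c(small_sample = b2_small, full_sample = b2_full)
```

The full-sample estimate is of the order of $1/\sqrt{n}$, so for large samples the second model collapses into the first.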

## Application

In this example we are going to use some inflation data from the AER package. First, let’s have a look at the function embed. This function is very useful in the rolling window framework because we often include lags of variables in the models, and embed creates all the lags for us in a single line of code. Here is an example:

library(AER)
library(xts)
library(foreach)
library(reshape2)
library(ggplot2)
library(forecast)

## = embed = ##
x1 = c(1, 2, 3, 4, 5)
x2 = c(11, 12, 13, 14, 15)
x = cbind(x1, x2)
(x_embed = embed(x, 2))

##      [,1] [,2] [,3] [,4]
## [1,]    2   12    1   11
## [2,]    3   13    2   12
## [3,]    4   14    3   13
## [4,]    5   15    4   14


As you can see, the first two columns show the variables x1 and x2 at lag 0 and the last two columns show the same variables with one lag. We lose one observation because of the lag operation.
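To make the mapping explicit, embed(x, 2) produces the same matrix you would build by hand by binding the lag-0 rows next to the lag-1 rows (a small check, reusing the same toy x):

```r
x1 = c(1, 2, 3, 4, 5)
x2 = c(11, 12, 13, 14, 15)
x  = cbind(x1, x2)

# lag-0 block (rows 2:5) bound to the lag-1 block (rows 1:4)
manual = cbind(x[-1, ], x[-nrow(x), ])
all(manual == embed(x, 2))  # TRUE
```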

Now, to the real example! We are going to estimate a model to forecast US inflation using four autoregressive variables (four lags of inflation), four lags of industrial production and dummy variables for the months. The second model will be a simple random walk. I took the first log-difference of both variables (CPI and industrial production index). The code below loads and prepares the data with the embed function.

## = Load Data = ##
data("USMacroSWM")
data = as.xts(USMacroSWM)[ , c("cpi", "production")]
data = cbind(diff(log(data[ ,"cpi"])), diff(log(data[ ,"production"])))[-1, ]

## = Prep data with embed = ##
lag = 4
X = embed(data, lag + 1)
X = as.data.frame(X)
colnames(X) = paste(rep(c("inf", "prod"), lag + 1),
sort(rep(paste("l", 0:lag, sep = ""),2)), sep = "" )
X$month = months(tail(index(data), nrow(X)))
head(X)

##         infl0       prodl0        infl1       prodl1        infl2
## 1 0.005905082  0.000000000 -0.002275314  0.004085211  0.000000000
## 2 0.006770507 -0.005841138  0.005905082  0.000000000 -0.002275314
## 3 0.007618231  0.005841138  0.006770507 -0.005841138  0.005905082
## 4 0.019452426  0.007542827  0.007618231  0.005841138  0.006770507
## 5 0.003060112  0.009778623  0.019452426  0.007542827  0.007618231
## 6 0.006526018  0.013644328  0.003060112  0.009778623  0.019452426
##         prodl2        infl3       prodl3        infl4       prodl4
## 1 -0.008153802  0.017423641  0.005817352  0.006496543  0.005851392
## 2  0.004085211  0.000000000 -0.008153802  0.017423641  0.005817352
## 3  0.000000000 -0.002275314  0.004085211  0.000000000 -0.008153802
## 4 -0.005841138  0.005905082  0.000000000 -0.002275314  0.004085211
## 5  0.005841138  0.006770507 -0.005841138  0.005905082  0.000000000
## 6  0.007542827  0.007618231  0.005841138  0.006770507 -0.005841138
##       month
## 1      June
## 2      July
## 3    August
## 4 September
## 5   October
## 6  November

The following code estimates 391 fixed rolling windows with a sample size of 300 in each window:

# = Number of windows and window size = #
w_size = 300
n_windows = nrow(X) - 300

# = Rolling Window Loop = #
forecasts = foreach(i = 1:n_windows, .combine = rbind) %do% {

  # = Select data for the window (in and out-of-sample) = #
  X_in = X[i:(w_size + i - 1), ]  # = change to X[1:(w_size + i - 1), ] for expanding window
  X_out = X[w_size + i, ]

  # = Regression Model = #
  m1 = lm(infl0 ~ . - prodl0, data = X_in)
  f1 = predict(m1, X_out)

  # = Random Walk = #
  f2 = tail(X_in$infl0, 1)

  return(c(f1, f2))
}


Finally, the remaining code calculates the forecast errors, the forecast RMSE across the rolling windows and the Giacomini & White test. As you can see, the test rejects the null hypothesis that both models are equally accurate, and the RMSE is smaller for the model with the lags, production and dummies.

# = Calculate and plot errors = #
e1 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,1]
e2 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,2]
df = data.frame("date"=tail(as.Date(index(data)), n_windows), "Regression" = e1, "RandomWalk" = e2)
mdf = melt(df,id.vars = "date")
ggplot(data = mdf) + geom_line(aes(x = date, y = value, linetype = variable, color = variable))


# = RMSE = #
(rmse1 = 1000 * sqrt(mean(e1 ^ 2)))

## [1] 2.400037

(rmse2 = 1000 * sqrt(mean(e2 ^ 2)))

## [1] 2.62445

# = DM test = #
(dm = dm.test(e1, e2))

##
##  Diebold-Mariano Test
##
## data:  e1e2
## DM = -1.977, Forecast horizon = 1, Loss function power = 2,
## p-value = 0.04874
## alternative hypothesis: two.sided
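For intuition, at a one-step horizon with squared-error loss the DM statistic is essentially a t-type statistic on the mean of the loss differential. A hand-rolled sketch on simulated errors (dm.test additionally applies a small-sample correction and an autocovariance-adjusted variance for longer horizons, so the numbers will not match it exactly):

```r
set.seed(42)
n  = 100
e1 = rnorm(n, sd = 1.0)   # toy forecast errors for model 1
e2 = rnorm(n, sd = 1.3)   # toy forecast errors for a less accurate model 2

d       = e1^2 - e2^2                  # loss differential
dm_stat = mean(d) / sqrt(var(d) / n)   # asymptotically N(0, 1) under H0
p_value = 2 * pnorm(-abs(dm_stat))     # two-sided asymptotic p-value
c(statistic = dm_stat, p.value = p_value)
```

Under the null of equal predictive accuracy the loss differential has mean zero, so a large |statistic| is evidence that one model forecasts better.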


## References

Diebold, Francis X., and Robert S. Mariano. “Comparing predictive accuracy.” Journal of Business & Economic Statistics 20.1 (2002): 134-144.

Giacomini, Raffaella, and Halbert White. “Tests of conditional predictive ability.” Econometrica 74.6 (2006): 1545-1578.
