Formal ways to compare forecasting models: Rolling windows

November 8, 2017
By

(This article was first published on R – insightR, and kindly contributed to R-bloggers)

By Gabriel Vasconcelos

Overview

When working with time-series forecasting we often have to choose between a few potential models and the best way is to test each model in pseudo-out-of-sample estimations. In other words, we simulate a forecasting situation where we drop some data from the estimation sample to see how each model perform.

Naturally, if you do only one (or just a few) forecasting test you results will have no robustness and in the next forecast the results may change drastically. Another possibility is to estimate the model in, let’s say, half of the sample, and use the estimated model to forecast the other half. This is better than a single forecast but it does not account for possible changes in the structure of the data over the time because you have only one estimation of the model. The most accurate way to compare models is using rolling windows. Suppose you have, for example, 200 observations of a time-series. First you estimate the model with the first 100 observations to forecast the observation 101. Then you include the observation 101 in the estimation sample and estimate the model again to forecast the observation 102. The process is repeated until you have a forecast for all 100 out-of-sample observations. This procedure is also called expanding window. If you drop the first observation in each iteration to keep the window size always the same then you have a fixed rolling window estimation. In the end you will have 100 forecasts for each model and you can calculate RMSE, MAE and formal tests such as Diebold & Mariano.

In general, the fixed rolling window is better than the expanding window because of the following example. Suppose we have two models:

\displaystyle y_t = b_1x_{1,t-1}+\varepsilon_t
y_t = b_1x_{1,t-1}+b_2x_{2,t-1}+\varepsilon_t

Let’s assume that the true value of b_2 is zero. If we use expanding windows the asymptotic theory tells us that \hat{b}_2 will go to zero and both models will be the same. If that is the case, we may be unable to distinguish which model is more accurate to forecast y_t. However, the first model is better than the second model in small samples and it is just as good in large samples. We should be able to identify this feature. Fixed rolling windows keep the sample size fixed and they are free from this problem conditional on the sample size. In this case, the Diebold & Mariano test becomes the Giacomini & White test.

Application

In this example we are going to use some inflation data from the AER package. First let’s have a look at the function embed. This function is very useful in this rolling window framework because we often include lags of variables in the models and the function embed creates all lags for us in a single line of code. Here is an example:

library(AER)
library(xts)
library(foreach)
library(reshape2)
library(ggplot2)
library(forecast)

## = embed = ##
x1 = c(1, 2, 3, 4, 5)
x2 = c(11, 12, 13, 14, 15)
x = cbind(x1, x2)
(x_embed = embed(x, 2))
##      [,1] [,2] [,3] [,4]
## [1,]    2   12    1   11
## [2,]    3   13    2   12
## [3,]    4   14    3   13
## [4,]    5   15    4   14

As you can see. The first two columns show the variables x1 and x2 at lag 0 and the second column shows the same variables with one lag. We lost one observation because of the lag operation.

To the real example!!! We are going to estimate a model to forecast the US inflation using four autorregressive variables (four lags of the inflation), four lags of the industrial production and dummy variables for months. The second model will be a simple random walk. I took the first log-difference on both variables (CPI and industrial production index). The code below loads and prepare the data with the embed function.

## = Load Data = ##
data("USMacroSWM")
data = as.xts(USMacroSWM)[ , c("cpi", "production"), ]
data = cbind(diff(log(data[ ,"cpi"])), diff(log(data[ ,"production"])))[-1, ]

## = Prep data with embed = ##
lag = 4
X = embed(data, lag + 1)
X = as.data.frame(X)
colnames(X) = paste(rep(c("inf", "prod"), lag + 1),
                    sort(rep(paste("l", 0:lag, sep = ""),2)), sep = "" )
X$month = months(tail(index(data), nrow(X)))
head(X)
##         infl0       prodl0        infl1       prodl1        infl2
## 1 0.005905082  0.000000000 -0.002275314  0.004085211  0.000000000
## 2 0.006770507 -0.005841138  0.005905082  0.000000000 -0.002275314
## 3 0.007618231  0.005841138  0.006770507 -0.005841138  0.005905082
## 4 0.019452426  0.007542827  0.007618231  0.005841138  0.006770507
## 5 0.003060112  0.009778623  0.019452426  0.007542827  0.007618231
## 6 0.006526018  0.013644328  0.003060112  0.009778623  0.019452426
##         prodl2        infl3       prodl3        infl4       prodl4
## 1 -0.008153802  0.017423641  0.005817352  0.006496543  0.005851392
## 2  0.004085211  0.000000000 -0.008153802  0.017423641  0.005817352
## 3  0.000000000 -0.002275314  0.004085211  0.000000000 -0.008153802
## 4 -0.005841138  0.005905082  0.000000000 -0.002275314  0.004085211
## 5  0.005841138  0.006770507 -0.005841138  0.005905082  0.000000000
## 6  0.007542827  0.007618231  0.005841138  0.006770507 -0.005841138
##       month
## 1      June
## 2      July
## 3    August
## 4 September
## 5   October
## 6  November

The following code estimates 391 fixed rolling windows with a sample size of 300 in each window:

# = Number of windows and window size
w_size = 300
n_windows = nrow(X) - 300

# = Rolling Window Loop = #
forecasts = foreach(i=1:n_windows, .combine = rbind) %do%{

  # = Select data for the window (in and out-of-sample) = #
  X_in = X[i:(w_size + i - 1), ] # = change to X[1:(w_size + i - 1), ] for expanding window
  X_out = X[w_size + i, ]

  # = Regression Model = #
  m1 = lm(infl0 ~ . - prodl0, data = X_in)
  f1 = predict(m1, X_out)

  # = Random Walk = #
  f2 = tail(X_in$infl0, 1)

  return(c(f1, f2))
}

Finally, the remaining code calculates the forecasting errors, forecasting RMSE across the rolling windows and the Giacomini & White test. As you can see, the test rejected the null hypothesis of both models being equally accurate and the RMSE was smaller for the model with the lags, production and dummies.

# = Calculate and plot errors = #
e1 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,1]
e2 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,2]
df = data.frame("date"=tail(as.Date(index(data)), n_windows), "Regression" = e1, "RandomWalk" = e2)
mdf = melt(df,id.vars = "date")
ggplot(data = mdf) + geom_line(aes(x = date, y = value, linetype = variable, color = variable))

plot of chunk unnamed-chunk-4

# = RMSE = #
(rmse1 = 1000 * sqrt(mean(e1 ^ 2)))
## [1] 2.400037
(rmse2 = 1000 * sqrt(mean(e2 ^ 2)))
## [1] 2.62445
# = DM test = #
(dm = dm.test(e1, e2))
##
##  Diebold-Mariano Test
##
## data:  e1e2
## DM = -1.977, Forecast horizon = 1, Loss function power = 2,
## p-value = 0.04874
## alternative hypothesis: two.sided

References

Diebold, Francis X., and Robert S. Mariano. “Comparing predictive accuracy.” Journal of Business & economic statistics 20.1 (2002): 134-144.

Giacomini, Raffaella, and Halbert White. “Tests of conditional predictive ability.” Econometrica 74.6 (2006): 1545-1578.

To leave a comment for the author, please follow the link and comment on their blog: R – insightR.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)