(This article was first published on The Dancing Economist, and kindly contributed to R-bloggers)
Today we wish to see how our model would have fared forecasting the last 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.
First, recall the trend portion that we have already accounted for:
> t=(1:258)
> t2=t^2
> trendy= 892.656210 + -30.365580*t + 0.335586*t2
And recall that the de-trended series is just that: the series minus the trend.
> dt=GDP-trendy
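For readers who want to see where numbers like these come from, here is a minimal sketch of fitting a quadratic trend and de-trending. It assumes the trend coefficients were obtained by ordinary least squares, which this post does not state outright, and it uses a simulated series (`gdp_sim`, a made-up stand-in, not the real GDP data).

```r
# Simulated stand-in for the GDP series (hypothetical data, not the author's).
set.seed(1)
t  <- 1:258
t2 <- t^2
gdp_sim <- 900 - 30 * t + 0.33 * t2 + rnorm(length(t), sd = 50)

# Fit the quadratic trend by ordinary least squares.
trend_fit  <- lm(gdp_sim ~ t + t2)
trendy_sim <- fitted(trend_fit)

# The de-trended series is the series minus the fitted trend,
# which is the same thing as the regression residuals.
dt_sim <- gdp_sim - trendy_sim
```

Under that assumption, `coef(trend_fit)` would play the role of the three hard-coded numbers in `trendy` above.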
As the following example will demonstrate, if we decide to assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:
> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.
> plot(true.data,forecast.data)
> plot(true.data,forecast.data,main="True Data vs. Forecast data")
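The hold-out procedure above can be sketched end to end on simulated data. Here `dt_sim` is a hypothetical AR(2) series standing in for the de-trended GDP, and a lower AR order than the post's AR(10) is used purely to keep the example small. The `head()`/`tail()` calls are an alternative spelling of the negative indexing used above.

```r
# Hypothetical stand-in for the de-trended series: a simulated AR(2) process.
set.seed(42)
dt_sim <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.3)), n = 258))

h <- 20                              # number of held-out observations
n <- length(dt_sim)
train_sim <- head(dt_sim, n - h)     # same values as dt_sim[-c(239:258)]
true_sim  <- tail(dt_sim, h)         # same values as dt_sim[-c(1:238)]

# Fit an AR model on the training portion only, then forecast h steps ahead.
fit_sim <- arima(train_sim, order = c(2, 0, 0), include.mean = FALSE)
fc_sim  <- as.numeric(predict(fit_sim, n.ahead = h)$pred)
```

When plotting `true_sim` against `fc_sim`, calling `abline(0, 1)` afterwards draws a 45-degree line on which perfect forecasts would fall, which can make the scatter easier to read.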
Clearly, there appears to be little to no accuracy in the forecast of our de-trended model alone. In fact, a linear regression of the true data on the forecast data makes this perfectly clear.
> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)
Call:
lm(formula = true.data ~ forecast.data)

Residuals:
   Min     1Q Median     3Q    Max
-684.0 -449.0 -220.8  549.4  716.8

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2244.344   2058.828  -1.090    0.290
forecast.data     2.955      2.568   1.151    0.265

Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF, p-value: 0.265
> anova(reg.model)
Analysis of Variance Table

Response: true.data
              Df  Sum Sq Mean Sq F value Pr(>F)
forecast.data  1  386920  386920  1.3238  0.265
Residuals     18 5260913  292273
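A regression like the one above measures linear association between forecast and actual values. As a complement, one could also compute direct error measures. The sketch below uses small made-up vectors (`actual` and `forecast` are illustrative numbers, not the GDP data) to show root mean squared error and mean absolute error, and to check that the R-squared of a one-regressor `lm` is just the squared correlation.

```r
# Hypothetical actual and forecast values (illustrative numbers only).
actual   <- c(10, 12, 9, 14, 11, 13)
forecast <- c(11, 11, 10, 12, 12, 12)

err  <- actual - forecast
rmse <- sqrt(mean(err^2))     # root mean squared error
mae  <- mean(abs(err))        # mean absolute error

# For a one-regressor lm, R-squared equals the squared correlation.
r2_lm  <- summary(lm(actual ~ forecast))$r.squared
r2_cor <- cor(actual, forecast)^2
```

Swapping in `true.data` and `forecast.data` for the two vectors should give the same kind of summary for the hold-out window in this post.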
Now is a good time not to be discouraged, but rather encouraged to add trend to our forecast. When we run a linear regression of GDP on the trend, we quickly see that 99.7% of the variance in GDP can be accounted for by the trend.
> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)
Call:
lm(formula = GDP ~ trendy)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.001371  21.870246     0.0        1
trendy       1.000002   0.003445   290.3   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF, p-value: < 2.2e-16
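The intercept near 0 and slope near 1 in the output above are what we would expect if `trendy` is (approximately) the fitted values from a least-squares trend regression, which is an assumption on my part here. A small sketch on simulated data (`y_sim` is hypothetical) illustrates the property:

```r
# Simulated series with a quadratic trend (hypothetical data).
set.seed(7)
t  <- 1:258
t2 <- t^2
y_sim <- 900 - 30 * t + 0.33 * t2 + rnorm(length(t), sd = 250)

trend_fit  <- lm(y_sim ~ t + t2)
trendy_sim <- fitted(trend_fit)

# Regressing the series on its own OLS fitted values returns
# intercept ~ 0 and slope ~ 1, with the same R-squared as the trend fit.
refit    <- lm(y_sim ~ trendy_sim)
r2_trend <- summary(trend_fit)$r.squared
r2_refit <- summary(refit)$r.squared
```

So the 99.7% figure can be read as the R-squared of the trend fit itself.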
In the end we would have had to account for trend anyway, so it makes sense to use it when testing our model's accuracy.
> test.data1<-dt[-c(239:258)]
# Important note: the "-c(239:258)" index keeps everything except those 20 observations #
> true.data1<-dt[-c(1:238)]
> true.data2<-trendy[-c(1:238)]
> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
> forecast.data2<-(true.data2)
> forecast.data3<-(forecast.data1+forecast.data2)
> true.data3<-(true.data1+true.data2)
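One thing worth keeping in mind about this step: because the same trend values are added to both the forecast and the true series, the point-forecast errors do not change; only the scale on which the two series are compared does. A minimal sketch with made-up numbers (all three input vectors below are hypothetical stand-ins, not the GDP data):

```r
# Hypothetical de-trended values and trend over a 20-step hold-out window.
set.seed(3)
h <- 20
trend_hold  <- 11000 + 60 * (1:h)                  # stand-in for trendy[239:258]
true_dt     <- rnorm(h, sd = 300)                  # stand-in for true.data1
forecast_dt <- 0.5 * true_dt + rnorm(h, sd = 200)  # stand-in for forecast.data1

# Add the (known, deterministic) trend back to both series.
true_level     <- true_dt + trend_hold
forecast_level <- forecast_dt + trend_hold

# The point-forecast errors are the same on either scale.
err_dt    <- true_dt - forecast_dt
err_level <- true_level - forecast_level
```

Here `all.equal(err_dt, err_level)` returns `TRUE`, up to floating-point rounding.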
Don't forget to plot your data:
> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")
...and regress the actual data on the forecast data:
> reg.model3<-lm(true.data3~forecast.data3)
> summary(reg.model3)
Call:
lm(formula = true.data3 ~ forecast.data3)

Residuals:
   Min     1Q Median     3Q    Max
-443.5 -184.2   16.0  228.3  334.8

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.104e+03  1.141e+03   7.102 1.28e-06 ***
forecast.data3 4.098e-01  7.657e-02   5.352 4.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926
F-statistic: 28.64 on 1 and 18 DF, p-value: 4.366e-05
Looking at the plot and the regression results (the forecast accounts for about 61% of the variance in the actual values), I feel like this model is pretty accurate, considering that this is a point forecast and not an interval forecast. Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then,
Keep Dancin'
Steven J