Assessing the Forecasting Ability of Our Model
[This article was first published on The Dancing Economist, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today we wish to see how our model would have fared forecasting the last 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.
First recall the trend portion that we have already accounted for:
> t=(1:258)
> t2=t^2
> trendy= 892.656210 + -30.365580*t + 0.335586*t2
And that the de-trended series is just that: the series minus the trend.
> dt=GDP-trendy
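For readers who want to see where trend coefficients like these could come from, here is a minimal sketch. It assumes the trend was estimated by an ordinary least squares fit of the series on t and t^2, which this post does not confirm. The actual GDP data is not reproduced here, so `gdp.sim` is a simulated stand-in and the names `trend.fit`, `trendy.sim` and `dt.sim` are hypothetical.

```r
# Simulated stand-in for the GDP series (NOT the real data).
set.seed(1)
t  <- 1:258
t2 <- t^2
gdp.sim <- 900 - 30 * t + 0.33 * t2 + rnorm(258, sd = 50)

# Fit a quadratic trend by ordinary least squares.
trend.fit  <- lm(gdp.sim ~ t + t2)
trendy.sim <- fitted(trend.fit)       # fitted trend values
dt.sim     <- gdp.sim - trendy.sim    # de-trended series = series minus trend

# Estimated intercept and coefficients on t and t2.
coef(trend.fit)
```

Because the trend is fitted with an intercept, the de-trended series is simply the regression residuals and averages out to zero over the sample.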
As the following example will demonstrate, if we assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:
> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
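The negative indices above can be easy to misread, so here is a small sketch of what they do. A negative index drops the listed positions rather than selecting them. The vector `x` is just a placeholder of the same length as the series, and `train` and `test` are hypothetical names.

```r
x <- seq_len(258)            # placeholder for a series of 258 observations

train <- x[-c(239:258)]      # drop the last 20 -> observations 1..238
test  <- x[-c(1:238)]        # drop the first 238 -> observations 239..258

# The same split written with head() and tail().
identical(train, head(x, -20))
identical(test, tail(x, 20))

length(train)
length(test)
```

So `test.data` in the post is the 238 observations used to fit the model, and `true.data` is the 20 held-out values the forecast is compared against.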
Now we plot the forecasts against the actual values of the de-trended series to get a sense of how accurate they are.
> plot(true.data,forecast.data,main="True Data vs. Forecast data")
Clearly there appears to be little to no accuracy in the forecast of our de-trended model alone. In fact, a linear regression of the true data on the forecast data makes this perfectly clear.
> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)
Call:
lm(formula = true.data ~ forecast.data)
Residuals:
    Min      1Q  Median      3Q     Max
 -684.0  -449.0  -220.8   549.4   716.8
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2244.344   2058.828  -1.090    0.290
forecast.data     2.955      2.568   1.151    0.265
Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF, p-value: 0.265
> anova(reg.model)
Analysis of Variance Table
Response: true.data
              Df  Sum Sq Mean Sq F value Pr(>F)
forecast.data  1  386920  386920  1.3238  0.265
Residuals     18 5260913  292273
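Besides regressing the true values on the forecasts, forecast errors can also be summarised directly. The sketch below is an alternative check and is not part of the original analysis. The helper `forecast_accuracy` is a hypothetical name, and the numbers passed to it are made up purely for illustration.

```r
# Hypothetical helper: summarise forecast errors (actual minus predicted).
forecast_accuracy <- function(actual, predicted) {
  err <- actual - predicted
  c(ME   = mean(err),           # mean error (bias)
    MAE  = mean(abs(err)),      # mean absolute error
    RMSE = sqrt(mean(err^2)))   # root mean squared error
}

# Toy numbers, for illustration only.
acc <- forecast_accuracy(actual = c(10, 12, 14), predicted = c(11, 11, 15))
acc
```

With the objects from this post, the call would be `forecast_accuracy(true.data, forecast.data)`.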
Now is a good time not to be discouraged, but rather encouraged to add the trend to our forecast. When we run a linear regression of GDP on the trend, we quickly see that 99.7% of the variance in GDP can be accounted for by the trend.
> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)
Call:
lm(formula = GDP ~ trendy)
Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.001371  21.870246     0.0        1
trendy       1.000002   0.003445   290.3   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF, p-value: < 2.2e-16
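To make the "variance accounted for" reading of R-squared concrete, here is a small sketch on simulated data (not the GDP series). It checks that the R-squared reported by `summary()` equals one minus the ratio of the residual sum of squares to the total sum of squares. All variable names here are hypothetical.

```r
# Simulated trending series, for illustration only.
set.seed(2)
t <- 1:100
y <- 5 + 0.02 * t^2 + rnorm(100, sd = 10)

fit <- lm(y ~ t + I(t^2))

# R-squared as reported by summary() ...
r2.summary <- summary(fit)$r.squared

# ... and computed by hand: 1 - (residual SS / total SS).
r2.manual <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

all.equal(r2.summary, r2.manual)
```

A strongly trending series will tend to give an R-squared close to 1 in a regression like this, because most of its variation over time is the trend itself.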
In the end we would have had to account for the trend anyway, so it just makes sense to include it when testing our model's accuracy.
> test.data1<-dt[-c(239:258)]
# Important note: the negative index -c(239:258) keeps everything except those particular 20 observations
> true.data1<-dt[-c(1:238)]
> true.data2<-trendy[-c(1:238)]
> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
> forecast.data2<-(true.data2)
> forecast.data3<-(forecast.data1+forecast.data2)
> true.data3<-(true.data1+true.data2)
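The steps above (fit on the de-trended training sample, forecast, then add the trend back) can be sketched end to end on simulated data. This is only an illustration under stated assumptions: the series is simulated, the trend is treated as known rather than estimated, and an AR(2) is used instead of the AR(10) from the post to keep the example small. Every name ending in `.sim`, `.fc` or `.idx` is hypothetical.

```r
# Simulated trend plus AR(2) cycle (NOT the real GDP data).
set.seed(3)
n <- 258
h <- 20                                   # number of held-out observations
t <- 1:n
trend.sim <- 900 - 30 * t + 0.33 * t^2    # assumed known trend
cycle.sim <- as.numeric(arima.sim(list(ar = c(0.6, -0.2)), n = n, sd = 50))
gdp.sim   <- trend.sim + cycle.sim

train.idx <- 1:(n - h)                    # observations 1..238
test.idx  <- (n - h + 1):n                # observations 239..258

# Fit the AR model to the de-trended training sample only.
ar.fit   <- arima(cycle.sim[train.idx], order = c(2, 0, 0), include.mean = FALSE)
cycle.fc <- as.numeric(predict(ar.fit, n.ahead = h)$pred)

# Add the trend back to get forecasts in levels.
level.fc   <- cycle.fc + trend.sim[test.idx]
level.true <- gdp.sim[test.idx]

# Root mean squared forecast error in levels.
sqrt(mean((level.true - level.fc)^2))
```

Since the same trend values are added to both the forecast and the true series, the forecast errors in levels are identical to the errors on the de-trended series. Adding the trend back changes what the fit is measured against, not the size of the errors.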
Don’t forget to plot your data:
> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")
…and regress the actual data on the forecasted data:
> reg.model3<-lm(true.data3~forecast.data3)
> summary(reg.model3)
Call:
lm(formula = true.data3 ~ forecast.data3)
Residuals:
    Min      1Q  Median      3Q     Max
 -443.5  -184.2    16.0   228.3   334.8
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.104e+03  1.141e+03   7.102 1.28e-06 ***
forecast.data3 4.098e-01  7.657e-02   5.352 4.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926
F-statistic: 28.64 on 1 and 18 DF, p-value: 4.366e-05
Looking at the plot and the regression results (R-squared rises from about 0.07 for the de-trended forecast alone to about 0.61 once the trend is added back), I feel this model is pretty accurate, considering that this is a point forecast and not an interval forecast. Next time on The Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then,
Keep Dancin’
Steven J