Assessing the Forecasting Ability of Our Model

Posted on September 2, 2011 by Steven Sabol in R bloggers

[This article was first published on The Dancing Economist, and kindly contributed to R-bloggers.]

Today we wish to see how our model would have fared forecasting the last 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.

First recall the trend portion that we have already accounted for:

> t=(1:258)
> t2=t^2
> trendy= 892.656210 + -30.365580*t + 0.335586*t2

And recall that the de-trended series is just that: the series minus the trend.

> dt=GDP-trendy
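
For readers who want to see where a trend like this could come from, here is a minimal sketch. It assumes the coefficients above were obtained from a least-squares fit of GDP on t and t^2, which this post does not show. The series gdp.sim and the names trend.fit, trendy.sim, and dt.sim are hypothetical stand-ins, not the author's data or objects.

```r
# Hypothetical stand-in for the GDP series: a quadratic trend plus noise.
set.seed(1)
n  <- 258
t  <- 1:n
t2 <- t^2
gdp.sim <- 900 - 30 * t + 0.33 * t2 + rnorm(n, sd = 250)

# Fit the quadratic trend by ordinary least squares.
trend.fit  <- lm(gdp.sim ~ t + t2)
trendy.sim <- fitted(trend.fit)   # estimated trend at each time point

# The de-trended series is the series minus the trend,
# which is the same thing as the regression residuals.
dt.sim <- gdp.sim - trendy.sim

coef(trend.fit)   # estimates in the neighbourhood of the simulated values
all.equal(unname(dt.sim), unname(residuals(trend.fit)))  # TRUE
```

On this view, de-trending and taking the residuals of the trend regression are the same operation.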

As the following example demonstrates, if we assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:

> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
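
A note on the indexing: despite its name, test.data holds the first 238 observations used to fit the model, while true.data holds the 20 held-out observations. The sketch below uses a hypothetical placeholder vector x (not the author's dt) to show what the negative indices keep.

```r
# Hypothetical series of the same length as dt (values are placeholders).
x <- seq_len(258)

# Negative indices drop elements, so this keeps observations 1..238.
train <- x[-c(239:258)]
# Dropping 1..238 keeps the last 20 observations, 239..258.
holdout <- x[-c(1:238)]

# Equivalent, arguably more readable, forms.
identical(train, head(x, 238))    # TRUE
identical(holdout, tail(x, 20))   # TRUE
length(train); length(holdout)    # 238 and 20
```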

Now we plot the forecasts against the actual values of the de-trended series to get a sense of how accurate they are.

> plot(true.data,forecast.data,main="True Data vs. Forecast Data")

Clearly, there is little to no accuracy in the forecast of the de-trended series alone. In fact, a linear regression of the true data on the forecast data makes this perfectly clear.

> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)

Call:
lm(formula = true.data ~ forecast.data)

Residuals:
   Min     1Q Median     3Q    Max
-684.0 -449.0 -220.8  549.4  716.8

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2244.344   2058.828  -1.090    0.290
forecast.data     2.955      2.568   1.151    0.265

Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF, p-value: 0.265

> anova(reg.model)
Analysis of Variance Table

Response: true.data
              Df  Sum Sq Mean Sq F value Pr(>F)
forecast.data  1  386920  386920  1.3238  0.265
Residuals     18 5260913  292273
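
The analysis of variance table and the regression summary tell the same story. As a quick check, the R-squared and F statistic can be recomputed by hand from the sums of squares printed above. The variable names below are made up for this illustration.

```r
# Sums of squares copied from the analysis of variance table above.
ss.model    <- 386920     # explained by forecast.data
ss.residual <- 5260913    # left unexplained

# R-squared is the explained share of the total sum of squares.
r.squared <- ss.model / (ss.model + ss.residual)
round(r.squared, 5)       # 0.06851

# The F statistic is the ratio of the mean squares (1 and 18 df).
f.stat <- (ss.model / 1) / (ss.residual / 18)
round(f.stat, 3)          # 1.324
```

In other words, the forecasts of the de-trended series explain under 7% of the variation in the held-out values.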

Now is a good time not to be discouraged, but rather encouraged to add the trend to our forecast. When we run a linear regression of GDP on the trend, we quickly see that 99.7% of the variance in GDP can be accounted for by the trend.

> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)

Call:
lm(formula = GDP ~ trendy)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.001371  21.870246     0.0        1
trendy       1.000002   0.003445   290.3   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF, p-value: < 2.2e-16
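
The intercept near 0 and slope near 1 in the output above are what one would expect if trendy is itself the fitted value from a least-squares regression of GDP on t and t^2 (an assumption here, since that step is not shown in this post). The small departures from exactly 0 and 1 are presumably due to the rounded coefficients used to build trendy. A minimal sketch on simulated data, with hypothetical names y, fit1, trend, and fit2:

```r
# Hypothetical data standing in for GDP (not the author's series).
set.seed(42)
t  <- 1:258
t2 <- t^2
y  <- 900 - 30 * t + 0.33 * t2 + rnorm(258, sd = 250)

# Step 1: estimate the quadratic trend.
fit1  <- lm(y ~ t + t2)
trend <- fitted(fit1)

# Step 2: regress the series on its own fitted trend.
fit2 <- lm(y ~ trend)

coef(fit2)   # intercept is numerically ~0, slope is ~1
all.equal(summary(fit1)$r.squared, summary(fit2)$r.squared)  # TRUE
```

So the 99.7% figure is really the R-squared of the trend regression itself, restated in a one-variable form.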

In the end we would have had to account for the trend anyway, so it makes sense to include it when testing our model's accuracy.

> test.data1<-dt[-c(239:258)]

# Note: the index -c(239:258) keeps everything except those particular 20 observations

> true.data1<-dt[-c(1:238)]

> true.data2<-trendy[-c(1:238)]

> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)

> true.data3<-(true.data1+true.data2)

Don’t forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")

...and regress the actual data on the forecasted data:

> reg.model3<-lm(true.data3~forecast.data3)

> summary(reg.model3)

Call:
lm(formula = true.data3 ~ forecast.data3)

Residuals:
   Min     1Q Median     3Q    Max
-443.5 -184.2   16.0  228.3  334.8

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.104e+03  1.141e+03   7.102 1.28e-06 ***
forecast.data3 4.098e-01  7.657e-02   5.352 4.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926
F-statistic: 28.64 on 1 and 18 DF, p-value: 4.366e-05
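
The regression measures how closely the forecasts track the actual values. A complementary check, not part of the original analysis, is the size of the forecast errors themselves. Because the same trend values are added to both the forecast and the actual series, the errors are identical on the de-trended and level scales; what changes between the two regressions is how much variation there is to explain. The sketch below uses made-up numbers and hypothetical helper functions rmse and mae to illustrate this.

```r
# Hypothetical numbers, for illustration only (not the author's data).
actual.dt   <- c(120, 80, -40, -150, -90)            # de-trended actuals
forecast.dt <- c(100, 60,  10,  -60, -120)           # de-trended forecasts
trend       <- c(13000, 13150, 13300, 13450, 13600)  # known trend values

# Put both series back on the level scale by adding the same trend.
actual.level   <- actual.dt + trend
forecast.level <- forecast.dt + trend

# Root mean squared error and mean absolute error of a point forecast.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# The forecast errors, and hence RMSE and MAE, are the same on both scales.
all.equal(rmse(actual.dt, forecast.dt), rmse(actual.level, forecast.level))  # TRUE
all.equal(mae(actual.dt, forecast.dt),  mae(actual.level, forecast.level))   # TRUE
```

With the objects from this post, the analogous call would be rmse(true.data3, forecast.data3), assuming the helper above has been defined.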

Looking at the plot and the regression results, I feel this model is pretty accurate, considering that this is a point forecast and not an interval forecast: the forecasts account for about 61% of the variation in the actual values. Next time on the Dancing Economist we will plot forecasts into the future with 95% confidence intervals. Until then,

Keep Dancin’

Steven J
