# Assessing the Forecasting Ability of Our Model

[This article was first published on **The Dancing Economist**, and kindly contributed to R-bloggers.]

Today we wish to see how our model would have fared forecasting the last 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.

First recall the trend portion that we have already accounted for:

> t=(1:258)

> t2=t^2

> trendy= 892.656210 + -30.365580*t + 0.335586*t2
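The coefficients above were estimated in an earlier post. As a rough sketch of how a quadratic trend of this form can be estimated, the snippet below fits one with lm() on a simulated series. The names sim.gdp, trend.fit, sim.trendy and sim.dt, as well as the noise level, are made up for illustration; this is not the real GDP data.

```r
# Simulated stand-in for GDP (hypothetical data, not the series used in this post)
set.seed(123)
t <- 1:258
t2 <- t^2
sim.gdp <- 892.656210 - 30.365580 * t + 0.335586 * t2 + rnorm(258, sd = 250)

# Fit a quadratic trend: intercept + linear term + squared term
trend.fit <- lm(sim.gdp ~ t + t2)
print(coef(trend.fit))

# Fitted trend, and the de-trended series as "series minus trend"
sim.trendy <- fitted(trend.fit)
sim.dt <- sim.gdp - sim.trendy
```

Because the trend is fitted by least squares with an intercept, the de-trended series averages to zero over the sample, which is why the AR model later in the post can be fitted without a mean term.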

And recall that the de-trended series is just that: the series minus the trend.

> dt=GDP-trendy

As the following example will demonstrate, if we decide to assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:

> test.data<-dt[-c(239:258)]

> true.data<-dt[-c(1:238)]

> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
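The three lines above hold out the last 20 observations, fit the AR model on the first 238, and forecast 20 steps ahead. Here is a minimal, self-contained sketch of the same pattern on a simulated series. The series x, the AR(2) coefficients and the smaller model order are assumptions chosen to keep the example quick; they are not the model from this post.

```r
# Simulated stationary AR(2) series standing in for the de-trended data (hypothetical)
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = 258))

# Negative indices DROP elements, so these two lines split the sample:
train <- x[-(239:258)]   # everything except the last 20 -> same as head(x, 238)
test  <- x[-(1:238)]     # everything except the first 238 -> same as tail(x, 20)

# Fit on the training part only, then forecast the held-out horizon
fit <- arima(train, order = c(2, 0, 0), include.mean = FALSE)
fc  <- predict(fit, n.ahead = 20)

# predict() returns point forecasts in $pred and their standard errors in $se
length(fc$pred)
length(fc$se)
```

Note that the object called test.data in the post is actually the estimation sample, while true.data is the hold-out; the sketch uses train and test to keep the two roles apart.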

Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.

> plot(true.data,forecast.data)

> plot(true.data,forecast.data,main="True Data vs. Forecast data")

Clearly, there appears to be little to no accuracy in the forecast of our de-trended model alone. In fact, a linear regression of the true data on the forecast data makes this clear: the slope is not statistically significant (p = 0.265) and the R-squared is only about 0.07.

> reg.model<-lm(true.data~forecast.data)

> summary(reg.model)

Call:

lm(formula = true.data ~ forecast.data)

Residuals:

Min 1Q Median 3Q Max

-684.0 -449.0 -220.8 549.4 716.8

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -2244.344 2058.828 -1.090 0.290

forecast.data 2.955 2.568 1.151 0.265

Residual standard error: 540.6 on 18 degrees of freedom

Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676

F-statistic: 1.324 on 1 and 18 DF, p-value: 0.265

> anova(reg.model)

Analysis of Variance Table

Response: true.data

Df Sum Sq Mean Sq F value Pr(>F)

forecast.data 1 386920 386920 1.3238 0.265

Residuals 18 5260913 292273
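The regression above asks whether the forecasts line up with the true values. A simple complement is to look at the size of the forecast errors directly. The helper below is a hypothetical sketch (forecast.errors is not part of the original post or of base R) that computes the root mean squared error and the mean absolute error; the numbers passed to it are toy values, not results from the GDP model.

```r
# Hypothetical helper: summary measures of forecast error
forecast.errors <- function(actual, predicted) {
  err <- as.numeric(actual) - as.numeric(predicted)
  c(RMSE = sqrt(mean(err^2)),   # penalizes large misses more heavily
    MAE  = mean(abs(err)))      # average size of a miss
}

# Toy illustration only: errors are 0, 0 and -2
forecast.errors(c(1, 2, 3), c(1, 2, 5))
```

With the objects from this post, the call would be forecast.errors(true.data, forecast.data).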

Now is a good time not to be discouraged, but rather encouraged to add the trend to our forecast. When we run a linear regression of GDP on the trend, we quickly see that 99.7% of the variance in GDP can be accounted for by the trend.

> reg.model2<-lm(GDP~trendy)

> summary(reg.model2)

Call:

lm(formula = GDP ~ trendy)

Residuals:

Min 1Q Median 3Q Max

-625.43 -165.76 -36.73 163.04 796.33

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.001371 21.870246 0.0 1

trendy 1.000002 0.003445 290.3 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom

Multiple R-squared: 0.997, Adjusted R-squared: 0.997

F-statistic: 8.428e+04 on 1 and 256 DF, p-value: < 2.2e-16
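The "Multiple R-squared" figure is what gets read as the share of variance explained. As a small sketch of where that number comes from, the snippet below recomputes it by hand on simulated data (x, y and the noise level are made up for illustration) and compares it with what summary() reports.

```r
# Simulated data for illustration only
set.seed(7)
x <- 1:100
y <- 5 + 2 * x + rnorm(100, sd = 10)
fit <- lm(y ~ x)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
ss.res <- sum(residuals(fit)^2)
ss.tot <- sum((y - mean(y))^2)
r2.manual <- 1 - ss.res / ss.tot

r2.manual
summary(fit)$r.squared   # same value as reported by summary()
cor(x, y)^2              # with a single regressor, also the squared correlation
```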

In the end we would have had to account for the trend anyway, so it just makes sense to use it when testing our model's accuracy.

> test.data1<-dt[-c(239:258)]

> # Important note: the index -c(239:258) keeps everything except those particular 20 observations

> true.data1<-dt[-c(1:238)]

> true.data2<-trendy[-c(1:238)]

> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred

> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)

> true.data3<-(true.data1+true.data2)
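The steps above rebuild a forecast in levels: the AR forecast of the de-trended series plus the trend over the same 20 periods. The sketch below walks through the same recombination on a simulated series. The trend coefficients are the ones quoted in this post, while the AR(1) "cycle", its parameters and all sim.* and level.* names are assumptions made for the example.

```r
# Simulated series = deterministic quadratic trend + stationary AR(1) deviations
set.seed(2024)
n <- 258
h <- 20
t <- 1:n
sim.trend <- 892.656210 - 30.365580 * t + 0.335586 * t^2
sim.cycle <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 100))
sim.gdp   <- sim.trend + sim.cycle

train.idx <- 1:(n - h)        # first 238 observations
test.idx  <- (n - h + 1):n    # last 20 observations

# Forecast the de-trended part from the training sample only
cycle.fit <- arima(sim.cycle[train.idx], order = c(1, 0, 0), include.mean = FALSE)
cycle.fc  <- as.numeric(predict(cycle.fit, n.ahead = h)$pred)

# The trend is a known function of t, so its value at t = 239..258 is its forecast
level.fc   <- sim.trend[test.idx] + cycle.fc
level.true <- sim.gdp[test.idx]

# Forecast error in levels equals the forecast error of the de-trended part
range(level.true - level.fc)
```

Since the same trend values are added to both the forecast and the truth, the forecast errors are unchanged by the recombination; what changes is that the comparison is now made on the scale of GDP itself.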

Don’t forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")

…and regress the actual data on the forecasted data:

> reg.model3<-lm(true.data3~forecast.data3)

> summary(reg.model3)

Call:

lm(formula = true.data3 ~ forecast.data3)

Residuals:

Min 1Q Median 3Q Max

-443.5 -184.2 16.0 228.3 334.8

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 8.104e+03 1.141e+03 7.102 1.28e-06 ***

forecast.data3 4.098e-01 7.657e-02 5.352 4.37e-05 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 264.8 on 18 degrees of freedom

Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926

F-statistic: 28.64 on 1 and 18 DF, p-value: 4.366e-05

Looking at the plot and the regression results (an R-squared of about 0.61, versus about 0.07 for the de-trended series alone), I feel like this model is pretty accurate, considering that this is a point forecast and not an interval forecast. Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then,

Keep Dancin’

Steven J
