(This article was first published on

**The Dancing Economist**, and kindly contributed to R-bloggers)In our last post we determined that the ARIMA(2,2,2) model was just plain not going to work for us. Although i didn’t show its residuals failed to pass the acf and pacf test for white noise and the mean of its residuals was greater than three when it should have been much closer to zero.

Today we discover that an AR(10) of the de-trended GDP series may be the best option at hand. Normally when we do model selection we start with the one that has the lowest AIC and then proceed to test the error terms (or residuals) for white noise. Lets take a look at the model specs for the AR(10):

*> model7<-arima(dt,order=c(10,0,0))*

*> model7*

*Call:*

*arima(x = dt, order = c(10, 0, 0))*

*Coefficients:*

*ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8 ar9 ar10*

*1.5220 -0.4049 -0.2636 0.2360 -0.2132 0.1227 -0.0439 -0.0958 0.3244 -0.2255*

*s.e. 0.0604 0.1105 0.1131 0.1143 0.1154 0.1147 0.1139 0.1127 0.1111 0.0627*

*intercept*

*-21.6308*

*s.e. 57.5709*

*sigma^2 estimated as 1452: log likelihood = -1307.76, aic = 2639.52*

The Ljung-Box Q test checks out to the 20th lag:

*> Box.test(model7$res,lag=20,type=”Ljung-Box”)*

*Box-Ljung test*

*data: model7$res*

*X-squared = 15.0909, df = 20, p-value = 0.7712*

It even checks out to the 30th lag! I changed the way Iplotted the Ljung-Box Q values because after finding this little function called “LBQPlot” which makes life way easier.

*> LBQPlot(model7$res,lag.max=30,StartLag=1)*

Most importantly both the ACF and the PACF of the residuals check out for the white noise process. In the ARIMA(2,2,2) model these weren’t even close to what we wanted them to be.

*> par(mfrow=c(2,1))*

*> acf(model7$res)*

*> pacf(model7$res)*

Unfortunately, our residuals continue to fail the formal tests for normality. I don’t really know what to do about this or even what the proper explanation is, but I have a feeling that these tests are very very sensitive.

> jarque.bera.test(model7$res)

Jarque Bera Test

data: model7$res

X-squared = 7507.325, df = 2, p-value < 2.2e-16

> shapiro.test(model7$res)

Shapiro-Wilk normality test

data: model7$res

W = 0.7873, p-value < 2.2e-16

The mean is also considerably closer to 0 but just not quite there.

> mean(model7$res)

[1] 0.7901055

Take a look at the plot for normality:

> qqnorm(model7$res)

> qqline(model7$res)

In the next post we are going to test how good our model actually is. Today we found our optimal choice in terms of model specs, but we also should see how well it can forecast past values of GDP. In addition to evaluating our models past accuracy we will also practice forecasting into the future. As you continue to read these posts, you should be getting significantly better with R- I know I am! We have covered many new codes that stem from many different libraries. I would like to keep on doing analysis in R so after we finish forecasting GDP I think I may move on to some Econometrics. Please keep forecasting and most certainly keep dancin’,

Steven J.

To

**leave a comment**for the author, please follow the link and comment on their blog:**The Dancing Economist**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...