**Freakonometrics » R-english**, and kindly contributed to R-bloggers)

Monday, in our MAT8181 class, we’ve discussed seasonal unit roots from a practical perspective (the theory will be briefly mentioned in a few weeks, once we’ve seen multivariate models). Consider some time series , for instance traffic on French roads,

> autoroute=read.table( + "http://freakonometrics.blog.free.fr/public/data/autoroute.csv", + header=TRUE,sep=";") > X=autoroute$a100 > T=1:length(X) > plot(T,X,type="l",xlim=c(0,120)) > reg=lm(X~T) > abline(reg,col="red")

As discussed in a previous post, if there is a trend, we should remove it, and work on the residual

> Y=residuals(reg) > acf(Y,lag=36,lwd=3)

We can observe that there is some seasonality, here. A first strategy might be to assume that there is a seasonal unit root, so we consider , and we try to find some ARMA process. Consider the empirical autocorrelation function of that time series,

> Z=diff(Y,12) > acf(Z,lag=36,lwd=3)

or the partial autocorrelation function

> pacf(Z,lag=36,lwd=3)

The first graph might suggest a MA(1) structure, while the second graph might suggest an AR(1) time series. Let us try both.

> model1=arima(Z,order=c(0,0,1)) > model1 Call: arima(x = Z, order = c(0, 0, 1)) Coefficients: ma1 intercept -0.2367 -583.7761 s.e. 0.0916 254.8805 sigma^2 estimated as 8071255: log likelihood = -684.1, aic = 1374.2 > E1=residuals(model1) > acf(E1,lag=36,lwd=3)

which can be considered as a white noise (if you’re not convinced, try either Box-Pierce or Ljung-Box test). Similarly,

> model2=arima(Z,order=c(1,0,0)) > model2 Call: arima(x = Z, order = c(1, 0, 0)) Coefficients: ar1 intercept -0.3214 -583.0943 s.e. 0.1112 248.8735 sigma^2 estimated as 7842043: log likelihood = -683.07, aic = 1372.15 > E2=residuals(model2) > acf(E2,lag=36,lwd=3)

which can be also considered as a white noise. So what we have, so far is

for some white noise . This suggest the following SARIMA structure on ,

> model2b=arima(Y,order=c(1,0,0), + seasonal = list(order = c(0, 1, 0), + period=12)) > model2b Call: arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(0, 1, 0), period = 12)) Coefficients: ar1 -0.2715 s.e. 0.1130 sigma^2 estimated as 8412999: log likelihood = -685.62, aic = 1375.25

So far, so good. Now, what if we consider that we do not have a seasonal unit root, but simply a large autoregressive coefficient in some AR structure. Let us try something like

where a natural guess is that this coefficient should – probably – be close to one. Let us try this one,

> model3c=arima(Y,order=c(1,0,0), + seasonal = list(order = c(1, 0, 0), + period = 12)) > model3c Call: arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12)) Coefficients: ar1 sar1 intercept -0.1629 0.9741 -684.9455 s.e. 0.1170 0.0115 3064.4040 sigma^2 estimated as 8406080: log likelihood = -816.11, aic = 1640.21

which is comparable with what we got previously (somehow), so we might assume that this model can be considered as an interesting one. We will discuss further the fact that the first coefficient might be considered as non-significant.

What is the difference from those two models? With a short term horizon, the two models are comparable. Clearly

> library(forecast) > previ=function(model,h=36,b=40000){ + prev=forecast(model,h) + T=1:85 + Tfutur=86:(85+h) + plot(T,Y,type="l",xlim=c(0,85+h),ylim=c(-b,b)) + polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,2],rev(prev$upper[,2])),col="orange",border=NA) + polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,1],rev(prev$upper[,1])),col="yellow",border=NA) + lines(prev$mean,col="blue") + lines(Tfutur,prev$lower[,2],col="red") + lines(Tfutur,prev$upper[,2],col="red") + }

Now, on a (very) long term perspective, the models are quite different: one is stationnary, so the forecast will tend to the average value (here 0, since the trend was removed), while the other one is (seasonaly) integrated, so the confidence interval will increase. For the non stationry, we get

> previ(model2b,600,b=60000)

and for the stationary one

> previ(model3c,600,b=60000)

But as mentioned in the introduction of this course, forecasts with those models are relevent only for short-term horizon (say not too large). And in that case, the prediction is almost the same here,

> previ(model2b,36,b=60000)

> previ(model3c,36,b=60000)

Now, if we come back on our second model, we did mention previously that the autoregressive coefficient might be considered as non-significant. What if we remove it?

> model3d=arima(Y,order=c(0,0,0), + seasonal = list(order = c(1, 0, 0), + period = 12)) > (model3d) Call: arima(x = Y, order = c(0, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12)) Coefficients: sar1 intercept 0.9662 -696.5661 s.e. 0.0134 3182.3017 sigma^2 estimated as 8918630: log likelihood = -817.03, aic = 1640.07

If we look at a (short-term) forecast, we get

> previ(model3d,36,b=32000)

Do you see any difference? To be honest, I don’t… If we look at the figures, we get

> cbind(forecast(model2b,12)$mean,forecast(model3c,12)$mean,forecast(model3d,12)$mean) Time Series: Start = 86 End = 97 Frequency = 1 1 -4908.4920 -5092.8999 -5520.8780 2 -10012.7837 -9640.8103 -9493.0339 3 -3880.2202 -3841.1960 -3828.2611 4 -18102.5211 -17638.4086 -17499.1828 5 -20602.7346 -20090.9117 -19934.1066 6 -10463.2212 -10209.0139 -10132.0439 7 2458.1538 2376.4897 2351.2377 8 -1680.3342 -1654.4844 -1647.0057 9 876.6837 836.2342 823.4934 10 18046.5642 17561.6520 17413.1463 11 21531.4820 20956.3451 20780.2836 12 -3217.6103 -3152.0446 -3132.4112

Figures are different, but not significantly (keep in mind the size of the confidence interval). This might explain why, in R, when we ask for an autoregressive process or order , then we get a model with parameters to estimate, and even if some are not significant, we usually keep them for the forecast. Most of the time, from forecasting point of view, it’s no big deal.

**leave a comment**for the author, please follow the link and comment on their blog:

**Freakonometrics » R-english**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...