Seasonal, or periodic, time series
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Monday, in our MAT8181 class, we’ve discussed seasonal unit roots from a practical perspective (the theory will be briefly mentioned in a few weeks, once we’ve seen multivariate models). Consider some time series , for instance traffic on French roads,
> autoroute=read.table( + "http://freakonometrics.blog.free.fr/public/data/autoroute.csv", + header=TRUE,sep=";") > X=autoroute$a100 > T=1:length(X) > plot(T,X,type="l",xlim=c(0,120)) > reg=lm(X~T) > abline(reg,col="red")

As discussed in a previous post, if there is a trend, we should remove it, and work on the residual 
> Y=residuals(reg) > acf(Y,lag=36,lwd=3)

We can observe that there is some seasonality, here. A first strategy might be to assume that there is a seasonal unit root, so we consider , and we try to find some ARMA process. Consider the empirical autocorrelation function of that time series,
> Z=diff(Y,12) > acf(Z,lag=36,lwd=3)

or the partial autocorrelation function
> pacf(Z,lag=36,lwd=3)

The first graph might suggest a MA(1) structure, while the second graph might suggest an AR(1) time series. Let us try both.
> model1=arima(Z,order=c(0,0,1))
> model1
Call:
arima(x = Z, order = c(0, 0, 1))
Coefficients:
          ma1  intercept
      -0.2367  -583.7761
s.e.   0.0916   254.8805
sigma^2 estimated as 8071255:  log likelihood = -684.1,  aic = 1374.2
> E1=residuals(model1)
> acf(E1,lag=36,lwd=3)

which can be considered as a white noise (if you’re not convinced, try either Box-Pierce or Ljung-Box test). Similarly,
> model2=arima(Z,order=c(1,0,0))
> model2
Call:
arima(x = Z, order = c(1, 0, 0))
Coefficients:
          ar1  intercept
      -0.3214  -583.0943
s.e.   0.1112   248.8735
sigma^2 estimated as 7842043:  log likelihood = -683.07,  aic = 1372.15
> E2=residuals(model2)
> acf(E2,lag=36,lwd=3)

which can be also considered as a white noise. So what we have, so far is
 
for some white noise . This suggest the following SARIMA structure on 
,
> model2b=arima(Y,order=c(1,0,0),
+               seasonal = list(order = c(0, 1, 0),
+               period=12)) 
> model2b
Call:
arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(0, 1, 0), period = 12))
Coefficients:
          ar1
      -0.2715
s.e.   0.1130
sigma^2 estimated as 8412999:  log likelihood = -685.62,  aic = 1375.25
So far, so good. Now, what if we consider that we do not have a seasonal unit root, but simply a large autoregressive coefficient in some AR structure. Let us try something like
where a natural guess is that this coefficient should – probably – be close to one. Let us try this one,
> model3c=arima(Y,order=c(1,0,0),
+               seasonal = list(order = c(1, 0, 0), 
+               period = 12))
> model3c
Call:
arima(x = Y, order = c(1, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))
Coefficients:
          ar1    sar1  intercept
      -0.1629  0.9741  -684.9455
s.e.   0.1170  0.0115  3064.4040
sigma^2 estimated as 8406080:  log likelihood = -816.11,  aic = 1640.21
which is comparable with what we got previously (somehow), so we might assume that this model can be considered as an interesting one. We will discuss further the fact that the first coefficient might be considered as non-significant.
What is the difference from those two models? With a short term horizon, the two models are comparable. Clearly
> library(forecast)
> previ=function(model,h=36,b=40000){
+ prev=forecast(model,h)
+ T=1:85
+ Tfutur=86:(85+h)
+ plot(T,Y,type="l",xlim=c(0,85+h),ylim=c(-b,b))
+ polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,2],rev(prev$upper[,2])),col="orange",border=NA)
+ polygon(c(Tfutur,rev(Tfutur)),c(prev$lower[,1],rev(prev$upper[,1])),col="yellow",border=NA)
+ lines(prev$mean,col="blue")
+ lines(Tfutur,prev$lower[,2],col="red")
+ lines(Tfutur,prev$upper[,2],col="red")
+ }
Now, on a (very) long term perspective, the models are quite different: one is stationnary, so the forecast will tend to the average value (here 0, since the trend was removed), while the other one is (seasonaly) integrated, so the confidence interval will increase. For the non stationry, we get
> previ(model2b,600,b=60000)

and for the stationary one
> previ(model3c,600,b=60000)

But as mentioned in the introduction of this course, forecasts with those models are relevent only for short-term horizon (say not too large). And in that case, the prediction is almost the same here,
> previ(model2b,36,b=60000)

> previ(model3c,36,b=60000)

Now, if we come back on our second model, we did mention previously that the autoregressive coefficient might be considered as non-significant. What if we remove it?
> model3d=arima(Y,order=c(0,0,0),
+               seasonal = list(order = c(1, 0, 0), 
+               period = 12))
> (model3d)
Call:
arima(x = Y, order = c(0, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))
Coefficients:
        sar1  intercept
      0.9662  -696.5661
s.e.  0.0134  3182.3017
sigma^2 estimated as 8918630:  log likelihood = -817.03,  aic = 1640.07
If we look at a (short-term) forecast, we get
> previ(model3d,36,b=32000)

Do you see any difference? To be honest, I don’t… If we look at the figures, we get
> cbind(forecast(model2b,12)$mean,forecast(model3c,12)$mean,forecast(model3d,12)$mean) Time Series: Start = 86 End = 97 Frequency = 1 1 -4908.4920 -5092.8999 -5520.8780 2 -10012.7837 -9640.8103 -9493.0339 3 -3880.2202 -3841.1960 -3828.2611 4 -18102.5211 -17638.4086 -17499.1828 5 -20602.7346 -20090.9117 -19934.1066 6 -10463.2212 -10209.0139 -10132.0439 7 2458.1538 2376.4897 2351.2377 8 -1680.3342 -1654.4844 -1647.0057 9 876.6837 836.2342 823.4934 10 18046.5642 17561.6520 17413.1463 11 21531.4820 20956.3451 20780.2836 12 -3217.6103 -3152.0446 -3132.4112
Figures are different, but not significantly (keep in mind the size of the confidence interval). This might explain why, in R, when we ask for an autoregressive process or order , then we get a model with 
 parameters to estimate, and even if some are not significant, we usually keep them for the forecast. Most of the time, from forecasting point of view, it’s no big deal.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
