**R – first differences**, and kindly contributed to R-bloggers)

There are several ways to model seasonality in a time series. Traditionally, trend-cycle decomposition such as the Holt-Winters procedure has been very popular. Also, until today applied researchers often try to account for seasonality by using seasonal dummy variables. But of course, in a stochastic process it seems unreasonable to assume that seasonal effects are purely deterministic. Therefore, in a time series context seasonal extensions of the classical ARMA model are very popular. One of these extensions is the seasonal unit root model

where is the usual lag operator and is the period length of the seasonality such as or for a yearly cycle in quarterly or monthly data and is some short run component such as an innovation term or a SARMA(p,q)-(P,Q) model.

I have always been puzzled about the popularity of this process. Probably it is due to the obvious conceptual simplicity. It also seems to be a natural extension of the usual non-seasonal integrated ARIMA model. However, the model generates a feature that we will hardly ever observe in an actual time series: as time progresses the difference between consecutive values of the will become infinitely large.

To see this consider the following example. To generate seasonal unit root processes we first define a function that generates seasonal sums.

```
seasonal_sum<-function(data,S){
out<-data
for(t in (S+1):length(data)){out[t]<-data[t]+out[(t-S)]}
out
}
```

We then generate a sample of 250 observations from the process and look at its plot and its autocorrelation function. We choose a period of , so that the example resembles a yearly cycle in monthly data.

```
series<-seasonal_sum(rnorm(250),S=12)
acf(series)
```

`ts.plot(series, ylab="series", xlab="t")`

From the autocorrelation function (ACF) it can be seen that there is a pronounced seasonal behavior with a spike in the ACF at each lag that is an integer multiple of . However, the plot of the series shows a curious behavior. As increases, we see that the difference between two consecutive observations increases. This behavior becomes even more pronounced if we increase the sample size to 2500.

`ts.plot(seasonal_sum(rnorm(2500),S=12), ylab="series")`

To understand this feature consider the usual unit root model with an innovation with variance . This can be expressed as the sum over all past innovations.

From this representation it is easy to show that the variance of the process is given by

so that the variance becomes infinite as approaches infinity. This is a property that seems to apply to many economic and financial time series and is therefore completely reasonable.

Now, the seasonal unit root model can be expressed in a similar way, but with an important twist. To see this, denote the th innovation in the th repetition of the cycle of length by . This means that if you have monthly observations the innovation in the first January in the sample is and the innovation in the second January in the sample is . By the same principle the innovation in the th December in the sample would be . Therefore, any observation , for some and can be represented as

.

The important thing to note here is that for two consecutive observations within the th repetition of the cycle we have and . Since and are independent streams of random numbers this means that and are independent random walks! Consequently, the difference of the process is given by

so that

Since goes to infinity as goes to infinity, so does the variance of the changes. Has anybody ever seen a series that exhibits such a feature? Of course in reality we would expect that the innovations are not but show some kind of dependence structure, so that the random walks are not completely independent anymore. However, if the dependence is weak – such as that of an ARMA process – they are still asymptotically independent for large lags. Therefore, the same issue arises, as can be seen from the example below.

```
sarima_sim<-function(T, S, arma_model){
arma_series<-arima.sim(n=T, model=arma_model)
seasonal_sum(data=arma_series, S=S)
}
sarima_series<-sarima_sim(T=250, S=12, arma_model=list(ar=c(0.5,0.3)))
acf(sarima_series)
```

`ts.plot(sarima_series, ylab="series")`

`ts.plot(sarima_sim(T=2500, S=12, arma_model=list(ar=c(0.5,0.3))), ylab="series")`

So what is the conclusion from all this? The seasonal unit root process seems to be ill suited to model most behavior that we observe in practice. However, it is well known that it often generates a good fit. Especially in shorter time series the drawbacks of the seasonal unit root process do not have to become visible. Nevertheless, I think it is fair to say that one could envision a more satisfactory model. One avenue that seems very useful in this context is that of seasonal long memory processes that are able to combine some persistence in the cyclical fluctuations with a finite variance.

Another important conclusion is that we have to be careful with seemingly direct extensions of standard models such as the ARIMA. The fact that the ARIMA is extremely successful in modelling the non-seasonal component, does not necessarily mean that the SARIMA is a good model for the seasonal applications that we have in mind, too.

**leave a comment**for the author, please follow the link and comment on their blog:

**R – first differences**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...