Seasonal decomposition of short time series

[This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Many users have tried to do a seasonal decomposition with a short time series, and hit the error “Series has less than two periods”.

The problem is that the usual methods of decomposition (e.g., decompose and stl) estimate seasonality using at least as many degrees of freedom as there are seasonal periods. So you need at least two observations per seasonal period to be able to distinguish seasonality from noise.

However, it is possible to use a linear regression model to decompose a time series into trend and seasonal components, and then some smoothness assumptions on the seasonal component allow a decomposition with fewer than two full years of data.

This problem came up on crossvalidated.com recently, with the following data set.

df <- ts(c(2735.869,2857.105,2725.971,2734.809,2761.314,2828.224,2830.284,
  2758.149,2774.943,2782.801,2861.970,2878.688,3049.229,3029.340,3099.041,
  3071.151,3075.576,3146.372,3005.671,3149.381), start=c(2016,8), frequency=12)

We can approximate the seasonal pattern using Fourier terms with a few parameters.

library(forecast)
library(ggplot2)
decompose_df <- tslm(df ~ trend + fourier(df, 2))

Then the usual decomposition plot can be constructed.

trend <- coef(decompose_df)[1] + coef(decompose_df)['trend']*seq_along(df)
components <- cbind(
  data = df,
  trend = trend,
  season = df - trend - residuals(decompose_df),
  remainder = residuals(decompose_df)
)
autoplot(components, facet=TRUE)

Here the model is \[ y_t = \alpha + \beta t + \sum_{k=1}^K \left[ \gamma_k \sin\left(\textstyle\frac{2\pi t k}{m}\right) + \psi_k \cos\left(\textstyle\frac{2\pi t k}{m}\right) \right] + \varepsilon_t, \] where \(1 \le K \le m/2\). The larger the value of \(K\), the more complicated the seasonal pattern that can be estimated. If \(K=m/2\), then we are using \(m-1\) degrees of freedom for the seasonal component. (The last sin term will be dropped as \(\sin(\pi t)=0\) for all \(t\).) For a short time series, we can use a small value for \(K\), which can be selected by minimizing the AICc or leave-one-out CV statistic (both computed using CV()). In the above example, the AICc is minimized with \(K=1\) and CV is minimized with \(K=2\).

The trend could also be made nonlinear, by replacing trend with a polynomial or spline (although both will use up more degrees of freedom, and may not be justified with short time series).

As with other methods of decomposition, it is easy enough to remove the seasonal component to get the seasonally adjusted data.

adjust_df <- df - components[,'season']
autoplot(df, series="Data") +
  autolayer(adjust_df, series="Seasonally adjusted")

To leave a comment for the author, please follow the link and comment on their blog: R on Rob J Hyndman.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)