
Time series data is an important area of analysis, especially if you do a lot of web analytics. To be able to analyse time series effectively, it helps to understand the interaction between general seasonality in activity and the underlying trend.

The interactions between trend and seasonality are typically classified as either additive or multiplicative. This post looks at how we can classify a given time series as one or the other to facilitate further processing.

It’s important to understand the difference between a multiplicative time series and an additive one before we go any further.

There are three components to a time series:

- trend: how things are changing overall
- seasonality: how things change within a given period, e.g. a year, month, week, or day
- error/residual/irregular: activity not explained by the trend or the seasonal value

How these three components interact determines the difference between a multiplicative and an additive time series.

In a multiplicative time series, the components multiply together to make the time series. If you have an increasing trend, the amplitude of seasonal activity increases. Everything becomes more exaggerated. This is common when you’re looking at web traffic.

In an additive time series, the components add together to make the time series. If you have an increasing trend, you still see roughly the same size peaks and troughs throughout the time series. This is often seen in indexed time series where the absolute value is growing but changes stay relative.
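As a compact summary of the two definitions above, here is a sketch in my own notation (the symbols are assumptions, not taken from any package): $y_t$ is the observed value, $T_t$ the trend, $S_t$ the seasonal component and $e_t$ the residual.

$$
\text{additive: } y_t = T_t + S_t + e_t
\qquad\qquad
\text{multiplicative: } y_t = T_t \times S_t \times e_t
$$

For strictly positive data, taking logs turns the multiplicative form into an additive one:

$$
\log y_t = \log T_t + \log S_t + \log e_t
$$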

You can have a time series that is somewhere in between the two, a statistician’s “it depends”, but I’m interested in attaining a quick classification so I won’t be handling this complication here.

## There’s a package for that

When I first started doing time series analysis, the only way to visualise how a time series splits into different components was to use base R. About the time I was feeling the pain, someone released a ggplot2 time series extension! I’ll be using ggseas where I can.

We’ll use the `nzbop` data set from ggseas to, first of all, examine a single time series and then process all the time series in the dataset to determine if they’re multiplicative or additive.

```r
# nzdata is assumed here to be the nzbop data held as a data.table
sample_ts <- nzdata[Account == "Current account" & Category == "Services; Exports total",
                    .(TimePeriod, Value)]
```
| TimePeriod | Value |
|---|---|
| 1971-06-30 | 55 |
| 1971-09-30 | 56 |
| 1971-12-31 | 60 |
| 1972-03-31 | 65 |
| 1972-06-30 | 65 |
| 1972-09-30 | 63 |

I’ll be using other packages (like data.table) and will only show relevant code snippets as I go along. You can get the whole script in a GIST.

## Decomposing the data

To be able to determine if the time series is additive or multiplicative, the time series has to be split into its components.

Existing functions to decompose the time series include `decompose()`, which lets you specify whether the series is additive or multiplicative, and `stl()`, which only handles additive series unless you transform the data first. I could use `stl()` with a multiplicative series if I transform the time series by taking the log. For either function, I need to know whether the series is additive or multiplicative first.
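As a rough illustration of those two routes, here is a minimal sketch using the built-in `AirPassengers` series purely as a stand-in (it is not the data used in this post). It calls `decompose()` with an explicit type, then runs `stl()` on the logged series and back-transforms the components.

```r
# AirPassengers ships with base R; it is only a stand-in monthly series here
add_fit  <- decompose(AirPassengers, type = "additive")
mult_fit <- decompose(AirPassengers, type = "multiplicative")

# stl() is additive only, so log-transform first to treat the series as multiplicative
log_fit <- stl(log(AirPassengers), s.window = "periodic")

# Back on the original scale the stl components multiply together
stl_parts <- exp(log_fit$time.series)  # columns: seasonal, trend, remainder
rebuilt   <- stl_parts[, "seasonal"] * stl_parts[, "trend"] * stl_parts[, "remainder"]

# The product of the back-transformed components should reproduce the series
all.equal(as.numeric(rebuilt), as.numeric(AirPassengers))
```

Either way you still have to choose the type up front, which is the gap the rest of this post tries to fill.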

### The trend

The first component to extract is the trend. There are a number of ways you can do this, and some of the simplest ways involve calculating a moving average or median.

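```r
# Trailing 8-quarter (two-year) moving average; rows without a full window are NA
sample_ts[, trend := zoo::rollmean(Value, 8, fill = NA, align = "right")]
```

| TimePeriod | Value | trend |
|---|---|---|
| 2014-03-31 | 5212 | 4108.625 |
| 2014-06-30 | 3774 | 4121.750 |
| 2014-09-30 | 3698 | 4145.500 |
| 2014-12-31 | 4752 | 4236.375 |
| 2015-03-31 | 6154 | 4376.500 |
| 2015-06-30 | 4543 | 4478.875 |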

A moving median is less sensitive to outliers than a moving mean. It doesn’t work well though if you have a time series that includes periods of inactivity. Lots of 0s can result in very weird trends.
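To make that trade-off concrete, here is a small sketch on made-up numbers. The helper `roll_right()` is hypothetical and written in base R just for this illustration; in practice `zoo::rollmean()` and `zoo::rollmedian()` (which needs an odd window width) do the same job.

```r
# Hypothetical helper: right-aligned rolling statistic, NA until a full window exists
roll_right <- function(x, k, f) {
  c(rep(NA, k - 1), apply(embed(x, k), 1, f))
}

# One outlier (100) in an otherwise smooth series
spike <- c(10, 11, 12, 100, 13, 14, 15)
mean_spike <- roll_right(spike, 3, mean)    # pulled up sharply by the outlier
med_spike  <- roll_right(spike, 3, median)  # barely notices it

# A series that is mostly zeros, with occasional activity
sparse <- c(0, 0, 5, 0, 0, 6, 0, 0, 7)
mean_sparse <- roll_right(sparse, 3, mean)    # small but non-zero trend
med_sparse  <- roll_right(sparse, 3, median)  # flat zero trend: every window is mostly 0s

rbind(mean_spike, med_spike)
rbind(mean_sparse, med_sparse)
```

A zero trend is especially awkward for the multiplicative case, because the next step divides by the trend.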

### The seasonality

Seasonality is the cyclical pattern that remains in our time series once the trend has been removed.

Of course, the way to de-trend the data needs to be additive or multiplicative depending on what type your time series is. Since we don’t know the type of time series at this point, we’ll do both.

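```r
# De-trend both ways: subtract the trend (additive) and divide by it (multiplicative)
sample_ts[, `:=`(detrended_a = Value - trend,
                 detrended_m = Value / trend)]
```

| TimePeriod | Value | trend | detrended_a | detrended_m |
|---|---|---|---|---|
| 2014-03-31 | 5212 | 4108.625 | 1103.375 | 1.2685509 |
| 2014-06-30 | 3774 | 4121.750 | -347.750 | 0.9156305 |
| 2014-09-30 | 3698 | 4145.500 | -447.500 | 0.8920516 |
| 2014-12-31 | 4752 | 4236.375 | 515.625 | 1.1217137 |
| 2015-03-31 | 6154 | 4376.500 | 1777.500 | 1.4061465 |
| 2015-06-30 | 4543 | 4478.875 | 64.125 | 1.0143172 |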

To work out the seasonality we need to work out what the typical de-trended values are over a cycle. Here I will calculate the mean value for the observations in Q1, Q2, Q3, and Q4.

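```r
# Mean de-trended value per calendar quarter, repeated on every row of that quarter
sample_ts[, `:=`(seasonal_a = mean(detrended_a, na.rm = TRUE),
                 seasonal_m = mean(detrended_m, na.rm = TRUE)),
          by = .(quarter(TimePeriod))]
```

| TimePeriod | Value | trend | detrended_a | detrended_m | seasonal_a | seasonal_m |
|---|---|---|---|---|---|---|
| 2014-03-31 | 5212 | 4108.625 | 1103.375 | 1.2685509 | 574.1919 | 1.2924422 |
| 2014-06-30 | 3774 | 4121.750 | -347.750 | 0.9156305 | -111.2878 | 1.0036648 |
| 2014-09-30 | 3698 | 4145.500 | -447.500 | 0.8920516 | -219.8363 | 0.9488803 |
| 2014-12-31 | 4752 | 4236.375 | 515.625 | 1.1217137 | 136.7827 | 1.1202999 |
| 2015-03-31 | 6154 | 4376.500 | 1777.500 | 1.4061465 | 574.1919 | 1.2924422 |
| 2015-06-30 | 4543 | 4478.875 | 64.125 | 1.0143172 | -111.2878 | 1.0036648 |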

My actual needs aren’t over long economic periods so I’m not using a better seasonality system for this blog post. There are some much better mechanisms than this.

### The remainder

Now that we have our two components, we can calculate the residual in both situations and see which has the better fit.

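```r
# Residual: what is left of the de-trended series once the seasonal component is removed
sample_ts[, `:=`(residual_a = detrended_a - seasonal_a,
                 residual_m = detrended_m / seasonal_m)]
```

| TimePeriod | Value | trend | detrended_a | detrended_m | seasonal_a | seasonal_m | residual_a | residual_m |
|---|---|---|---|---|---|---|---|---|
| 2014-03-31 | 5212 | 4108.625 | 1103.375 | 1.2685509 | 574.1919 | 1.2924422 | 529.1831 | 0.9815146 |
| 2014-06-30 | 3774 | 4121.750 | -347.750 | 0.9156305 | -111.2878 | 1.0036648 | -236.4622 | 0.9122871 |
| 2014-09-30 | 3698 | 4145.500 | -447.500 | 0.8920516 | -219.8363 | 0.9488803 | -227.6637 | 0.9401098 |
| 2014-12-31 | 4752 | 4236.375 | 515.625 | 1.1217137 | 136.7827 | 1.1202999 | 378.8423 | 1.0012620 |
| 2015-03-31 | 6154 | 4376.500 | 1777.500 | 1.4061465 | 574.1919 | 1.2924422 | 1203.3081 | 1.0879763 |
| 2015-06-30 | 4543 | 4478.875 | 64.125 | 1.0143172 | -111.2878 | 1.0036648 | 175.4128 | 1.0106135 |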
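A quick sanity check on these numbers: whichever route you take, the three components should rebuild the original value. The sketch below hard-codes the 2014-03-31 row from the table above and recombines it both ways; the variable names are just for illustration.

```r
# Values copied from the 2014-03-31 row of the table above
value      <- 5212
trend      <- 4108.625
seasonal_a <- 574.1919
residual_a <- 529.1831
seasonal_m <- 1.2924422
residual_m <- 0.9815146

# Additive components sum back to the value; multiplicative ones multiply back
additive_rebuilt       <- trend + seasonal_a + residual_a
multiplicative_rebuilt <- trend * seasonal_m * residual_m

# Both should match Value, up to the rounding in the printed table
all.equal(additive_rebuilt, value, tolerance = 1e-4)
all.equal(multiplicative_rebuilt, value, tolerance = 1e-4)
```

Since both decompositions reconstruct the data by construction, the reconstruction itself cannot tell us which model fits better. That has to come from how the residuals behave.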

## Visualising decomposition

I’ve done the number crunching, but you could also perform a visual decomposition. ggseas gives us a function `ggsdc()` which we can use.

```r
# Set type = "multiplicative" to see the multiplicative decomposition instead
ggsdc(sample_ts, aes(x = TimePeriod, y = Value), method = "decompose",
      frequency = 4, s.window = 8, type = "additive") +
  geom_line()
```

The different decompositions produce differently distributed residuals. We need to assess these to identify which decomposition is a better fit.

## Assessing fit

After decomposing our data, we need to compare the residuals. As we’re just trying to classify the time series, we don’t need to do anything particularly sophisticated – a big part of this exercise is to produce a quick function that could be used to perform an initial classification in a batch processing environment so simpler is better.

We’re going to check how much correlation between data points is still encoded within the residuals. This is measured by the autocorrelation function (ACF), and R has a function, `acf()`, for calculating it. As some of the correlations could be negative, we will select the type with the smallest sum of squares of correlation values.

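```r
# Sum of squared autocorrelations across the default lags
ssacf <- function(x) sum(acf(x, na.action = na.omit)$acf^2)
# compare_ssacf() is not defined in the text; assumed to pick the type with the smaller ssacf
compare_ssacf <- function(add, mult) ifelse(ssacf(add) < ssacf(mult), "Additive", "Multiplicative")
sample_ts[, .(ts_type = compare_ssacf(residual_a, residual_m))]
```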
| ts_type |
|---|
| Multiplicative |
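To see why a smaller sum of squared autocorrelations points to a better decomposition, here is a sketch on simulated residuals (made-up data, not the nzbop series). One residual series is pure noise; the other still contains a quarterly pattern that the decomposition failed to remove.

```r
set.seed(42)

# Same measure as above, with plotting switched off
ssacf <- function(x) sum(acf(x, na.action = na.omit, plot = FALSE)$acf^2)

n <- 120
clean_resid    <- rnorm(n)                                        # nothing left to explain
leftover_resid <- clean_resid + 2 * sin(2 * pi * seq_len(n) / 4)  # period-4 pattern remains

ssacf(clean_resid)     # close to 1: only the lag-0 term contributes much
ssacf(leftover_resid)  # noticeably larger: the leftover seasonality shows up at lags 2, 4, 6, ...
```

Squaring matters because the leftover pattern produces both positive and negative autocorrelations, which would partly cancel in a plain sum.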

## Putting it all together

This isn’t a fully generalized function (it doesn’t have configurable lags, medians, seasonality etc.), but if I had to run this exercise over multiple time series from this dataset, my overall function and usage would look like:

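```r
ssacf <- function(x) sum(acf(x, na.action = na.omit, plot = FALSE)$acf^2)
# compare_ssacf() is not defined in the text; assumed to pick the type with the smaller ssacf
compare_ssacf <- function(add, mult) ifelse(ssacf(add) < ssacf(mult), "Additive", "Multiplicative")

# Function name assumed; the header line is missing from the source
additive_or_multiplicative <- function(dt) {
  m <- copy(dt)
  m[, trend := zoo::rollmean(Value, 8, fill = "extend", align = "right")]
  m[, `:=`(detrended_a = Value - trend, detrended_m = Value / trend)]
  m[Value == 0, detrended_m := 0]
  m[, `:=`(seasonal_a = mean(detrended_a, na.rm = TRUE),
           seasonal_m = mean(detrended_m, na.rm = TRUE)),
    by = .(quarter(TimePeriod))]
  m[is.infinite(seasonal_m), seasonal_m := 1]
  m[, `:=`(residual_a = detrended_a - seasonal_a,
           residual_m = detrended_m / seasonal_m)]
  compare_ssacf(m$residual_a, m$residual_m)
}

# Applying it to all time series in table
# (start of this call reconstructed from the output columns below)
nzdata[, .(Type = additive_or_multiplicative(.SD)),
       .(Account, Category)]
```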
| Account | Category | Type |
|---|---|---|
| Current account | Balance | Multiplicative |
| Current account | Services; Exports total | Multiplicative |
| Current account | Primary income; Inflow total | Multiplicative |
| Current account | Secondary income; Inflow total | Multiplicative |
| Current account | Goods balance | Multiplicative |
| Current account | Services balance | Multiplicative |
| Current account | Secondary income; Outflow total | Multiplicative |
| Capital account | Inflow total | Multiplicative |
| Capital account | Outflow total | Multiplicative |
| NA | Net errors and omissions | Multiplicative |
| Financial account | Foreign inv. in NZ total | Multiplicative |
| Financial account | Foreign inv. in NZ; Direct inv. liabilities | Additive |
| Financial account | Foreign inv. in NZ; Portfolio inv. liabilities | Multiplicative |
| Financial account | Foreign inv. in NZ; Other inv. liabilities | Multiplicative |
| Financial account | NZ inv. abroad; Direct inv. assets | Multiplicative |
| Financial account | NZ inv. abroad; Financial derivative assets | Multiplicative |
| Financial account | NZ inv. abroad; Reserve assets | Multiplicative |

## Conclusion

This is a very simple way of quickly assessing whether multiple time series are additive or multiplicative. It gives an effective starting point for conditionally processing batches of time series. Get the GIST of the code used throughout this blog to work through it yourself. If you’ve got an easier way of classifying time series, let me know in the comments!

The post Is my time series additive or multiplicative? appeared first on Locke Data. Locke Data are a data science consultancy aimed at helping organisations get ready and get started with data science.