Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The next two blog posts will explore the basics of the statistical arbitrage strategies outlined in Ernest Chan’s book, Algorithmic Trading: Winning Strategies and Their Rationale. In the first post we will construct mean reverting time series data from cointegrated ETF pairs. The two pairs we will analyze are EWA (Australia) – EWC (Canada) and IGE (NA Natural Resources) – EWZ (Brazil).

EWA-EWC is a notable ETF pair since both Australia and Canada’s economies are commodity based. Looking at the scatter plot, it seems likely that they cointegrate because of this. IGE-EWZ seems less likely to cointegrate but we will discover that it is possible to salvage a stationary series with a statistical adjustment. A stationary, mean reverting series implies that the variance of the log price increases slower than that of a geometric random walk.

Running a linear regression model with EWA as the dependent variable and EWC as the independent variable we can use the resulting beta as the hedge ratio to create the data series below.

It appears stationary but we will run a few statistical tests to support this conclusion. The first test we will run is the Augmented Dickey-Fuller, which tests whether the data is stationary or trending. We set the lag parameter, k, to 1 since the change in price often has serial correlations.

The ADF test rejects the null hypothesis and supports the stationarity of the series with a p-value < 0.04. The next test we will run is the Hurst Exponent, which will analyze the variance of the log price and compare it to that of a geometric random walk. A geometric random walk has H=0.5, a mean reverting series has H<0.5, and a trending series has H>0.5. Running this test on the log residuals of the linear model gives a Hurst Exponent of 0.27, supporting the ADF’s conclusion. Since this series is now surely stationary, the final analysis we will do is find its half-life of the mean reversion. This is useful for trading as it gives you an idea of what the holding period of the strategy will be. The calculation of the half-life involves regressing y(t)-y(t-1) against y(t-1) and using the lambda found. See my code for further explanation. The half-life of this series is found to be 67 days.

Next, we will look at the IGE-EWZ pair. Running a linear regression model with IGE as the dependent variable and EWZ as the independent variable we can use the resulting beta as the hedge ratio to create the data series below.

Compared to the EWA-EWC pair, this looks a lot less stationary which makes sense considering the price series and scatter plot. Additionally, the ADF test is inconclusive.

The half-life of its mean reversion is calculated to be 344 days. In this form, it is definitely not a very practical pair to trade. Something that may improve the stationarity of this time series is to use an adaptive hedge ratio, determined from using a rolling linear regression model with a designated lookback window. Obviously, the shorter the lookback window, the more that the beta/hedge ratio will fluctuate. Though this would require daily portfolio adjustments, ideally the stationarity of the series will increase substantially. I began with a lookback window of 252, the number of trading days in a year, but it didn’t have large enough of an impact.  Therefore, we will try 21, the average number of trading days in a month, which will result in a significant impact. Without the rolling regression, the beta/hedge ratio was 0.42. Below you can see how the beta changes over time and how it affects the mean reverting data series.

With the adaptive hedge ratio, the ADF test strongly concludes that the time series is stationary. This also significantly cuts down the half-life of the mean reversion to only 33 days.

Though there are a lot more analysis techniques for cointegrated ETF pairs, and even triplets, this post explored the basics of creating two stationary data series. In next week’s post, we will implement some mean reversion trading strategies on these pairs. See ya next week!

Full Code:

require(quantstrat)
require(tseries)
require(roll)
require(ggplot2)

## EWA (Australia) - EWC (Canada)
## Get data
getSymbols("EWA", from="2003-01-01", to="2015-12-31")
getSymbols("EWC", from="2003-01-01", to="2015-12-31")

## Utilize the backwards-adjusted closing prices
adj1 = unclass(EWA$EWA.Adjusted) adj2 = unclass(EWC$EWC.Adjusted)

## Plot the ETF backward-adjusted closing prices
par(new=T)
plot(adj2, type="l", axes=F, xlab="", ylab="", col="red")
par(new=F)

## Plot a scatter graph of the ETF adjusted prices

## Linear regression, dependent ~ independent

## Plot the residuals or hedged pair
plot(comb1$residuals, type="l", xlab="2003 to 2016", ylab="Residuals of EWA and EWC regression") beta = coef(comb1)[2] X = vector(length = 3273) for (i in 1:3273) { X[i]=adj1[i]-beta*adj2[i] } plot(X, type="l", xlab="2003 to 2016", ylab="EWA-hedge*EWC") ## ADF test on the residuals adf.test(comb1$residuals, k=1)

## Hurst Exponent Test
HurstIndex(log(comb1$residuals)) ## Half-life sprd = comb1$residuals
prev_sprd <- c(sprd[2:length(sprd)], 0)
d_sprd <- sprd - prev_sprd
prev_sprd_mean <- prev_sprd - mean(prev_sprd)
sprd.zoo <- merge(d_sprd, prev_sprd_mean)
sprd_t <- as.data.frame(sprd.zoo)

result <- lm(d_sprd ~ prev_sprd_mean, data = sprd_t)
half_life <- -log(2)/coef(result)[2]

#######################################################################################################

## EWZ (Brazil) - IGE (NA Natural Resource)
## Get data
getSymbols("EWZ", from="2003-01-01", to="2015-12-31")
getSymbols("IGE", from="2003-01-01", to="2015-12-31")

## Utilize the backwards-adjusted closing prices
adj1 = unclass(EWZ$EWZ.Adjusted) adj2 = unclass(IGE$IGE.Adjusted)

## Plot the ETF backward-adjusted closing prices
par(new=T)
plot(adj2, type="l", axes=F, xlab="", ylab="", col="red")
par(new=F)

## Plot a scatter graph of the ETF adjusted prices

## Rolling regression
## 252 = year
## 21 = month
window = 21

## Plot beta
rollingbeta <- fortify.zoo(lm\$coefficients[,2],melt=TRUE)
ggplot(rollingbeta, ylab="beta", xlab="time") + geom_line(aes(x=Index,y=Value)) + theme_bw()

## X should be the shifted residuals
X <- vector(length=3273-21)
for (i in 21:3273) {