
This post explains how to formulate ETF tracking error (TE) minimization and introduces R packages that perform (sparse) index tracking. An ETF (Exchange Traded Fund) is a fund that is listed and traded on an exchange. An ETF tries to mimic a target benchmark index (BM) such as the S&P 500, and keeping its deviation from the BM as small as possible is called tracking error (TE) minimization.

# ETF Tracking Error Optimization using R code

### Index Tracking

An ETF selects a small subset of the constituents of a stock or bond index to mimic the BM index. Since the ETF does not hold all constituents of the BM index (full replication), a tracking error (TE) arises. Furthermore, the optimal subset is not fixed but changes with market developments, so frequent rebalancing is required.

The number of constituents of a BM index is often so large that full replication is impractical because of transaction costs and liquidity problems. Index tracking therefore means finding the optimal subset of securities, together with their weights, that minimizes the tracking error. Its objective function is formulated as follows.

\begin{align} TE = \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{i=1}^{N} w_i r_{it} - R_t^I \right)^2 \end{align} Here, $$R_t^I$$ and $$r_{it}$$ are the time-$$t$$ returns of the BM index and of constituent $$i$$ respectively, and $$w_i$$ is the weight of constituent $$i$$.
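To make the objective concrete, here is a minimal base-R sketch on simulated returns (the sizes, the seed, and the helper names `te_loop` and `te_vec` are illustrative assumptions, not from the original post). It evaluates the TE once as the double sum above and once as $$\frac{1}{T}||Rw - R^I||_2^2$$, the vector form used in the next formulation; the two values agree.

```r
# Minimal base-R sketch of the tracking-error formula on simulated data.
# te_loop, te_vec, R, RI, w are hypothetical names for illustration.
set.seed(1)
T_obs <- 50; N <- 5                                    # avoid naming a variable T (TRUE)
R  <- matrix(rnorm(T_obs * N, sd = 0.01), T_obs, N)    # T x N constituent returns
w  <- rep(1 / N, N)                                    # equal weights, sum to 1
RI <- R %*% c(0.3, 0.2, 0.2, 0.2, 0.1) + rnorm(T_obs, sd = 0.001)  # index returns

# TE as a double sum: (1/T) * sum_t ( sum_i w_i r_it - R_t^I )^2
te_loop <- function(R, w, RI) {
  total <- 0
  for (t in seq_len(nrow(R))) {
    port_t <- 0
    for (i in seq_len(ncol(R))) port_t <- port_t + w[i] * R[t, i]  # portfolio return at t
    total <- total + (port_t - RI[t])^2
  }
  total / nrow(R)
}

# The same quantity in vector-matrix form: (1/T) * ||R w - R^I||_2^2
te_vec <- function(R, w, RI) sum((R %*% w - RI)^2) / nrow(R)

te1 <- te_loop(R, w, RI)
te2 <- te_vec(R, w, RI)
print(c(te1, te2))
```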

Using vector-matrix notation, the above problem is reformulated with its constraints as follows. \begin{align} &\min_{w} \frac{1}{T} || Rw - R^I ||_2^2 \\ \text{subject to}& \\ &e^T w = 1 \\ &\eta_i Z_i \leq w_i \leq \delta_i Z_i \\ &\sum_{i=1}^{N} Z_i = K \\ &Z_i = 0 \quad \text{or} \quad 1, \quad i=1,2,...,N \end{align} Here, $$N$$ is the number of constituents of the BM index, $$K$$ is the number of constituents held by the ETF, $$e$$ is an $$N×1$$ vector of ones, and $$\eta_i$$ and $$\delta_i$$ are the lower and upper bounds on the weight of constituent $$i$$. $$R^I=(R_1^I,R_2^I,…,R_T^I )^T$$ is a $$T×1$$ vector of BM index returns and $$R=(R_1,R_2,…,R_N)$$ is a $$T×N$$ matrix formed by concatenating horizontally the $$T×1$$ return vectors $$R_i=(r_{i1},r_{i2},…,r_{iT})^T$$. $$w=(w_1,w_2,…,w_N )^T$$ is an $$N×1$$ vector of allocation weights.

Among the above constraints, the first is the so-called budget constraint, which means that all capital is invested in the ETF portfolio. The second sets the lower and upper bounds of the allocation weights: when $$Z_i = 1$$, $$w_i$$ lies between $$\eta_i$$ and $$\delta_i$$, and when $$Z_i = 0$$, $$w_i$$ is forced to zero. The third and fourth form the cardinality constraint: each $$Z_i$$ takes the value 0 or 1 and the $$Z_i$$ sum to $$K$$, which means that only $$K$$ of the $$N$$ securities are held.
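The constraint set can be read as a feasibility check. The following base-R sketch uses a hypothetical helper `is_feasible()` (not part of any package) with assumed bounds $$\eta_i = 0.05$$ and $$\delta_i = 0.8$$; the second call shows how $$Z_i = 0$$ forces $$w_i = 0$$ through the bound constraint.

```r
# Hypothetical helper: checks a candidate (w, Z) against the four constraints.
is_feasible <- function(w, Z, eta, delta, K, tol = 1e-8) {
  budget      <- abs(sum(w) - 1) <= tol                          # e'w = 1
  bounds      <- all(eta * Z - tol <= w & w <= delta * Z + tol)  # eta_i Z_i <= w_i <= delta_i Z_i
  binary      <- all(Z %in% c(0, 1))                             # Z_i in {0, 1}
  cardinality <- sum(Z) == K                                     # sum_i Z_i = K
  budget && bounds && binary && cardinality
}

N <- 5; K <- 2
eta   <- rep(0.05, N)   # assumed lower bounds
delta <- rep(0.80, N)   # assumed upper bounds
Z     <- c(1, 0, 1, 0, 0)

ok  <- is_feasible(c(0.6, 0.0, 0.4, 0, 0), Z, eta, delta, K)  # satisfies all four
bad <- is_feasible(c(0.6, 0.1, 0.3, 0, 0), Z, eta, delta, K)  # w_2 > 0 although Z_2 = 0
print(c(ok = ok, bad = bad))
```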

This problem is hard to solve because the cardinality constraint makes it NP-hard: $$\sum_{i=1}^{N} Z_i = K$$ turns it into a high-dimensional discrete problem. Finding the optimal combination exactly requires searching over all $$\binom{N}{K}$$ subsets, for example with mixed integer programming, and the number of combinations is too large to enumerate. Because the solution holds only a few of the available securities, this problem is also called the sparse index tracking problem. More recently, Fengmin, Xu, and Xue (2015) proposed $$L_{1/2}$$ regularization for this problem.
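A quick count shows why enumeration breaks down. Base R's `choose()` gives the number of size-$$K$$ subsets; the choices of $$K = 10$$ out of 30 and $$K = 30$$ out of 386 below are illustrative assumptions, not values from the post.

```r
# Number of candidate subsets of size K out of N, i.e. choose(N, K)
n_small <- choose(30, 10)    # hold 10 of 30 stocks
n_full  <- choose(386, 30)   # hold 30 of all 386 stocks
print(c(n_small, n_full))
```

The first count is 30,045,015; the second exceeds $$10^{40}$$, so exhaustive search is out of the question.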

In this post, we use the sparseIndexTracking R package for sparse index tracking and the ROI.plugin.ecos R package for plain index tracking, and then compare the two results.

### Second-order conic programming (SOCP)

For index tracking, we use ROI and ROI.plugin.ecos. In particular, ROI.plugin.ecos connects ROI to the ECOS solver, which handles second-order conic programming (SOCP) problems.

What is a SOCP and what is the relationship between SOCP and index tracking?

Second-order cone programming (SOCP) problems are convex optimization problems in which a linear function is minimized over the intersection of an affine linear manifold with the Cartesian product of second-order cones.
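As a concrete picture of the cone itself, the sketch below tests membership in the second-order cone $$\{(t, x) : ||x||_2 \le t\}$$, with the bound $$t$$ as the first coordinate (the same ordering the ROI code later in this post uses, where the first row of the conic constraint carries $$t$$). The helper `in_soc()` is hypothetical and written for illustration only.

```r
# Hypothetical helper: is v = (t, x) inside the second-order cone ||x||_2 <= t ?
in_soc <- function(v, tol = 1e-10) {
  bound <- v[1]     # t: the first coordinate is the bound
  x     <- v[-1]    # remaining coordinates
  sqrt(sum(x^2)) <= bound + tol
}

in1 <- in_soc(c(5, 3, 4))   # ||(3, 4)||_2 = 5 <= 5: on the boundary, so inside
in2 <- in_soc(c(4, 3, 4))   # 5 > 4: outside the cone
in3 <- in_soc(c(1, 0, 0))   # 0 < 1: interior point
print(c(in1, in2, in3))
```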

The index tracking problem is typically rewritten in SOCP form, because ROI.plugin.ecos, like other conic solvers, expects its input in that form. Therefore we need to transform our tracking error minimization problem into a second-order conic programming problem.

Below we present the original and the transformed problem; together they make the SOCP structure of index tracking easy to see.

For example, we try to mimic the benchmark index by minimizing the tracking error. The TE problem is as follows.

\begin{align} &\min_{w} \sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \\ \text{subject to}& \\ &e^T w = 1 \\ &w \ge 0 \\ \end{align}

Here, $$w = (w_1 , w_2 , …, w_N)$$ and $$r = (r_1, r_2, …, r_N)$$.

\begin{align} &\min_{w} t \\ \text{subject to}& \\ &\sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \le t \\ &\sum_{i=1}^{N} w_i = 1 \\ &w \ge 0 \\ \end{align}

Here, $$w = (w_1 , w_2 , …, w_N, t)$$ and $$r = (r_1, r_2, …, r_N, 0)$$, so the extra variable $$t$$ does not enter the tracking residual.

Note that $$w$$ and $$r$$ are defined differently in the two formulations: the second one also includes $$t$$ as a decision variable and moves the first formulation's objective function into an additional constraint. For convenience, both formulations omit $$\frac{1}{T}$$, since it is a positive constant, and take the square root of the sum of squares. Neither change affects the optimal weights, and the square root turns the objective into a Euclidean norm, which is exactly what a second-order cone constraint bounds.

Although the definition of SOCP may look difficult, its key characteristic is easy to see from the two formulations above: a convex objective function is moved into a constraint by introducing an auxiliary variable $$t$$, and the objective is replaced by the linear function $$t$$.
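The transformation can be checked numerically. In this base-R sketch on simulated data (all names are illustrative), the smallest feasible $$t$$ for a fixed $$w$$ equals the original objective, and any smaller $$t$$ leaves the cone. The last lines rebuild the stacked vector the way the ROI code in this post builds it, assuming ROI's convention that a conic constraint requires $$\text{rhs} - Ax$$ to lie in the cone.

```r
set.seed(42)
T_obs <- 20; N <- 3
R  <- matrix(rnorm(T_obs * N, sd = 0.01), T_obs, N)  # simulated constituent returns
RI <- as.vector(R %*% c(0.5, 0.3, 0.2))              # simulated index returns
w  <- c(0.4, 0.4, 0.2)                               # some feasible weights

f_orig <- sqrt(sum((RI - R %*% w)^2))   # original objective at w
t_min  <- f_orig                        # smallest t with ||R^I - R w||_2 <= t

# Stacked vector (t, R^I - R w) must lie in the second-order cone
v <- c(t_min, RI - R %*% w)
cone_ok   <- sqrt(sum(v[-1]^2)) <= v[1] + 1e-12   # feasible at t = t_min
cone_fail <- sqrt(sum(v[-1]^2)) <= 0.99 * v[1]    # a smaller t is infeasible

# Same vector from the layout used later with ROI: rhs - A %*% c(w, t)
A   <- rbind(c(rep(0, N), -1), cbind(R, 0))
rhs <- c(0, RI)
s   <- as.vector(rhs - A %*% c(w, t_min))
same_layout <- isTRUE(all.equal(s, v))
print(c(cone_ok = cone_ok, cone_fail = cone_fail, same_layout = same_layout))
```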

### R package

Using ROI and ROI.plugin.ecos, we can perform tracking error minimization. In this case, however, there is no cardinality constraint, so the subset of securities has to be selected beforehand. The sparseIndexTracking R package, by contrast, controls the number of selected securities through a regularization parameter ($$\lambda$$): the higher $$\lambda$$, the more weights are shrunk towards zero, and the fewer securities are selected.
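One reason the sparsity penalty cannot simply be a lasso-type $$\ell_1$$ norm: under the budget constraint and $$w \ge 0$$, $$||w||_1 = \sum_i w_i = 1$$ for every feasible portfolio, so an $$\ell_1$$ penalty is a constant and selects nothing. The base-R sketch below (simulated weights, illustrative only) shows that $$\ell_1$$ is identical for a dense and a sparse portfolio while the count of nonzero weights differs. This suggests that a penalty like the one in sparseIndexTracking must target something closer to the number of nonzero weights; its exact form is not reproduced here.

```r
set.seed(7)
dense  <- runif(10);              dense  <- dense / sum(dense)    # 10 nonzero weights
sparse <- c(runif(3), rep(0, 7)); sparse <- sparse / sum(sparse)  # 3 nonzero weights

l1 <- c(dense = sum(abs(dense)), sparse = sum(abs(sparse)))  # both equal 1
l0 <- c(dense = sum(dense > 0),  sparse = sum(sparse > 0))   # 10 vs 3 nonzeros
print(rbind(l1, l0))
```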

### R code

The following R code implements the two index tracking approaches, using the data that ships with the sparseIndexTracking R package. For expository purposes, we restrict the stock universe to 30 stocks, because results for all 386 stocks are hard to display as a table or figure. Once the main ideas are clear, we also treat the 386-stock case.

```r
#==============================================================
# Financial Econometrics & Derivatives, ML/DL
# using R, Python, Keras, Tensorflow
# by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------------
# Index Tracking Error Minimization
# using ROI.ecos and sparseIndexTracking
#==============================================================

graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from the workspace

library(sparseIndexTracking)
library(ROI)
library(ROI.plugin.ecos)

#------------------------------------------------
# Data
#------------------------------------------------

# load stock index data shipped with sparseIndexTracking
data(INDEX_2010)
y <- as.vector(INDEX_2010$SP500)   # benchmark (S&P 500) returns
X <- as.matrix(INDEX_2010$X)       # constituent returns (T x N)

# comment this out when the full data set is used
X <- X[, 1:30]

nobs <- length(y); nX <- ncol(X)

#------------------------------------------------
# 1) Using ROI and ROI.plugin.ecos
#------------------------------------------------

# w  = c( w1,  w2,  w3, t)'
# Xn = c(Xn1, Xn2, Xn3, 0)'
#
# min sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2
#         + (y3 - X3'*w)^2 + (y4 - X4'*w)^2
#         + (y5 - X5'*w)^2 )
# s.t.
#      w1 + w2 + w3 = 1
#      w1, w2, w3 >= 0
#
# --> Rewritten into the standard form
#
# minimize t
# s.t.
#      sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2
#          + (y3 - X3'*w)^2 + (y4 - X4'*w)^2
#          + (y5 - X5'*w)^2 ) <= t
#      w1 + w2 + w3 = 1
#      w1, w2, w3 >= 0

#------------------------------------------------
# Index tracking error minimization
# using second order cone programming
#------------------------------------------------

# conic constraint: c(0, y) - A %*% c(w, t) = (t, y - X w)
# must lie in the second-order cone, i.e. ||y - X w||_2 <= t
A <- rbind(c(rep(0, nX), -1), cbind(X, 0))

soc <- OP(objective   = L_objective(c(rep(0, nX), 1)),   # minimize t
          constraints = c(
              C_constraint(A, K_soc(nobs + 1), c(0, y)),
              L_constraint(c(rep(1, nX), 0), "==", 1)))   # budget constraint
# ROI's default variable bounds (>= 0) keep w >= 0

soc_sol <- ROI_solve(soc, solver = "ecos")
wgt_roi <- soc_sol$solution[1:nX]

#------------------------------------------------
# 2) Using sparseIndexTracking
#------------------------------------------------

# fit portfolio under error measure ETE
# (Empirical Tracking Error)

# Unconstrained (lambda is practically zero)
wgt_sps <- spIndexTrack(X, y, lambda = 1e-180, u = 1,
                        measure = 'ete', thres = 1e-180)

# Constrained
# wgt_sps <- spIndexTrack(X, y, lambda = 1e-7,
#                         u = 1, measure = 'ete')

#------------------------------------------------
# 3) Comparison for allocation weights
#------------------------------------------------

round(cbind(wgt_roi, wgt_sps), 4)
```
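As an independent sanity check on the unconstrained case, the budget-constrained least-squares problem $$\min_w ||R^I - Rw||_2^2$$ subject to $$e^Tw = 1$$ has a closed form via a Lagrange multiplier: $$w = w_{OLS} + (R^TR)^{-1}e\,\frac{1 - e^Tw_{OLS}}{e^T(R^TR)^{-1}e}$$. The base-R sketch below implements it with a hypothetical helper `budget_ls()` and tests it on simulated data (an assumption). It drops the $$w \ge 0$$ bounds, so it is only expected to match the SOCP weights when none of them sits at zero.

```r
# Hypothetical helper: min ||y - X w||^2 s.t. sum(w) = 1 (no w >= 0 bounds)
budget_ls <- function(X, y) {
  XtX   <- crossprod(X)                              # X'X
  w_ols <- solve(XtX, crossprod(X, y))               # unconstrained OLS weights
  g     <- solve(XtX, rep(1, ncol(X)))               # (X'X)^{-1} e
  as.vector(w_ols + g * (1 - sum(w_ols)) / sum(g))   # shift onto e'w = 1
}

set.seed(123)
T_obs  <- 200; N <- 5
w_true <- c(0.30, 0.25, 0.20, 0.15, 0.10)            # assumed true weights, sum to 1
X_sim  <- matrix(rnorm(T_obs * N, sd = 0.01), T_obs, N)
y_sim  <- as.vector(X_sim %*% w_true) + rnorm(T_obs, sd = 1e-4)

w_hat <- budget_ls(X_sim, y_sim)
# KKT check: the gradient X'(y - X w) should be a constant multiple of e
grad  <- as.vector(crossprod(X_sim, y_sim - X_sim %*% w_hat))
print(round(w_hat, 4))
```

With the post's data, `budget_ls(X, y)` could be compared against `wgt_roi`; agreement should only be expected if every entry of `wgt_roi` is strictly positive.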

With the unconstrained setting ($$\lambda=10^{-180}$$, i.e. practically no regularization) and a universe of $$n=30$$ stocks, running the above R code gives the following weight allocations from the two approaches: ROI with ROI.plugin.ecos, and sparseIndexTracking.

```
> round(cbind(wgt_roi, wgt_sps), 4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0270
1500785D UN Equity  0.0220  0.0220
1518855D US Equity  0.0319  0.0319
9876566D UN Equity  0.0607  0.0607
A UN Equity         0.0149  0.0149
AA UN Equity        0.0426  0.0426
AAPL UW Equity      0.0444  0.0444
ABC UN Equity       0.0151  0.0151
ABT UN Equity       0.1330  0.1330
ADBE UW Equity      0.0114  0.0114
ADM UN Equity       0.0127  0.0127
ADP UW Equity       0.1440  0.1440
ADSK UW Equity      0.0113  0.0113
AEE UN Equity       0.0453  0.0453
AEP UN Equity       0.0158  0.0159
AES UN Equity       0.0074  0.0074
AET UN Equity       0.0132  0.0132
AFL UN Equity       0.0413  0.0413
AGN UN Equity       0.0145  0.0146
AIG UN Equity       0.0002  0.0002
AIV UN Equity       0.0452  0.0452
AIZ UN Equity       0.0202  0.0202
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0348
ALTR UW Equity      0.0172  0.0172
AMAT UW Equity      0.0336  0.0336
AMGN UW Equity      0.0411  0.0411
AMP UN Equity       0.0503  0.0503
AMT UN Equity       0.0437  0.0437
AMZN UW Equity      0.0051  0.0051
```

For sparse index tracking with the regularization parameter $$\lambda=10^{-6}$$ and the same universe of $$n=30$$ stocks, running the above R code gives the following weight allocations from the two approaches. The selection effect of sparse index tracking is easy to see: many of its weights are zero.

```
> round(cbind(wgt_roi, wgt_sps), 4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0397
1500785D UN Equity  0.0220  0.0000
1518855D US Equity  0.0319  0.0379
9876566D UN Equity  0.0607  0.0656
A UN Equity         0.0149  0.0000
AA UN Equity        0.0426  0.0445
AAPL UW Equity      0.0444  0.0510
ABC UN Equity       0.0151  0.0000
ABT UN Equity       0.1330  0.1598
ADBE UW Equity      0.0114  0.0000
ADM UN Equity       0.0127  0.0000
ADP UW Equity       0.1440  0.1783
ADSK UW Equity      0.0113  0.0000
AEE UN Equity       0.0453  0.0652
AEP UN Equity       0.0158  0.0000
AES UN Equity       0.0074  0.0000
AET UN Equity       0.0132  0.0000
AFL UN Equity       0.0413  0.0473
AGN UN Equity       0.0145  0.0000
AIG UN Equity       0.0002  0.0000
AIV UN Equity       0.0452  0.0543
AIZ UN Equity       0.0202  0.0000
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0418
ALTR UW Equity      0.0172  0.0000
AMAT UW Equity      0.0336  0.0507
AMGN UW Equity      0.0411  0.0499
AMP UN Equity       0.0503  0.0595
AMT UN Equity       0.0437  0.0543
AMZN UW Equity      0.0051  0.0000
```
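The selection effect can be quantified directly from the printed output. The sketch below copies the rounded `wgt_sps` column of the regularized run into a vector (the name `wgt_sps_printed` is ours) and counts the nonzero weights.

```r
# Rounded wgt_sps column of the regularized run, in the printed row order
wgt_sps_printed <- c(0.0397, 0.0000, 0.0379, 0.0656, 0.0000, 0.0445, 0.0510,
                     0.0000, 0.1598, 0.0000, 0.0000, 0.1783, 0.0000, 0.0652,
                     0.0000, 0.0000, 0.0000, 0.0473, 0.0000, 0.0000, 0.0543,
                     0.0000, 0.0000, 0.0418, 0.0000, 0.0507, 0.0499, 0.0595,
                     0.0543, 0.0000)
n_selected <- sum(wgt_sps_printed > 0)   # number of stocks actually held
total_w    <- sum(wgt_sps_printed)       # about 1 because of the budget constraint
print(c(n_selected = n_selected, total_w = total_w))
```

This gives 15 selected stocks out of 30, with rounded weights summing to 0.9998, i.e. 1 up to rounding.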

The two figures below show the weight allocations in the two cases. Without regularization for the cardinality constraint, the two results are the same up to rounding.

With regularization for the cardinality constraint, the two results differ, because sparse index tracking selects only a subset of the 30-stock universe.

When we use all 386 securities, we obtain the following two figures.

In the full-data case, we observe some discrepancies in the allocation weights, but the overall distributions of the weights are similar. With this many variables, numerical errors are likely to accumulate.

For more precise conclusions, however, we think the results should also be examined while varying the hyperparameters ($$\lambda$$ and so on).

These two approaches are complementary, because sparse index tracking selects securities by statistical rather than economic significance. $$\blacksquare$$