Factor Modeling in R

[This article was first published on Posts on Matthew Smith R Shenanigans, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The most popular models for analysing the risk of portfolios are factor models, since stocks have a tendency to move together. The principal component of securities often explains a large share of it’s variance. Since we are mostly concerned with multiple assets which form a portfolio we need to account for this. Some questions might be why do stocks with low Price to Book ratios outperform stocks with higher Price to Book ratios? Here the “Price” part of the ratio is simply the share price (per share) and the “Book” part of the ratio is the “Shareholders Equity” / “Shares Outstanding” which are items found on a companies Balance Sheet.

Another important reason Factor models are popular are dimensionality. Assume \(N\) assets which subsequently has \(N\) variances and \(N(N-1)/2\) correlations. If we have a factor model with \(F\) factors, it has \(N\) idiosyncratic variances, \(F\) factor variances and \(NF\) betas. As long as:

\[2F < N(N-1)/(N+1)\]

We have fewer parameters with factor models.

The Capital Asset Pricing Model Sharpe (1964) is the simplest factor model consisting of a single factor.

\[r_i = a_i + b_if + e_i\]

The mean-variance parameters are:

\[\overline{r}_i = a_{i} + b_{i}\overline{y}\] \[\sigma_{i}^2 = b_{i}^2\sigma_{\epsilon_{i}}^2\]

\[\sigma_{ij} = b_{i}b_{j}\sigma_{y}^2\]

The \(Cov(r_{i}, y) = Cov(a_{i} + b_{i}y + e_{i}, y) = b_{i}\sigma_{y}^2\) and therefore \(b_{i}\) is;

\[b_{i} = \dfrac{Cov(r_{i}, y)}{\sigma_{y}^2}\]

This is alternatively expressed as;

\[b_{i} = Cov(r_{i}, y) / Var(y)\]

Where \(b\) is the slope of the regression line which minimises the squared distance of returns and \(a\) is the intercept or \(alpha\) which are expressed on a line \(a + bf\).

I will go through these calculations using Base R functions, however first we need some data and for this I like the tidy functions which require a few packages.

I download daily price data from Yahoo Finance using quantmod or tidyquant’s wrapper to the quantmod package. The difference here being that quantmod collects data and stores it as an xts object and tidyquant collects the data and stores it as a tibble, from here we are able to more easily use functions from the tidyverse way of handling data, we convert the data back to an xts object using the tk_xts function from the timetk package.

The firms are Apple, Harley Davidson, Nvidia, Microsoft, AMD, Ford, General Electric, 3M, Intel and Amazon.

library(quantmod)
library(ggplot2)
library(tidyquant)
library(tidyr)
library(dplyr)
library(timetk)
library(tibble)
library(rvest)
library(corrplot) 
library(stringr)

start_date <- "2017-01-01"
end_date <- "2019-01-01"

symbols <- c("AAPL", "HOG", "NVDA", "MSFT", "AMD", "F", "GE", "MMM", "INTL", "AMZN")

portfolio_prices <- tq_get(
  symbols,
  from = start_date,
  to = end_date
  ) %>%
  select(symbol, date, adjusted) %>% 
  pivot_wider(names_from = symbol, values_from = adjusted) %>% 
  tk_xts(date_var = date)

The data looks like the following, I removed the Open, High, Low, Close and Volume data and kept just the Adjusted prices, where each asset is it’s own column, the data has been converted into a time series object or an xts object where the date is stored as the index (or rownames) and is not visable in the table below.

AAPL HOG NVDA MSFT AMD F GE MMM INTL AMZN
110.9539 53.59669 101.0116 59.49657 11.43 10.43042 28.13062 165.3643 40.41 753.67
110.8297 54.18777 103.3683 59.23036 11.43 10.91093 28.13950 165.6151 40.86 757.18
111.3933 54.24233 100.7443 59.23036 11.24 10.57954 27.97971 165.0485 38.77 780.45
112.6351 53.74219 102.0910 59.74376 11.32 10.57126 28.05961 165.5315 38.71 795.99
113.6668 52.87832 106.2301 59.55362 11.49 10.46355 27.92646 164.6399 38.33 796.92
113.7815 53.19659 105.4280 59.53460 11.44 10.64582 27.84656 163.9991 38.46 795.90

I also collect the S&P500 data using the same method:

SPY <- tq_get(
  "^GSPC",
  from = start_date,
  to = end_date
) %>% 
  select(date, adjusted) %>% 
  tk_xts(date_var = date)

Which looks similar to the individual asset prices.

adjusted
2257.83
2270.75
2269.00
2276.98
2268.90
2268.90

We can plot the data using the chartSeries function.

chartSeries(SPY)

I compute the daily log returns for all the assets in the portfolio_returns and the SPY returns

portfolio_returns <- diff(log(portfolio_prices), na.pad = FALSE)
SPY_returns <- diff(log(SPY), na.pad = FALSE)

The returns data for each asset looks like:

AAPL HOG NVDA MSFT AMD F GE MMM INTL AMZN
-0.001119731 0.010967925 0.023062871 -0.0044844490 0.000000000 0.0450387901 0.0003155849 0.001515279 0.011074335 0.004646413
0.005072384 0.001006252 -0.025713094 0.0000000000 -0.016762633 -0.0308429974 -0.0056944999 -0.003426713 -0.052504862 0.030269695
0.011086527 -0.009263191 0.013278832 0.0086305184 0.007092228 -0.0007833275 0.0028513212 0.002921873 -0.001548813 0.019715919
0.009117836 -0.016204986 0.039742771 -0.0031877844 0.014906019 -0.0102405428 -0.0047565496 -0.005400948 -0.009865007 0.001167666
0.001008053 0.006000910 -0.007578959 -0.0003193095 -0.004361106 0.0172690614 -0.0028650791 -0.003899774 0.003385783 -0.001280696
0.005358706 -0.008411454 -0.012380417 0.0090613598 -0.021202208 -0.0141067872 0.0031828276 0.007391130 0.009059257 0.003912422

We can use the autoplot to plot xts or time series data with ggplot functionality, (I only plot the first 3 assets):

autoplot(portfolio_returns[,1:3]) +
  xlab("Date") +
  ggtitle("Asset Log Returns") +
  theme_tq()

Technical R:

From here I will talk a little more technically but will keep it as simple as possible.

Next I compute the calculations described at the begining of the post. We first need to define a few things:

  • The number of assets we have in the portfolio, denoted previously as \(N\)
  • The number of days we apply our model which is usually denoted as \(T\).
N_portfolio <- ncol(portfolio_prices)
days <- nrow(portfolio_prices)

(a) We can compute \(\beta\) as the following:

beta <- cov(portfolio_returns, SPY_returns) / as.numeric(var(SPY_returns))

Which returns:

adjusted
AAPL 1.2722563
HOG 0.8957108
NVDA 1.9681667
MSFT 1.4221581
AMD 1.8544079
F 0.8626349

However, in order to understand it better we can break it down each of the computations:

The \(cov\) covariance matrix part of the \(beta\) formula looks like:
adjusted
AAPL 0.0000852
HOG 0.0000600
NVDA 0.0001318
MSFT 0.0000953
AMD 0.0001242
F 0.0000578
GE 0.0000548
MMM 0.0000714
INTL 0.0000576
AMZN 0.0001056

We can compute the covariances using base R as follows:

\[Cov(x, y) = \sum_{i=1}^N \dfrac{((x_i - \overline{x}) (y_{i} - \overline{y}))}{(n-1)}\] Where \(x_{i}\) is our asset and \(y_{i}\) is the SPY500. In R we can compute this as:

x = portfolio_returns[,1] # which takes the first Asset in the portfolio_returns data
x_bar = colMeans(x)

y = SPY_returns[,1]
y_bar = colMeans(y)

n = length(x)
sum(((x - x_bar)*(y - y_bar))) / (n-1)
## [1] 0.00008521578

Which corresponds to the first result, AAPL in the cov(portfolio_returns, SPY_returns) calculation. We can change the \(1\) in x = portfolio_returns[,1] to \(2\), \(3\), \(4\)\(N\) to compute the covariances for each of the assets (HOG, NVDA and MSFT, respectively), or we can wrap the above into a for loop to compute for all assets. Note: I only changed the \(1\) to an \(i\) from the above equation in the for loop, everything else is constant.

for(i in 1:ncol(portfolio_returns)){
  x = portfolio_returns[,i]
  x_bar = colMeans(x)
  y = SPY_returns[,1]
  y_bar = colMeans(y)
  cov_x_y = sum(((x - x_bar)*(y - y_bar))) / (n-1)
  print(cov_x_y)
}
## [1] 0.00008521578
## [1] 0.00005999475
## [1] 0.0001318279
## [1] 0.00009525621
## [1] 0.0001242083
## [1] 0.00005777932
## [1] 0.00005477266
## [1] 0.00007136762
## [1] 0.0000575575
## [1] 0.0001056044

The variance of the SPY returns is.

x
0.000067

Which is calculated as:

\[Var = \sum(\dfrac{((y - \overline{y})^2)} {(n-1)})\] In base R we can simply calculate it as:

n = length(SPY_returns)
y = SPY_returns[,1]
y_bar = colMeans(y)

sum(((y - y_bar)**2) / (n-1))
## [1] 0.00006698004

Putting all this together, we can compute \(beta\). Recall, that \(\beta = \dfrac{Cov(r_{i}, y)}{Var(y)}\) where here \(r_{i}\) is each of our assets in our portfolio and \(y\) is the market returns or SPY500 returns.

To compute \(beta\) for each of our assets using base R we can wrap the above code into a function:

myBetaComputation <- function(ASSET){
  x = ASSET                   # which Asset in the portfolio_returns data we want to compute beta for.
  x_bar = colMeans(x)
  
  y = SPY_returns[,1]
  y_bar = colMeans(y)
  
  cov_x_y <- sum(((x - x_bar)*(y - y_bar))) / (n-1)
  
  
  n = length(SPY_returns)
  y = SPY_returns[,1]
  y_bar = colMeans(y)
  
  var_y <- sum(((y - y_bar)**2) / (n-1))
  
  return(cov_x_y / var_y)
}

We can apply this function to individual assets in our data and then all of them:

myBetaComputation(portfolio_returns[,1])        # Computes for the first asset
## [1] 1.272256
myBetaComputation(portfolio_returns[,2])        # Computes for the second asset
## [1] 0.8957108
sapply(portfolio_returns, myBetaComputation)    # Computes for all assets
##      AAPL       HOG      NVDA      MSFT       AMD         F        GE       MMM 
## 1.2722563 0.8957108 1.9681667 1.4221581 1.8544079 0.8626349 0.8177459 1.0655057 
##      INTL      AMZN 
## 0.8593232 1.5766544

The interpretation here is that a value of 1 indicates that the asset moves in perfect correlation with the market, values \(>\) 1 indicate that an asset moves more than when the market moves or moves with more volatility when the market moves and values \(<\) 1 indicates that an asset moves less than the market.

Typically Tech stocks have a higher market \(\beta\) whereas mature non-tech stocks have a lower market \(\beta\). That is, AAPL, NVDA, MSFT and AMD all have betas above 1 whereas HOG and Ford (F) have \(\beta\)’s less than 1.

(b) We can compute \(\alpha\) as the following:

Now that we have \(\beta\) we can estimate \(\alpha\) for each of our assets in our portfolio.

Recall, that alpha is defined as;

\[\alpha = \overline{r_{i}} - \beta_{i}*\overline{f}\]

Where \(\overline{r_{i}}\) is the average returns of our individual assets in our portfolio, \(\beta_{i}\) is the beta of each asset over our sample period and \(\overline{f}\) is the average returns of SPY. Therefore, we can compute \(\alpha\) as;

alpha <- colMeans(portfolio_returns) - beta*colMeans(SPY_returns)

The alpha or \(\alpha\) for our assests, looks like:

adjusted
AAPL 0.0004068
HOG -0.0011516
NVDA 0.0001393
MSFT 0.0007480
AMD 0.0005696
F -0.0009272

There are other ways to calculate \(\beta\) and \(\alpha\) but for a single factor model we can use the CAPM.beta and CAPM.alpha from the PerformanceAnalytics package.

library(PerformanceAnalytics)
CAPM.beta(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
##                    AAPL       HOG     NVDA     MSFT      AMD         F
## Beta: adjusted 1.272256 0.8957108 1.968167 1.422158 1.854408 0.8626349
##                       GE      MMM      INTL     AMZN
## Beta: adjusted 0.8177459 1.065506 0.8593232 1.576654
CAPM.alpha(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
##                        AAPL          HOG         NVDA         MSFT          AMD
## Alpha: adjusted 0.000406813 -0.001151611 0.0001393021 0.0007480208 0.0005695635
##                             F           GE           MMM          INTL
## Alpha: adjusted -0.0009272497 -0.002875334 0.00001141014 -0.0003782044
##                        AMZN
## Alpha: adjusted 0.001047154

Which is much simpler than what we just did. The Ra is the asset returns, Rb is the market returns and Rf is the risk-free rate of return.

Upon investigating the CAPM.beta functions from the PerformanceAnalytics package I noticed they had functions for CAPM.beta.bull and CAPM.beta.bear, so I wanted to see how these would look like plotted for each asset.

rbind(
  CAPM.beta.bull(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0),
  CAPM.beta(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0),
  CAPM.beta.bear(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
) %>% 
  as_tibble(rownames = NA) %>% 
  rownames_to_column("betas") %>% 
  pivot_longer(2:ncol(.)) %>% 
  ggplot(aes(x = name, y = value, color = betas)) +
  geom_point(size = 5, shape = 19) +
  scale_color_brewer(type='seq', palette='Reds') +
  theme_tq() +
  ggtitle("Asset Betas:", subtitle = "Bear, Beta, Bull")

Interesting…

Visualising the covariance matrix of the returns

In order to visualise the the covariance matrix some further formulas are needed.

Ultimately we want to compute the following;

\[\hat{\Sigma} = var(y)\hat{\beta} \hat{\beta}^T + \hat{\psi}\] Where; \[\hat{\psi} = diag(\hat{\sigma}_{1}^2, ... \hat{\sigma}_{i}^2, ... \hat{\sigma}_{N}^2)\]

We use our \(\beta\) and \(\alpha\) results from before as well as \(N\). I created a function which takes in the \(asset\) and computes the residules and Sigma values. What we are calculating here is the following;

err \[\hat{\epsilon_{i}} = x_{i} - \alpha_{i} - \beta_{i}*y\]

Where \(i = 1.,..., N\)

Sigma \[\hat{\sigma}_{i}^2 = \dfrac{1} {N-2}\hat{\epsilon}_{i}^N\epsilon_{i}\]

The code in base R for the above equations is:

beta <- cov(portfolio_returns, SPY_returns) / as.numeric(var(SPY_returns))
alpha <- colMeans(portfolio_returns) - beta*colMeans(SPY_returns)
N = length(SPY_returns)

mySigmaFunction <- function(ASSET){
  err = portfolio_returns[, ASSET] - alpha[ASSET ,] - beta[ASSET ,] * SPY_returns
  Sigma = (1 / (N - 2)) %*% t(err) %*% err
  return(Sigma)
}

mySigmaFunction("AAPL")                               # For Apple
##              AAPL
## [1,] 0.0001178595
mySigmaFunction("F")                                  # For Ford
##                 F
## [1,] 0.0001599711
sapply(colnames(portfolio_returns), mySigmaFunction)  # For all assets in the portfolio
##          AAPL           HOG          NVDA          MSFT           AMD 
## 0.00011785947 0.00021461371 0.00056144410 0.00006590872 0.00126760753 
##             F            GE           MMM          INTL          AMZN 
## 0.00015997110 0.00035027461 0.00007165307 0.00019184188 0.00017665040

Now that we have the \(\sigma\) values. we want to create a matrix with the Sigma values down the digaonal.

Sigma2 <- sapply(colnames(portfolio_returns), mySigmaFunction)
diag_Sigma2 <- diag(Sigma2)
colnames(diag_Sigma2) = colnames(portfolio_returns)
rownames(diag_Sigma2) = colnames(portfolio_returns)

Which creates our diagonal matrix as follows:

AAPL HOG NVDA MSFT AMD F GE MMM INTL AMZN
AAPL 0.0001179 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.0000000 0.0000000 0.0000000 0.0000000
HOG 0.0000000 0.0002146 0.0000000 0.0000000 0.0000000 0.00000 0.0000000 0.0000000 0.0000000 0.0000000
NVDA 0.0000000 0.0000000 0.0005614 0.0000000 0.0000000 0.00000 0.0000000 0.0000000 0.0000000 0.0000000
MSFT 0.0000000 0.0000000 0.0000000 0.0000659 0.0000000 0.00000 0.0000000 0.0000000 0.0000000 0.0000000
AMD 0.0000000 0.0000000 0.0000000 0.0000000 0.0012676 0.00000 0.0000000 0.0000000 0.0000000 0.0000000
F 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00016 0.0000000 0.0000000 0.0000000 0.0000000
GE 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.0003503 0.0000000 0.0000000 0.0000000
MMM 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.0000000 0.0000717 0.0000000 0.0000000
INTL 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.0000000 0.0000000 0.0001918 0.0000000
AMZN 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00000 0.0000000 0.0000000 0.0000000 0.0001767

Now that we have our diagonal matrix \(Diag(\psi)\) we can compute the covariance matrix of the returns using: \[\Sigma = var(y)\beta \beta^T + \psi\]

Then use the cov2cor function from the stats package.

Sigma <- as.numeric(var(SPY_returns)) * beta %*% t(beta) + diag_Sigma2
corrplot(cov2cor(Sigma), main = "Covariance matrix of Portfolio Assets")

The correlation table looks like:

AAPL HOG NVDA MSFT AMD F GE MMM INTL AMZN
AAPL 1.0000000 0.3097538 0.3891498 0.5677312 0.2714307 0.3373737 0.2330694 0.4966757 0.3133839 0.4821634
HOG 0.3097538 1.0000000 0.2515805 0.3670311 0.1754765 0.2181079 0.1506764 0.3210946 0.2025988 0.3117126
NVDA 0.3891498 0.2515805 1.0000000 0.4611084 0.2204547 0.2740132 0.1892978 0.4033975 0.2545288 0.3916107
MSFT 0.5677312 0.3670311 0.4611084 1.0000000 0.3216216 0.3997582 0.2761668 0.5885171 0.3713324 0.5713213
AMD 0.2714307 0.1754765 0.2204547 0.3216216 1.0000000 0.1911233 0.1320346 0.2813684 0.1775330 0.2731471
F 0.3373737 0.2181079 0.2740132 0.3997582 0.1911233 1.0000000 0.1641118 0.3497257 0.2206640 0.3395071
GE 0.2330694 0.1506764 0.1892978 0.2761668 0.1320346 0.1641118 1.0000000 0.2416026 0.1524423 0.2345432
MMM 0.4966757 0.3210946 0.4033975 0.5885171 0.2813684 0.3497257 0.2416026 1.0000000 0.3248576 0.4998164
INTL 0.3133839 0.2025988 0.2545288 0.3713324 0.1775330 0.2206640 0.1524423 0.3248576 1.0000000 0.3153656
AMZN 0.4821634 0.3117126 0.3916107 0.5713213 0.2731471 0.3395071 0.2345432 0.4998164 0.3153656 1.0000000

ETF analysis and a randomly sampled portfolio

Analysing my randomly selected portfolio and a series of U.S. funds

Since factor modelling is all about risk and portfolio analysis I thought it would be interesting to compare the performance of a number of U.S. Exchnage Traded Funds (ETFs) and a randomly selected portfolio of assets from the SP500.

In order to construct the randomly created portfolio I first scraped the wikipedia page which contains the list of SP500 firms along with their ticker symbol, I filter out all Class A, B and C shares since a few firms have multiple asset classes and I don’t want to sample two of the same asset.

Just as before I collect the data and put it into time series format.

start_date <- "2016-01-01"
end_date <- "2017-12-31"

set.seed(4746)
url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

symbols <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="constituents"]') %>% 
  html_table() %>% 
  .[[1]] %>% 
  filter(!str_detect(Security, "Class A|Class B|Class C")) %>%     # Removes firms with Class A shares
  sample_n(20) %>% 
  pull(Symbol)

asset_prices <- tq_get(
  symbols,
  from = start_date,
  to = end_date
) %>%
  select(symbol, date, adjusted) %>% 
  pivot_wider(names_from = symbol, values_from = adjusted) %>% 
  tk_xts(date_var = date)

Which looks like:

MMM MNST TJX TMUS NWL INTC EIX XRX GPN HES TIF IR GRMN DRI GLW ZTS CCI BK
132.8015 48.11333 33.27088 38.95 38.54331 30.45350 51.77293 24.64223 62.47044 44.85172 68.47929 51.15904 30.85206 55.28202 16.24429 45.97239 73.96610 36.90321
133.3804 48.69000 33.79228 40.22 38.52557 30.31014 51.79048 24.59438 63.43876 44.74931 68.27712 50.61163 30.96944 56.28635 16.23521 46.69207 75.48198 36.76471
130.6940 48.76667 32.92327 40.05 37.46084 29.63818 51.44798 24.28336 63.63840 41.95655 67.00899 49.53537 30.12268 55.86901 15.87241 46.70180 74.84963 35.84144
127.5101 48.40000 32.87630 40.51 35.78388 28.52719 51.19327 23.61348 61.25259 40.60672 65.17110 48.06017 29.25916 55.47833 15.50961 45.28187 72.98727 34.93664
127.0759 48.08333 31.43894 39.88 34.94097 28.23153 51.18450 22.94359 60.07465 40.56948 62.72673 47.60556 28.12736 55.06099 15.55497 44.62054 71.96515 34.29958
127.0488 48.68000 31.84760 39.68 34.91435 28.72431 51.99249 22.48903 58.41755 38.81003 62.34076 47.76327 27.83393 55.01660 15.70009 43.35622 72.49354 34.54886

Next, I collect the U.S. ETFs. I scraped a website (which I cannot remember the name) for the symbols of the ETFs.

myETFs <- c("SPY", "IVV", "VTI", "VOO", "QQQ", "VEA", "IEFA", "AGG", "VWO",
            "EFA","IEMG","VTV", "IJH", "IWF","BND", "IJR", "IWM", "VUG",
            "GLD", "IWD", "VIG", "VNQ", "USMV", "LQD", "VO", "VYM", "EEM",
            "VB", "VCSH", "XLF", "VCIT", "VEU", "XLK", "ITOT", "IVW", "BNDX",
            "VGT", "DIA", "BSV", "SHV", "IWB", "IWR", "TIP", "SCHF", "MBB", "SDY",
            "MDY", "SCHX", "IEF", "HYG", "DVY", "XLV", "SHY", "IXUS", "TLT", "IVE",
            "PFF", "IAU", "VXUS", "RSP", "SCHB", "VV", "GOVT", "EMB", "MUB", "QUAL",
            "XLY", "VBR", "EWJ", "XLP", "VGK", "SPLV", "MINT", "BIV", "IGSB", "EFAV",
            "VT", "GDX", "XLU", "IWS", "XLI", "SCHD", "IWP", "ACWI", "VMBS", "XLE", "JNK",
            "VOE", "FLOT", "IWV", "JPST", "SCZ", "IEI", "IWN", "DGRO", "VBK", "IGIB", "IWO")
            

etf_prices <- tq_get(
  myETFs,
  from = start_date,
  to = end_date
) %>% 
  select(symbol, date, adjusted) %>% 
  pivot_wider(names_from = symbol, values_from = adjusted) %>% 
  tk_xts(date_var = date)

The data looks like:

SPY IVV VTI VOO QQQ VEA IEFA AGG VWO EFA IEMG VTV IJH IWF BND IJR IWM VUG GLD IWD VIG VNQ USMV LQD VO VYM EEM VB VCSH XLF VCIT VEU XLK ITOT IVW BNDX VGT DIA BSV SHV IWB IWR TIP SCHF MBB SDY MDY SCHX IEF HYG DVY XLV SHY IXUS TLT IVE PFF IAU VXUS RSP SCHB VV GOVT EMB MUB QUAL XLY VBR EWJ XLP VGK SPLV MINT BIV IGSB EFAV VT GDX XLU IWS XLI SCHD IWP ACWI VMBS XLE JNK VOE FLOT IWV JPST SCZ IEI IWN DGRO VBK IGIB IWO
186.8362 187.3943 95.69553 171.0293 105.7754 32.31758 48.12894 97.70714 28.82161 51.85117 35.29821 73.21329 129.8436 93.32362 72.46418 51.13569 104.60918 99.84885 102.89 88.57166 71.16028 67.15640 38.30936 100.1451 112.0916 58.70417 29.25530 103.10962 71.86420 13.61419 73.99710 38.23042 39.89517 42.61963 107.9444 48.90886 101.77430 156.9451 74.18727 105.2681 104.11566 37.17856 101.8868 24.86581 97.72220 63.99211 238.7614 44.54472 98.23306 65.49518 65.93277 66.68895 80.17432 44.14714 110.3894 80.04479 31.46391 10.38 39.78462 71.26929 45.05346 85.64318 23.57018 87.65514 101.0075 59.58526 72.85894 90.46735 45.07748 45.11800 43.21453 34.96312 93.40020 75.13937 48.21395 57.09138 51.90483 13.86359 38.17657 62.81808 48.48287 34.12284 87.50332 50.81642 48.00686 53.82844 80.80476 78.58987 46.91737 110.6951 NA 45.01355 115.1113 84.09489 23.20199 115.5700 47.86354 131.6090
187.1522 187.7839 95.90974 171.3355 105.5919 32.28184 48.04810 97.75242 28.88509 51.77047 35.38096 73.44967 129.9475 93.46677 72.55397 51.29261 104.83739 99.86796 103.18 88.74566 71.39217 68.45230 38.56908 100.1978 112.2051 58.99791 29.32071 103.16643 71.84602 13.66662 74.01466 38.14082 39.79135 42.69417 108.1532 48.88116 101.28743 157.0367 74.17794 105.2777 104.32082 37.26314 101.8220 24.83831 97.79476 64.25666 238.8375 44.63781 98.20529 65.61782 66.33069 67.00954 80.12685 44.10166 109.9440 80.26437 31.49616 10.40 39.71283 71.43853 45.09076 85.81048 23.54203 87.88744 101.1264 59.74447 72.76409 90.67188 45.64425 45.40763 42.95955 35.24865 93.40020 75.22964 48.19091 57.19847 51.91398 13.79471 38.45059 62.96579 48.61266 34.31977 87.66763 50.86277 48.09793 54.03376 80.97237 78.76543 46.90808 110.9662 NA 44.89528 115.1582 84.47743 23.28449 115.6185 47.92159 131.6866
184.7914 185.3255 94.62438 169.1641 104.5776 31.69214 47.20373 98.12343 28.36815 50.91853 34.74654 72.40415 128.1251 92.48373 72.87716 50.70773 103.23986 98.76987 104.67 87.39035 70.52956 68.26473 38.32790 100.5933 110.1342 58.22349 28.76008 101.62282 71.87331 13.45691 74.25190 37.48678 39.30056 42.08852 106.9669 49.06577 99.91275 154.7919 74.26179 105.2586 102.94081 36.64525 102.1459 24.37987 98.11230 63.51594 235.4678 44.06064 98.79820 65.61782 65.65863 66.46268 80.17432 43.33769 111.4255 79.06572 31.42361 10.57 39.03973 70.13145 44.53121 84.68593 23.63587 87.87912 101.5744 59.03268 72.05276 89.25890 44.85076 45.25377 42.24736 34.93548 93.40947 75.53665 48.22778 56.64514 51.14435 14.02101 38.37987 61.87635 47.86177 33.90801 86.25658 50.09352 48.20723 51.95383 80.90055 77.24093 46.90808 109.4701 NA 44.14021 115.4954 83.48843 22.96364 113.8152 47.99304 129.3380
180.3579 180.8912 92.30511 165.0441 101.3029 31.09350 46.30547 98.11440 27.45217 49.86035 33.69835 70.71315 124.8108 90.12631 72.88611 49.40485 100.48224 96.22995 106.15 85.32989 69.23102 66.93474 37.70643 100.6724 107.6568 56.96841 27.87242 98.88598 71.90975 13.07825 74.29585 36.68042 38.13966 41.07753 104.2529 48.99192 96.69561 151.1452 74.37366 105.2681 100.43254 35.80653 102.0626 23.93059 98.23930 62.23734 229.4516 43.00870 99.01125 65.19263 64.49137 65.11437 80.20277 42.43728 111.6255 77.26316 31.22207 10.72 38.23201 68.54225 43.44009 82.62268 23.68278 87.46435 101.7207 57.62780 70.57320 87.00002 44.17064 44.71072 41.46484 34.35522 93.40020 75.63598 48.24622 55.87762 49.95324 14.64089 38.12353 60.51917 46.56396 33.20084 84.22703 48.92575 48.27098 50.68622 80.46945 75.47622 46.91737 106.8519 NA 43.46700 115.7578 81.26778 22.53279 110.6160 48.06894 125.7083
178.3782 178.8873 91.25257 163.2903 100.4722 30.70930 45.71261 98.33157 27.20731 49.25951 33.37653 69.87674 123.1961 89.22913 72.97586 48.58223 98.75159 95.33240 105.68 84.23099 68.51684 66.03955 37.32612 100.7515 106.2667 56.32751 27.57342 97.51283 71.91882 12.87435 74.32222 36.24140 37.83765 40.59300 103.2470 48.98268 95.86506 149.5692 74.45753 105.2777 99.38823 35.36956 102.1459 23.63719 98.34815 61.63771 226.3864 42.52462 99.26135 65.02093 64.04923 64.13377 80.25970 41.90978 112.1254 76.28411 31.31075 10.66 37.78327 67.70534 42.94580 81.73975 23.71093 87.38972 101.7664 56.97219 69.81445 85.77296 43.22602 44.36678 41.10435 34.02365 93.39094 75.83461 48.27848 55.38676 49.36685 14.28668 38.10586 59.79903 46.09118 32.84278 83.17358 48.36041 48.30741 50.03457 80.18208 74.31207 46.88947 105.6269 NA 42.88478 115.9920 79.95219 22.25777 108.9970 48.12699 123.5344
178.5548 179.0729 91.19671 163.3088 100.7813 30.84332 45.87430 98.05104 27.08941 49.46577 33.33975 69.83128 122.7901 89.27684 72.67066 48.58699 98.32368 95.42790 104.74 84.19437 68.66522 66.41467 37.38177 100.4263 105.7561 56.38982 27.56408 97.03933 71.98254 12.89183 74.19040 36.31308 38.08304 40.58370 103.5317 48.92731 96.33284 150.1098 74.44821 105.2681 99.32295 35.24034 101.6092 23.71054 98.25743 61.62007 225.7010 42.51531 98.94638 64.92279 64.19955 63.37006 80.28816 41.99163 110.8984 76.18345 31.14146 10.57 37.86405 67.48906 42.91782 81.75836 23.67340 87.24041 101.6019 57.05649 70.36455 85.55912 43.52830 44.78313 41.22744 34.09733 93.41876 75.65400 48.26467 55.53847 49.39434 13.69632 38.30916 59.57745 46.11899 32.90544 82.78702 48.41602 48.23457 48.96336 80.06235 74.02564 46.90808 105.5428 NA 42.95756 115.8515 79.92420 22.32194 108.0954 48.04214 122.4960

I also collect the SPY500.

SPY_prices <- tq_get(
  "^GSPC",
  from = start_date,
  to = end_date
) %>% 
  select(date, adjusted) %>% 
  tk_xts(date_var = date)

Which looks like:

adjusted
2012.66
2016.71
1990.26
1943.09
1922.03
1923.67

Now, we have a series of 3 data sets, the daily prices of the adjusted close for the randomly selected assets from the SPY500 wikipedia page, the ETFs and the SPY500. Next I calculate the daily returns.

asset_returns <- diff(log(asset_prices), na.pad = FALSE)
etf_returns <- diff(log(etf_prices), na.pad = FALSE)
SPY_returns <- diff(log(SPY_prices), na.pad = FALSE)

The asset_returns looks like:

MMM MNST TJX TMUS NWL INTC EIX XRX GPN HES TIF IR GRMN DRI GLW ZTS CCI BK
0.0043496940 0.011914276 0.015549833 0.032085611 -0.0004605231 -0.004718686 0.0003390193 -0.001943595 0.015381467 -0.0022860685 -0.002956558 -0.010757820 0.003797292 0.0180042454 -0.0005589373 0.0155331932 0.020287018 -0.003760040
-0.0203468502 0.001573356 -0.026052579 -0.004235761 -0.0280260856 -0.022418948 -0.0066352645 -0.012726538 0.003142188 -0.0644414487 -0.018747883 -0.021494331 -0.027722605 -0.0074420504 -0.0225996450 0.0002084719 -0.008412745 -0.025433678
-0.0246630740 -0.007547151 -0.001427791 0.011420159 -0.0457985304 -0.038205692 -0.0049630049 -0.027973817 -0.038210941 -0.0327010158 -0.027810684 -0.030233211 -0.029085489 -0.0070174571 -0.0231226086 -0.0308759534 -0.025196097 -0.025568744
-0.0034106468 -0.006564266 -0.044704812 -0.015673838 -0.0238374478 -0.010418300 -0.0001713849 -0.028779059 -0.019418177 -0.0009174858 -0.038228556 -0.009504269 -0.039450060 -0.0075510143 0.0029199836 -0.0147124623 -0.014103137 -0.018403085
-0.0002136353 0.012332679 0.012914805 -0.005027688 -0.0007621181 0.017304330 0.0156624749 -0.020011095 -0.027971612 -0.0443372159 -0.006172111 0.003307394 -0.010486917 -0.0008065403 0.0092862449 -0.0287440451 0.007315467 0.007241532
0.0028436862 0.008590726 0.016384277 0.009531073 0.0156331210 0.019154091 0.0033726404 0.013734922 0.000000000 -0.0514319659 0.008367407 0.016185405 -0.001507319 0.0088375907 0.0017316113 0.0087102953 0.004411381 0.007985230

However, I want to assume that I own all of these assets in a single portfolio. So I average over the rows and join the data to the ETFs and call it all_returns.

portfolio_returns <- xts(rowMeans(asset_returns), index(asset_returns))
colnames(portfolio_returns) <- c("myPortfolio")
all_returns <- cbind(portfolio_returns, etf_returns)

The all_returns data looks like the following, where we can now see myPortfolio has been joined with the ETF dataset.

myPortfolio SPY IVV VTI VOO QQQ VEA IEFA AGG VWO EFA IEMG VTV IJH IWF BND IJR IWM VUG GLD IWD VIG VNQ USMV LQD VO VYM EEM VB VCSH XLF VCIT VEU XLK ITOT IVW BNDX VGT DIA BSV SHV IWB IWR TIP SCHF MBB SDY MDY SCHX IEF HYG DVY XLV SHY IXUS TLT IVE PFF IAU VXUS RSP SCHB VV GOVT EMB MUB QUAL XLY VBR EWJ XLP VGK SPLV MINT BIV IGSB EFAV VT GDX XLU IWS XLI SCHD IWP ACWI VMBS XLE JNK VOE FLOT IWV JPST SCZ IEI IWN DGRO VBK IGIB IWO
0.006099968 0.0016898660 0.002077056 0.0022360355 0.0017890863 -0.001736693 -0.001106388 -0.001681088 0.00046332859 0.002200231 -0.001557725 0.0023415111 0.0032234753 0.0007997348 0.0015327343 0.0012383279 0.003064097 0.002179106 0.000191391 0.002814589 0.0019626291 0.003253374 0.019113014 0.006756640 0.0005267369 0.001011874 0.004991256 0.0022334070 0.0005508056 -0.0002531205 0.003843879 0.0002372109 -0.002346225 -0.002605712 0.0017475020 0.001932397 -0.0005666427 -0.004795300 0.0005836581 -0.0001257842 0.00009081159 0.0019685044 0.002272383 -0.0006361631 -0.001106790 0.0007422784 0.0041255033 0.0003189161 0.0020875396 -0.0002827452 0.001870799 0.0060170694 0.004795681 -0.0005922602 -0.001030700 -0.004042725 0.002739571 0.0010243955 0.001924928 -0.001806146 0.002371882 0.0008275628 0.0019515134 -0.001194977 0.00264674400 0.0011761117 0.002668490 -0.001302665 0.002258164 0.012494986 0.006398807 -0.005917852 0.008133639 0.00000000000 0.001200660 -0.0004779842 0.001874008 0.0001763650 -0.00498036 0.0071519848 0.0023486170 0.0026733893 0.005754619 0.001875951 0.0009116517 0.0018953483 0.003807047 0.0020722096 0.002231295 -0.0001980912 0.002446725 NA -0.002630911 0.0004069319 0.0045386533 0.003549552 0.0004194414 0.001211963 0.0005894971
-0.017322911 -0.0126945384 -0.013178202 -0.0134924019 -0.0127544774 -0.009652265 -0.018436329 -0.017729720 0.00378821023 -0.018058472 -0.016592869 -0.0180938802 -0.0143367503 -0.0141237737 -0.0105732344 0.0044445716 -0.011468264 -0.015355426 -0.011056244 0.014337489 -0.0153896498 -0.012156174 -0.002743903 -0.006272724 0.0039388941 -0.018628687 -0.013212984 -0.0193056059 -0.0150753935 0.0003798793 -0.015464153 0.0032002389 -0.017296894 -0.012410734 -0.0142874994 -0.011028538 0.0037696378 -0.013665012 -0.0143976824 0.0011297112 -0.00018126091 -0.0133167477 -0.016720820 0.0031757481 -0.018629431 0.0032416931 -0.0115945095 -0.0142095765 -0.0130144362 0.0060192314 0.000000000 -0.0101836091 -0.008194411 0.0005922602 -0.017474795 0.013385186 -0.015046456 -0.0023059217 0.016213994 -0.017094464 -0.018466064 -0.0124870774 -0.0131916717 0.003978224 -0.00009471654 0.0044202968 -0.011985546 -0.009823880 -0.015706040 -0.017537150 -0.003394215 -0.016716991 -0.008924441 0.00009924540 0.004072733 0.0007647688 -0.009721009 -0.0149360906 0.01627195 -0.0018408588 -0.0174535242 -0.0155668847 -0.012070245 -0.016226286 -0.0152395434 0.0022698272 -0.039253655 -0.0008873999 -0.019544672 0.0000000000 -0.013574412 NA -0.016961509 0.0029245001 -0.0117764103 -0.013875422 -0.0157191560 0.001489930 -0.0179955058
-0.022721172 -0.0242840940 -0.024218220 -0.0248157103 -0.0246568769 -0.031814151 -0.019069875 -0.019212776 -0.00009198023 -0.032821816 -0.021000906 -0.0306311536 -0.0236320930 -0.0262082119 -0.0258207050 0.0001227881 -0.026029826 -0.027074043 -0.026052010 0.014040682 -0.0238600026 -0.018582970 -0.019675179 -0.016347691 0.0007861946 -0.022751878 -0.021792064 -0.0313506787 -0.0273006374 0.0005068469 -0.028542194 0.0005917693 -0.021745295 -0.029984083 -0.0243136677 -0.025700013 -0.0015062361 -0.032729302 -0.0238405527 0.0015052940 0.00009044932 -0.0246679043 -0.023153370 -0.0008155098 -0.018599946 0.0012935879 -0.0203358057 -0.0258820165 -0.0241643775 0.0021541043 -0.006500818 -0.0179376149 -0.020495249 0.0003548886 -0.020995449 0.001793079 -0.023062135 -0.0064343040 0.014091356 -0.020906723 -0.022920923 -0.0248076812 -0.0246652173 0.001982686 -0.00473089487 0.0014396023 -0.024085978 -0.020748172 -0.025632797 -0.015280173 -0.012072667 -0.018696026 -0.016748990 -0.00009924540 0.001314087 0.0003822792 -0.013642270 -0.0235646381 0.04326116 -0.0067016093 -0.0221778810 -0.0274902986 -0.021075943 -0.023810522 -0.0235878969 0.0013215006 -0.024701342 -0.0053430008 -0.023111858 0.0001980912 -0.024207994 NA -0.015369037 0.0022689596 -0.0269583342 -0.018940649 -0.0285117041 0.001580334 -0.0284649716
-0.016273779 -0.0110371371 -0.011139358 -0.0114683030 -0.0106831096 -0.008234532 -0.012433352 -0.012885863 0.00221091926 -0.008959566 -0.012123531 -0.0095957668 -0.0118987200 -0.0130211706 -0.0100044748 0.0012306568 -0.016790854 -0.017373511 -0.009370855 -0.004437547 -0.0129618969 -0.010369474 -0.013464265 -0.010137208 0.0007854976 -0.012995981 -0.011313767 -0.0107854401 -0.0139835513 0.0001261363 -0.015713048 0.0003548290 -0.012040822 -0.007949941 -0.0118655695 -0.009695182 -0.0001885182 -0.008626436 -0.0104818256 0.0011270489 0.00009081159 -0.0104525822 -0.012278941 0.0008155098 -0.012336194 0.0011074564 -0.0096812509 -0.0134486432 -0.0113193358 0.0025227807 -0.002637331 -0.0068795036 -0.015174172 0.0007095865 -0.012508199 0.004468584 -0.012752597 0.0028360806 -0.005612737 -0.011806658 -0.012285245 -0.0114437766 -0.0107437626 0.001188048 -0.00085371766 0.0004493159 -0.011441888 -0.010809474 -0.014204613 -0.021617824 -0.007722146 -0.008731859 -0.009698102 -0.00009916959 0.002622702 0.0006684507 -0.008823294 -0.0118083221 -0.02449108 -0.0004634958 -0.0119706508 -0.0102051133 -0.010843211 -0.012586070 -0.0116223810 0.0007544131 -0.012939813 -0.0035775482 -0.015544198 -0.0005950313 -0.011530313 NA -0.013485184 0.0020213357 -0.0163207962 -0.012280109 -0.0147442744 0.001206870 -0.0174444534
-0.003287114 0.0009894013 0.001036607 -0.0006123565 0.0001135276 0.003072098 0.004354785 0.003530857 -0.00285692501 -0.004342847 0.004178409 -0.0011027594 -0.0006508431 -0.0033014245 0.0005345031 -0.0041909610 0.000097994 -0.004342530 0.001001152 -0.008934590 -0.0004349583 0.002163316 0.005664191 0.001489776 -0.0032326845 -0.004816595 0.001105438 -0.0003388982 -0.0048675476 0.0008856206 0.001356817 -0.0017752166 0.001975840 0.006464163 -0.0002292283 0.002753215 -0.0011312023 0.004867711 0.0036078899 -0.0001251127 -0.00009081159 -0.0006570038 -0.003659859 -0.0052679258 0.003098229 -0.0009228731 -0.0002861645 -0.0030322035 -0.0002189089 -0.0031781230 -0.001510439 0.0023442262 -0.011979547 0.0003544986 0.001951243 -0.011003744 -0.001320438 -0.0054213750 -0.008478619 0.002135701 -0.003199632 -0.0006516613 0.0002276234 -0.001584279 -0.00170998028 -0.0016183255 0.001478523 0.007848505 -0.002496171 0.006968673 0.009340380 0.002990074 0.002163358 0.00029781110 -0.002384484 -0.0002860690 0.002735430 0.0005566762 -0.04220025 0.0053208513 -0.0037123602 0.0006030788 0.001905938 -0.004658536 0.0011492677 -0.0015090848 -0.021642045 -0.0014944173 -0.003861882 0.0003969400 -0.000796942 NA 0.001695691 -0.0012123094 -0.0003501705 0.002878845 -0.0083062241 -0.001764683 -0.0084416475
0.005140787 0.0080360599 0.007689359 0.0068197139 0.0082617000 0.011531038 0.004047425 0.004688442 0.00221249905 0.003675782 0.004702448 0.0008273783 0.0061003752 0.0038377668 0.0097876198 0.0025911534 0.002638930 0.002993541 0.009163331 -0.005072985 0.0057480147 0.008340049 -0.006697424 0.005197513 0.0022727943 0.004549844 0.005352840 0.0020317774 0.0028262158 0.0000000000 0.007652486 0.0005922597 0.004185365 0.011825830 0.0070922542 0.010122713 0.0013197206 0.011331708 0.0068127609 0.0001251127 0.00009081159 0.0077616649 0.004124727 0.0027283407 0.004244698 0.0015682489 0.0052809133 0.0041247242 0.0078515507 0.0044839921 0.001007182 0.0009635456 0.012714693 0.0001185908 0.003891078 0.014321948 0.006106584 -0.0007773691 -0.004741593 0.002367465 0.004726248 0.0064978364 0.0075872456 0.003165548 -0.00467027771 0.0003596236 0.010125999 0.010992048 0.001411921 -0.005221970 0.005441886 0.008282949 0.005119095 -0.00019864151 0.001431193 -0.0003823817 0.003208727 0.0055494219 -0.02252194 -0.0039301169 0.0004647994 0.0080082458 0.005967129 0.008138341 0.0064871928 0.0001887472 0.002367193 0.0023899644 0.002866649 0.0001980912 0.007238678 NA 0.000000000 0.0018584037 -0.0026886552 0.008587052 0.0051881359 0.001579281 0.0088344163

Next, I compute (just as before) the \(\beta\) and \(\alpha\) of the portfolios. This time using just the CAPM.beta and CAPM.alpha functions in the PerformanceAnalytics package.

library(PerformanceAnalytics)
betas <- CAPM.beta(Ra = all_returns, Rb = SPY_returns, Rf = 0)
alphas <- CAPM.alpha(Ra = all_returns, Rb = SPY_returns, Rf = 0)

beta_alphas <- cbind(t(betas), t(alphas))
colnames(beta_alphas) <- c("beta", "alpha")

I rank the ETF’s by their \(\alpha\) value. We can see the GDX has the highest \(\alpha\) and a negative \(\beta\) which makes sense since its a Goldminers ETF whihc tracks the Arca Gold Miners Index (GDMNTR) which tracks the performance of companies involved in the gold mining industry. As far as I know there are very few Gold mining companies listed in the SPY500. However, this is a nice way to present and rank ETFs by their \(\alpha\) values and see their corresponding \(\beta\) values.

beta_alphas %>% 
  as_tibble(rownames = NA) %>% 
  rownames_to_column("ETFs") %>% 
  arrange(desc(alpha)) %>% 
  head(10) %>% # I only take the top 10 observations since there are many ETF's in the data
  pivot_longer(cols = c("beta", "alpha")) %>% 
  ggplot(aes(x = reorder(ETFs, if_else(name == "alpha", value, -Inf), FUN = max), y = value)) +
  geom_bar(color = "black", fill = "darkred", stat = "identity") +
  scale_color_brewer(type='seq', palette='Reds') + 
  facet_wrap(~name, scales = "free") +
  theme_tq() +
  coord_flip() +
  ggtitle("Highest ranked  alpha stocks", subtitle = "With corresponding beta values") +
  ylab("Alpha / Beta value") +
  xlab("ETF")

We can also rank the \(alpha\) and \(beta\) values by taking their ratio of \(\dfrac{\alpha}{\beta}\) and plot the results.

beta_alphas %>% 
  as_tibble(rownames = NA) %>% 
  rownames_to_column("ETFs") %>% 
  mutate(alpha_divided_beta = alpha/beta) %>% 
  arrange(desc(alpha_divided_beta )) %>% 
  head(10) %>% 
  ggplot(aes(x = ETFs, y = alpha_divided_beta )) +
  geom_bar(color = "black", fill = "darkred", stat = "identity") +
  scale_color_brewer(type='seq', palette='Reds') + 
  theme_tq() +
  coord_flip() +
  ggtitle("Best Performers: Alpha over Beta ranked ETFs") +
  ylab("Alpha / Beta value") +
  xlab("ETF")

Changing a few lines of code and we can plot and see the worst performers with negative \(\alpha\) to \(\beta\) ratios.

beta_alphas %>% 
  as_tibble(rownames = NA) %>% 
  rownames_to_column("ETFs") %>% 
  mutate(alpha_divided_beta = alpha/beta) %>% 
  arrange(alpha_divided_beta) %>% 
  head(10) %>% 
  ggplot(aes(x = ETFs, y = alpha_divided_beta )) +
  geom_bar(color = "black", fill = "darkblue", stat = "identity") +
  scale_color_brewer(type='seq', palette='Reds') + 
  theme_tq() +
  coord_flip() +
  ggtitle("Worst Performers: Alpha over Beta ranked ETFs") +
  ylab("Alpha / Beta value") +
  xlab("ETF")

We can take our ETF \(\beta\) and \(\alpha\) values before and compute \(\sigma\).

beta <- cov(all_returns, SPY_returns) / as.numeric(var(SPY_returns))
alpha <- colMeans(all_returns) - beta*colMeans(SPY_returns)
N = length(SPY_returns)

mySigmaFunction <- function(ASSET){
  err = all_returns[, ASSET] - alpha[ASSET ,] - beta[ASSET ,] * SPY_returns
  Sigma = (1 / (N - 2)) %*% t(err) %*% err
  return(Sigma)
}

Sigma2 <- sapply(colnames(all_returns), mySigmaFunction)
diag_Sigma2 <- diag(Sigma2)
colnames(diag_Sigma2) = colnames(all_returns)
rownames(diag_Sigma2) = colnames(all_returns)

Sigma <- as.numeric(var(SPY_returns)) * beta %*% t(beta) + diag_Sigma2

Finally, I take a random sample of the ETF’s (since there are too many to analyse) and plot the correlations between the ETF’s.

set.seed(1234)
Sigma_subsample <- sample(colnames(Sigma), size = 10)
Sigma_sampled <- Sigma[ncol = Sigma_subsample, nrow = Sigma_subsample, drop = FALSE]
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(cov2cor(Sigma_sampled), method = "color", col = col(200),
         main = "Covariance matrix of ETFs",
         type = "upper", order = "hclust", number.cex = .7,
         addCoef.col = "black",
         tl.col = "black", tl.srt = 90,
         diag = FALSE)

Sharpe Ratio, CAPM and Fama-French Factor Analysis

Using simple plots still doesn’t give us enough information about an ETF, Portfolio or Asset. The Sharpe ratio Sharpe (1966) is a better measure. The Sharpe ratio is a reward-to-variability ratio which allows us to compare the performance of a portfolio compared to a risk-free asset, after adjusting for it’s risk. It takes the differenece between the portfolio’s return and the risk-free return, this is then divided by the standard deviation (which measures the portfolios volatility).

The Sharpe ratio tells us what the additional unit of return we can expect per unit of risk increase. The Sharpe ratio is defined as:

\[S = \dfrac{\overline{d}}{\sigma_{d}}\]

Where \[\widetilde{d} = \widetilde{R}_{F} - \widetilde{R}_{B}\]

The single asset model with just the market factor looks like:

\[ x_t = \alpha + \beta y_{t} + \epsilon_{t}\]

Taking the expected value of \(x\) and \(y\) at time \(t\) we have:

\[E[x_{t}] = \alpha + \beta E[y_{t}]\] With variance:

\[var[x_{t}] = \beta^2var[y_t] + \sigma^2\]

Thus, the Sharpe ratio becomes:

\[\dfrac{E[x_{t}]}{\sqrt{var[x_{t}]}}\]

Recall that \(\sigma = \sqrt{var[x_t]}\), we just plug in the values for \(E[x_{t}]\) and \(var[x_t]\) defined previously and the equation becomes:

\[\dfrac{\alpha + \beta E[y_{t}]}{\sqrt{\beta^2 var[y_t] + \sigma^2}}\] Which after some algebra we obtain:

\[\dfrac{\dfrac{\alpha}{\beta} + E[y_{t}]}{\sqrt{var[y_{t}] + \dfrac{\sigma^{2}}{\beta^{2}}}}\]

Finally we end up with:

\[\dfrac{\dfrac{\alpha}{\beta} + E[y_t]}{\sqrt{(var[y_t])}}\] We can represent the Sharpe ratio as \(\dfrac{\overline{x}}{\sqrt(Diag(\sigma^{2}))}\) where \(\overline{x}\) is the average value of \(x\) over the historic period from \(t=1\) to \(T\) - simply computed as \(\overline{x} = \dfrac{1}{T}\sum_{t=1}^{T} D_{t}\). In R we can simply take colMeans(all_returns) / sqrt(diag(var(all_returns))).

Machine Learning and Clustering

We can cluster our ETF’s based on their \(\beta\), \(\alpha\) and \(sharpe\) values. Why would we want to do this? Perhaps with more variables than the 3 here the model can cluster these firms in a higher dimensional space and thus we can select our ETF’s based on the clusters and use it as a portfolio diversification tool, that is one cluster might contain riskier ETF’s whereas another might contain value stocks or growth stocks. With such few variables here it doesn’t quite work but with more variables we can better classifiy ETFs.

betas <- CAPM.beta(Ra = all_returns, Rb = SPY_returns, Rf = 0)
alphas <- CAPM.alpha(Ra = all_returns, Rb = SPY_returns, Rf = 0)
sharpe <- colMeans(all_returns) / sqrt(diag(var(all_returns)))

beta_alphas_sharpe <- cbind(t(betas), t(alphas), sharpe)
colnames(beta_alphas_sharpe) <- c("beta", "alpha", "sharpe")
beta_alphas_sharpe <- na.omit(beta_alphas_sharpe)
library(factoextra)
kmeans_model <- kmeans(beta_alphas_sharpe, centers = 5, iter.max = 10, nstart = 1)
fviz_cluster(kmeans_model, data = beta_alphas_sharpe)

The beta_alphas_sharpe data looks like:

beta alpha sharpe
myPortfolio 1.0977199 0.0000345 0.0834437
SPY 0.9990763 0.0000799 0.0988232
IVV 0.9923880 0.0000872 0.1000217
VTI 1.0282825 0.0000718 0.0969068
VOO 0.9978598 0.0000848 0.0995858
QQQ 1.1472461 0.0000931 0.0870931

We can plot the \(\beta\), \(\alpha\), and \(sharpe\) values in a 3D plot and colour them according to the clusters from the kmeans model. It also gives me the opportunity to use the threejs package which we can interact with.

#library(rgl)
#options(rgl.printRglwidget = TRUE)
#plot3d(beta_alphas_sharpe, col = kmeans_model$cluster, size=2, type='s')

library(threejs)
scatterplot3js(beta_alphas_sharpe, color = factor(kmeans_model$cluster), size = 0.5, bg = "black")

Factor Modelling Fama and French

Finally, I analyse the performance of the various ETF’s. The CAPM formula tries to explain the performance of a portfolio through a sigle factor, the market as a whole. CAPM is defined as follows:

\[R_{a} = R_{rf} + \beta_{a}(R_{m} - R_{rf})\]

We can take the model further by adding Factors to the model. Thus, the 3 Factor model looks like:

\[R_{a} = R_{rf} + \beta_{a}(R_{m} - R_{rf}) + \beta_{b}(SMB) + \beta_{c} HML\]

Which takes the market factor from the CAPM and adds 2 new factors, the \(SMB\) and \(HML\) or Small-minus-Big and High-minus-Low. The \(SMB\) tries to caputre the size effect, small market cap companies should outperform the market over the long run. The \(HML\) aims to capture the value verses growth effect by sorting assests based on high book-to-market ratios (defined at the begining of the post). Firms which typically have a high book-to-market ratio are value stocks and low book-to-market ratios are growth stocks, the literature also shows that value stocks outperform growth stocks over the long term horizon.

The Fama French factor model allows us to analyse fund managers performance, since, if an asset managers portfolio can be explained by the Fama French factors then that fund manager has not added any value through portfolio selection and skill and subseuently obtained no alpha.

More formally, our equation becomes:

\[X^T = a^T + B F^T + E^T\]

Where \(F\) is a matrix of factors and \(B\) is a scalar of \(\beta\)’s such that \(B = [{b_{1}, b_{2}, b_{3}]}\), in which \(b_{1}\) is the market \(\beta\), \(b_{2}\) is the \(SMB\) \(\beta\) and \(b_{3}\) is the \(HML\) \(\beta\). More formally extending to include more factors up to \(N\) we have; \(B = [\beta_{1}, ... , \beta_{N}]^T\)

We want to fit a line which minimises the squared errors such that; \(minimise, \beta\) \(|X^T - a^T-BF^T|_{F}^2\)

The solution becomes \([a, B] = X^T \widetilde{F}(\widetilde{F}^T\widetilde{F})^{-1}\) which we can solve in R using the following:

(For completness I re-download the data)

  1. I download the data as before randomly selecting from Wikipedia and convert the daily prices to daily returns - (I set a seed such that we collect the same data using set.seed).
library(quantmod)
library(tidyquant)
library(timetk)
library(rvest)

start_date <- "2016-01-01"
end_date <- "2019-11-15"


set.seed(4746)
url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

symbols <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="constituents"]') %>% 
  html_table() %>% 
  .[[1]] %>% 
  filter(!str_detect(Security, "Class A|Class B|Class C")) %>%     # Removes firms with Class A shares
  sample_n(20) %>% 
  pull(Symbol)


asset_returns <- tq_get(
  symbols,
  from = start_date,
  to = end_date
) %>% 
  group_by(symbol) %>% 
  tq_transmute(
    select = adjusted,
    mutate_fun = periodReturn,
    period = "daily",
    type = "arithmetic"
  ) %>% 
  select(symbol, date, daily.returns) %>% 
  pivot_wider(names_from = symbol, values_from = daily.returns) %>% 
  .[-1, ]
  1. I download the ETFs just as before and convert to daily returns.
myETFs <- c("SPY", "IVV", "VTI", "VOO", "QQQ", "VEA", "IEFA", "AGG", "VWO",
            "EFA","IEMG","VTV", "IJH", "IWF","BND", "IJR", "IWM", "VUG", 
            "GLD", "IWD", "VIG", "VNQ", "USMV", "LQD", "VO", "VYM", "EEM",
            "VB", "VCSH", "XLF", "VCIT", "VEU", "XLK", "ITOT", "IVW", "BNDX",
            "VGT", "DIA", "BSV", "SHV", "IWB", "IWR", "TIP", "SCHF", "MBB", "SDY",
            "MDY", "SCHX", "IEF", "HYG", "DVY", "XLV", "SHY", "IXUS", "TLT", "IVE",
            "PFF", "IAU", "VXUS", "RSP", "SCHB", "VV", "GOVT", "EMB", "MUB", "QUAL",
            "XLY", "VBR", "EWJ", "XLP", "VGK", "SPLV", "MINT", "BIV", "IGSB", "EFAV",
            "VT", "GDX", "XLU", "IWS", "XLI", "SCHD", "IWP", "ACWI", "VMBS", "XLE", "JNK",
            "VOE", "FLOT", "IWV", "JPST", "SCZ", "IEI", "IWN", "DGRO", "VBK", "IGIB", "IWO")


etf_returns <- tq_get(
  myETFs,
  from = start_date,
  to = end_date
) %>% 
  group_by(symbol) %>% 
  tq_transmute(
    select = adjusted,
    mutate_fun = periodReturn,
    period = "daily",
    type = "arithmetic"
  ) %>% 
  select(symbol, date, daily.returns) %>% 
  pivot_wider(names_from = symbol, values_from = daily.returns) %>% 
  .[-1, ]
  1. I take the average daily return of the randomly selected stocks and join the data with the ETFs and set the data as a time series object. I also download the daily Fama French 3 factors from Kenneth French’s website and clean the data up a little.
portfolio_and_etfs <- asset_returns %>% 
  mutate(myPortfolio = rowMeans(select(., -date), na.rm = TRUE)) %>% 
  select(date, myPortfolio) %>% 
  inner_join(etf_returns, by = c("date")) %>% 
  tk_xts(date_var = date)

# Daily fama french
library(glue)
library(readr)
temp <- tempfile()
base <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
factor <- "F-F_Research_Data_Factors_daily"
format<-"_CSV.zip"
full_url <-glue(base,factor,format,sep ="")
download.file(full_url,temp,quiet = TRUE)

North_America_3_Factors <- read_csv(unz(temp, "F-F_Research_Data_Factors_daily.CSV"), skip = 4) %>%
  rename(date = X1) %>% 
  mutate(date = ymd(date)) %>% 
  drop_na(date) %>%
  rename(Mkt_Rf_3 = `Mkt-RF`,
         SMB_3 = SMB,
         HML_3 = HML,
         RF_3 = RF) %>% 
  select(-RF_3) %>% 
  mutate_at(vars(c("Mkt_Rf_3", "SMB_3", "HML_3")), funs(./100)) %>% 
  tk_xts(date_var = date)

Finally we can compute the solution \([a, B] = X^T \widetilde{F}(\widetilde{F}^T\widetilde{F})^{-1}\)

three_factors <- North_America_3_Factors[index(North_America_3_Factors) %in% c(min(index(portfolio_and_etfs)):max(index(North_America_3_Factors)))] # this ensures that the dates from the both time series match
portfolio_and_etfs_new <- portfolio_and_etfs[index(portfolio_and_etfs) %in% c(min(index(three_factors))):max(index(three_factors))] # There is probably a much nicer way to go about doing this but this works...
FF_3Factors <- cbind(alpha = 1, three_factors)
Gamma <- t(solve(t(FF_3Factors) %*% FF_3Factors, t(FF_3Factors) %*% portfolio_and_etfs_new))

Which we can see as:

alpha Mkt_Rf_3 SMB_3 HML_3
myPortfolio 0.0001697 0.9844870 0.0155642 0.0629606
SPY 0.0000391 0.9784199 -0.1331473 -0.0085326
IVV 0.0000423 0.9771912 -0.1341942 -0.0113980
VTI 0.0000440 0.9727841 -0.0164012 -0.0019078
VOO 0.0000430 0.9769683 -0.1356460 -0.0099922
QQQ 0.0000440 1.1645777 -0.1440829 -0.4955734

Instead, we can estimate these using linear regression models. For my random portfolio myPortfolio we can use the lm function to fit a linear model and then use the tidy function from the broom package to tidy the output up a little:

library(broom)
regdata <- cbind(FF_3Factors, portfolio_and_etfs_new)
#sapply(regdata, function(x) sum(is.na(x)))   # One column was causing problems since it consisted of just NA values. I remove it below.
regdata$JPST <- NULL      # This ETF failed to download or does not exist on the Yahoo Finance site

myPortfolioReg <- lm(myPortfolio ~ Mkt_Rf_3 + SMB_3 + HML_3, data = regdata, na.action = na.exclude)
tidy(myPortfolioReg)
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept) 0.000170  0.000112     1.51  0.131  
## 2 Mkt_Rf_3    0.984     0.0136      72.6   0      
## 3 SMB_3       0.0156    0.0226       0.690 0.490  
## 4 HML_3       0.0630    0.0200       3.15  0.00167

We can apply this to all our ETFs in the data using the apply command and applying our own custom lm function.

regressionLists <- apply(regdata, 2, function(y) lm(y ~ Mkt_Rf_3 + SMB_3 + HML_3, data = regdata, na.action = na.exclude))

We can also apply the tidy command to individual ETFs and then use the stars.pval to make the data even more tidy awesome.

library(gtools)
tidy(regressionLists$myPortfolio) %>% 
  mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept) 0.000170  0.000112     1.51  " "    
## 2 Mkt_Rf_3    0.984     0.0136      72.6   ***    
## 3 SMB_3       0.0156    0.0226       0.690 " "    
## 4 HML_3       0.0630    0.0200       3.15  **
tidy(regressionLists$VTV)%>% 
  mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term          estimate std.error statistic p.value
##   <chr>            <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept)  0.0000514 0.0000446      1.15 " "    
## 2 Mkt_Rf_3     0.918     0.00539      170.   ***    
## 3 SMB_3       -0.131     0.00897      -14.7  ***    
## 4 HML_3        0.265     0.00794       33.4  ***
tidy(regressionLists$LQD) %>% 
  mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept)  0.000241 0.0000882      2.74 **     
## 2 Mkt_Rf_3    -0.0119   0.0106        -1.12 " "    
## 3 SMB_3       -0.0325   0.0177        -1.84 .      
## 4 HML_3       -0.157    0.0157       -10.0  ***

Finally, we can apply this same method to all our ETFs using the lapply function to tidy the data and the map function to mutate or convert the p-value to stars. Then take a random sample of 5 ETFs regressions.

library(purrr)
myTidyRegressions <- lapply(regressionLists[4:length(regressionLists)], tidy)
myTidiedRegressions <- map(myTidyRegressions, ~mutate(., p.value = stars.pval(p.value)))
library(rlist)
list.sample(myTidiedRegressions, 5)
## $DGRO
## # A tibble: 4 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept)  0.000154 0.0000507      3.04 **     
## 2 Mkt_Rf_3     0.890    0.00612      146.   ***    
## 3 SMB_3       -0.145    0.0102       -14.2  ***    
## 4 HML_3        0.121    0.00902       13.4  ***    
## 
## $EFAV
## # A tibble: 4 x 5
##   term           estimate std.error statistic p.value
##   <chr>             <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept) -0.00000220  0.000142   -0.0155 " "    
## 2 Mkt_Rf_3     0.572       0.0171     33.5    ***    
## 3 SMB_3       -0.112       0.0285     -3.94   ***    
## 4 HML_3       -0.0172      0.0252     -0.684  " "    
## 
## $XLE
## # A tibble: 4 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept) -0.000264  0.000270    -0.978 " "    
## 2 Mkt_Rf_3     1.09      0.0326      33.5   ***    
## 3 SMB_3        0.0920    0.0543       1.69  .      
## 4 HML_3        0.716     0.0481      14.9   ***    
## 
## $IVW
## # A tibble: 4 x 5
##   term          estimate std.error statistic p.value
##   <chr>            <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept)  0.0000434 0.0000407      1.07 " "    
## 2 Mkt_Rf_3     1.01      0.00491      206.   ***    
## 3 SMB_3       -0.180     0.00817      -22.0  ***    
## 4 HML_3       -0.312     0.00724      -43.1  ***    
## 
## $VEA
## # A tibble: 4 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl> <chr>  
## 1 (Intercept) -0.000113  0.000154    -0.730 " "    
## 2 Mkt_Rf_3     0.852     0.0186      45.8   ***    
## 3 SMB_3       -0.0850    0.0310      -2.74  **     
## 4 HML_3        0.150     0.0274       5.45  ***

A few notes here: We should be modelling the excess returns on the ETFs and not just the ETF returns. This is simple enough by replacing for example myPortfolio in the lm regressions with \((myPortfolio - RF_3)\) where \(RF_3\) is the risk free rate which comes with the Fama and French data. We should also probably use \(NeweyWest\) adjusted standard errors by running coeftest(myPortfolioReg, NeweyWest(myPortfolioReg, lag = N, prewhite = FALSE)). Literature suggests that lags of \(4(N/100)^{2/9}\) should be used, where \(N\) is the number of observations. We could rank the ETFs based on their alphas as before and go long on the high alphas and short on the low alphas. Run our hedge portfolio through a Fama French regression here and see if we are able to obtain better performances.

Finally, here I only used the 3 Factor model. There are more factors from the literature we could use. On the Kenneth French website we can collect data on \(Market\), \(SMB\), \(HML\), \(RMW\), \(CMA\) and \(MOM\). Where the \(RMW\) factor is a profitability factor, the \(CMA\) is an investment factor and the \(MOM\) is a momentum factor.

Below I pull in the daily Fama and French 5 Factor model and plot them.

library(glue)
library(timetk)
library(plotly)
library(pipeR)

temp <- tempfile()
base <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
factor <- "North_America_5_Factors"
format<-"_CSV.zip"
full_url <-glue(base,factor,format,sep ="")
download.file(full_url,temp,quiet = TRUE)

North_America_Factors <- read_csv(unz(temp, "North_America_5_Factors.csv"), skip = 6) %>%
  rename(date = X1) %>%
  mutate_at(vars(-date), as.numeric) %>%
  mutate(date = rollback(ymd(parse_date_time(date, "%Y%m") + months(1)))) %>%
  drop_na(date) %>%
  rename(Mkt_Rf = `Mkt-RF`) %>% 
  select(-RF) %>% 
  mutate_at(vars(c("Mkt_Rf", "SMB", "HML", "RMW", "CMA")), funs(./100)) %>% 
  tk_xts(date_var = date)

ax <- list(
  zeroline = TRUE,
  showline = TRUE,
  mirror = "ticks",
  gridcolor = toRGB('white'),
  gridwidth = 2,
  zerolinecolor = toRGB('white'),
  zerolinewidth = 4,
  linecolor = toRGB('white'),
  linewidth = 6,
  color = 'white'
)


North_America_Factors %>>% 
  (cumprod(1+.)) %>>%
  ROC(36, type = "discrete") %>>%
  na.omit() %>>% 
  (
    plot_ly(
      x = colnames(.),
      y = as.Date(index(.)),
      z = data.matrix(.),
      type = "surface",
      colors = c("darkblue", "yellow", "darkred")
    )
  ) %>%
  plotly::layout(
    xaxis = ax, yaxis = ax,
    paper_bgcolor ='rgb(0,0,0)',
    #plot_bgcolor ='rgb(0,0,0)',
    title = "Fama French Factors",
    scene = list(
      xaxis = c(ax, title = "Factors", tickangle = -45),
      yaxis = c(ax, title = "Date", tickangle = -45),
      zaxis = c(ax, title = "Fama French Factor")
    ),
    titlefont = list(
      family = "Agency FB",
      size = 50,
      color = 'white')
  )

Sharpe, William F. 1964. “Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk.” The Journal of Finance 19 (3): 425–42.

———. 1966. “Mutual Fund Performance.” The Journal of Business 39 (1): 119–38.

To leave a comment for the author, please follow the link and comment on their blog: Posts on Matthew Smith R Shenanigans.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)