In “What the hell is a variance matrix?” I talked about the basics of variance matrices and highlighted challenges for estimating them in finance. Here we look more deeply at the most popular estimation technique.
Models for variance matrices
The types of variance estimates that are used in finance can be classified as:
- Sample estimate
- Factor model
- Shrinkage estimate
The Ledoit-Wolf estimate is the leading example of a shrinkage estimate.
Dynamic estimates include multivariate GARCH estimates and relatives such as DCC.
Note that these are variance matrices of returns, not prices.
What are factor models?
A factor model says that there is some number of drivers of the returns of the assets plus some idiosyncratic risk not associated with the drivers. Instead of being called “drivers”, they are called “factors”. Each asset may have its own set of sensitivities to the factors.
In matrix notation a factor model is:
V = B’FB + D
This notation hides a lot of details:
- V is the variance matrix of the asset returns, a square of numbers (of size number-of-assets by number-of-assets).
- B is a rectangle of sensitivities of size number-of-factors by number-of-assets.
- F is the variance matrix of the factors (of size number-of-factors by number-of-factors).
- D is a diagonal matrix (all off-diagonal elements are zero) of the idiosyncratic variance of each asset. The total size of D is number-of-assets by number-of-assets, but there are only number-of-assets values that are not zero.
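To make the dimensions concrete, here is a minimal R sketch that assembles V from the pieces. All of the sizes and numbers are made up for illustration, and the factor variance is called Fmat only to avoid masking R's F (shorthand for FALSE).

```r
# Hypothetical sizes and values, chosen only for illustration
n.assets <- 5
n.factors <- 2
set.seed(1)

# B: sensitivities, number-of-factors by number-of-assets
B <- matrix(rnorm(n.factors * n.assets), nrow = n.factors, ncol = n.assets)

# F: variance matrix of the factors (named Fmat so that F is not masked)
Fmat <- matrix(c(0.04, 0.01,
                 0.01, 0.09), nrow = n.factors)

# D: diagonal matrix of idiosyncratic variances, one value per asset
D <- diag(runif(n.assets, min = 0.01, max = 0.05))

# V = B'FB + D
V <- t(B) %*% Fmat %*% B + D

dim(V)            # number-of-assets by number-of-assets
isSymmetric(V)
min(eigen(V, symmetric = TRUE, only.values = TRUE)$values) > 0
```

The last line checks that the smallest eigenvalue is positive, which is what positive definite means in practice.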
Why factor models?
Factor models serve two key uses:
- Ensure the estimate is positive definite
- Reduce noise
Positive definiteness is a technical condition. But it is highly practical. If the estimate were not positive definite, then there would be one or more portfolios that have zero (or even negative) estimated variance. Of course there are no such portfolios. A non-positive definite variance matrix can be very misleading, especially if it is given to an optimizer.
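A small R sketch of the problem, using simulated (hypothetical) returns: with fewer observations than assets, the sample estimate has eigenvalues that are numerically zero, and the matching eigenvector is a set of long-short weights with zero estimated variance.

```r
set.seed(42)
n.obs <- 10
n.assets <- 20

# Hypothetical returns: fewer observations than assets
returns <- matrix(rnorm(n.obs * n.assets, sd = 0.01), nrow = n.obs)
V.sample <- var(returns)

eig <- eigen(V.sample, symmetric = TRUE)

# Count the eigenvalues that are meaningfully above zero
n.positive <- sum(eig$values > 1e-12)
n.positive

# Eigenvalues are sorted in decreasing order, so the last eigenvector
# gives weights whose estimated variance is (numerically) zero
w <- eig$vectors[, n.assets]
est.var <- drop(t(w) %*% V.sample %*% w)
est.var
```

An optimizer handed V.sample would happily load up on weights like w, since they look riskless.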
The second issue is noise. Suppose that your universe has 2000 assets. Then the variance matrix will contain approximately 2 million unique numbers. If you have two years of daily data on your universe, then you only have 1 million numbers in your dataset. Obviously you need a bit of finesse when estimating those 2 million numbers. Factor models are one form of finesse.
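The arithmetic behind those counts, as a quick R check. The figure of roughly 250 trading days per year is my assumption.

```r
n.assets <- 2000
n.days <- 2 * 250    # assumes roughly 250 trading days per year

# Unique elements of a symmetric matrix: the diagonal plus one triangle
n.unique <- n.assets * (n.assets + 1) / 2

# Numbers in the dataset: one return per asset per day
n.data <- n.assets * n.days

n.unique    # about 2 million
n.data      # 1 million
n.data / n.unique
```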
Factor models do these two things, but we should still wonder if they do them well.
A very simple example of a factor model is the Capital Asset Pricing Model. Here there is just one factor — the market. The B matrix is just the betas of the assets, the F matrix is really just a number — the market variance, and D is filled with the residual variances of the assets.
I rarely have much nice to say about CAPM, but this is one of those times. What happens when the CAPM factor model hits a volatile period? The market variance increases, which increases the size of off-diagonal elements of the variance matrix. That is, the correlations will tend to increase.
So this very simple factor model exhibits one of the key features that actually happens.
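Here is a minimal R sketch of that effect. The betas, residual variances and market variances are made-up numbers; the only point is to compare correlations when the market variance is low versus high.

```r
# Hypothetical betas and residual variances for three assets
beta <- c(0.8, 1.0, 1.3)
resid.var <- c(0.02, 0.03, 0.04)

capm.variance <- function(market.var) {
  # V = B'FB + D with a single factor: F is the scalar market variance
  market.var * outer(beta, beta) + diag(resid.var)
}

cor.calm <- cov2cor(capm.variance(0.02))
cor.volatile <- cov2cor(capm.variance(0.08))

# Correlation between the first two assets in each regime
cor.calm[1, 2]
cor.volatile[1, 2]
```

The residual variances are held fixed here, which is the assumption that drives the result: a larger share of each asset's variance comes from the common factor.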
Types of factor models
Factor models are often divided into:
- fundamental
- macro
- statistical
A more operational classification is:
- factors estimated, sensitivities known
- factors known, sensitivities estimated
- factors estimated, sensitivities estimated
It is possible to have hybrids. There exist models that have all three types.
Both fundamental and macro models use linear regressions. The difference is the type of regressions. Fundamental models use cross-sectional regressions. That is, pick one point in time and the observations are the collection of assets. Macro models use time series regressions: pick one asset and the observations are the history of returns for the asset (and the factors).
The time series regressions have an advantage given the way that factor models are used. We often care about the risk of portfolios. When aggregating up to a portfolio, the errors from the time series regressions get a chance to average out. That is not true of the errors from the cross-sectional regressions.
So why use cross-sectional regressions at all? Necessity.
When implementing a regression there are mainly two things to consider:
- What do you know?
- What is (relatively) constant?
If you want to use interest rates or the oil price, then you know the history of the factors. You can also assume that the sensitivity of an asset to those factors is basically inherent and stable. So a time series regression makes sense.
Now suppose you are interested in momentum or book-to-price. What you know is the value (sensitivity) that each asset currently has for the factors. But these sensitivities are not static — they are market-driven. In the case of book-to-price there would be no point in looking at it if it were static. For factors like these the constancy that we assume is the market’s (current) pricing of the factor. That is, a cross-sectional regression.
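The two kinds of regression can be sketched side by side in R. Everything here is simulated and the variable names are hypothetical; the thing to notice is what is known and what is estimated in each case.

```r
set.seed(7)
n.times <- 200
n.assets <- 50

# Macro style: the factor history is known, the sensitivity is estimated.
# One regression per asset; the observations run through time.
oil.change <- rnorm(n.times, sd = 0.02)           # hypothetical factor history
asset.return <- 0.5 * oil.change + rnorm(n.times, sd = 0.01)
ts.fit <- lm(asset.return ~ oil.change)
coef(ts.fit)["oil.change"]                         # estimated sensitivity

# Fundamental style: the sensitivities are known, the factor return is estimated.
# One regression per time point; the observations run across assets.
book.to.price <- rnorm(n.assets)                   # hypothetical current exposures
returns.today <- 0.003 * book.to.price + rnorm(n.assets, sd = 0.01)
cs.fit <- lm(returns.today ~ book.to.price)
coef(cs.fit)["book.to.price"]                      # estimated factor return for this date
```

A real fundamental model would repeat the cross-sectional regression at every time point to build up a history of factor returns.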
Estimating macro models
Building a macro factor model is little more than performing a series of regressions. An R command to do the regressions (in an overly simple case) might look like:
> sens <- lm(asset.return.matrix ~ factor.changes.history)
This assumes that you want to use least squares regressions as opposed to robust regressions. Returns have long tails so robust regression is a reasonable idea. In the models that I built, least squares was almost as good as the best robust regression tried. The best regression was very lightly robust (a Huber M-estimate). More profoundly robust regressions were substantially worse (out-of-sample) than least squares.
You can get Huber M-estimates of regression in R with the rlm function in the MASS package.
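A minimal sketch of what that might look like for a single asset, with least squares alongside for comparison. The data are simulated with long-tailed noise and the variable names are hypothetical. The sketch fits one asset at a time.

```r
library(MASS)   # provides rlm()

set.seed(3)
n.obs <- 250
factor.change <- rnorm(n.obs, sd = 0.01)          # hypothetical factor history

# Hypothetical asset returns with long tails (t-distributed noise)
asset.return <- 1.2 * factor.change + 0.005 * rt(n.obs, df = 3)

ls.fit <- lm(asset.return ~ factor.change)

# Huber M-estimate; psi.huber is the rlm default, named here for clarity
huber.fit <- rlm(asset.return ~ factor.change, psi = psi.huber)

rbind(least.squares = coef(ls.fit), huber = coef(huber.fit))
```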
We are assuming that the sensitivities are constant. That assumption is unlikely to be completely true. So it might be useful to give more weight in the regressions to more recent data. Some people use exponentially decreasing weights; I think that puts too much weight on the most recent data. In my experiments the best weighting scheme was linearly decreasing weights (decreasing as you go back in time). An R command to get those sorts of weights, assuming the observations are ordered from oldest to most recent, would be:
> weights <- seq(0.5, 1.5, length.out = n.observations)
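Weights like these can be handed to lm through its weights argument. A minimal sketch with simulated data follows; the variable names are hypothetical and the observations are assumed to be ordered oldest first.

```r
set.seed(11)
n.observations <- 500

# Hypothetical factor history and asset returns, oldest observation first
factor.change <- rnorm(n.observations, sd = 0.01)
asset.return <- 0.9 * factor.change + rnorm(n.observations, sd = 0.01)

# Increasing in time order, so the weights decrease going back in time
weights <- seq(0.5, 1.5, length.out = n.observations)

# Weighted least squares: recent observations count more
wfit <- lm(asset.return ~ factor.change, weights = weights)
coef(wfit)
```

With these end points the most recent observation gets three times the weight of the oldest, and the weights average to 1.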
One more issue is how to treat categorizations of assets such as industry (and country if it is a multi-country model). One approach is to allow sensitivity to only one industry. A more ambitious approach is to allow sensitivity to more than one industry if those sensitivities are statistically significant (at some pre-chosen level).
Estimating statistical models
Some people use the term “implicit factor” rather than statistical factor.
The most common approach to building a statistical factor model is conceptually equivalent to the R command:
> facmod <- eigen(cor(return.matrix))
This is an eigen decomposition of the correlation matrix of the returns. Some number of eigenvectors are selected as the factor sensitivities. Choosing the number of factors is really the key decision in a statistical factor model. Too few factors means you are missing out on systematic risk. Too many factors means you are adding noise.
By the way: these factors are uncorrelated and have variance 1; hence their variance matrix F is the identity and so drops out of the calculations.
The rest of the construction process is really just some book-keeping to see how much idiosyncratic risk there is, and then scaling by the estimated volatilities.
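Here is a stripped-down R sketch of that construction. It is my simplification for illustration, not the code inside any package: it uses simulated returns, picks the number of factors arbitrarily, and ignores missing values entirely.

```r
set.seed(5)
n.obs <- 500
n.assets <- 8
n.factors <- 2          # the key decision; chosen arbitrarily here

# Hypothetical returns driven by one common component plus noise
common <- rnorm(n.obs, sd = 0.01)
return.matrix <- outer(common, runif(n.assets, 0.5, 1.5)) +
  matrix(rnorm(n.obs * n.assets, sd = 0.01), nrow = n.obs)

facmod <- eigen(cor(return.matrix), symmetric = TRUE)

# Sensitivities (number-of-factors by number-of-assets): eigenvectors scaled by
# the square root of their eigenvalues, so the factor variance is the identity
B <- sqrt(facmod$values[1:n.factors]) * t(facmod$vectors[, 1:n.factors])

# Book-keeping: on the correlation scale, whatever the factors leave
# unexplained is idiosyncratic, so the fitted diagonal is exactly 1
systematic <- crossprod(B)                 # B'B, since F is the identity
idiosyncratic <- 1 - diag(systematic)
fitted.cor <- systematic + diag(idiosyncratic)

# Scale by the estimated volatilities to get back to a variance matrix
vol <- apply(return.matrix, 2, sd)
V <- fitted.cor * outer(vol, vol)

min(eigen(V, symmetric = TRUE, only.values = TRUE)$values)
```

The diagonal of V reproduces the sample variances, and the positive idiosyncratic terms are what keep the smallest eigenvalue above zero.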
For practical purposes, you want to ensure that the resulting variance matrix is substantially positive definite.
If you are using R, you don’t need to write your own function: you can use factor.model.stat, which is in the BurStFin package (which still hasn’t migrated to CRAN).
> install.packages("BurStFin", repos="http://www.burns-stat.com/R")
Note that this package also contains an implementation of the Ledoit-Wolf shrinkage estimate.
Dealing with missing values
If you look at the definition of factor.model.stat, it is substantially more complicated than the explanation above suggests it should be. That is mostly because it handles the possibility of missing values.
In a statistics class you are never going to see a variable included in a variance matrix when there are no data for it. In finance that is a common demand. We want to have new assets in our risk models, often before the assets even trade at all. Stocks for Facebook and Twitter are currently on the horizon.
Of course the estimates for such assets will be less than perfect. We’re hoping for reasonable rather than perfect. The use that will be made of the variance should determine the assumptions used in making the estimates.
Since markets are always full of surprises, we can’t expect that the predicted volatility for a portfolio will always be a good approximation of the realized volatility. But better models will do better at putting different portfolios into the right order in terms of volatility.
Here is one way to test models:
- Pick a time point in the past.
- Generate a set of random portfolios that have constraints you care about.
- For each model and each random portfolio get the predicted volatility.
- For each random portfolio find the subsequent realized volatility.
- For each model calculate the correlation between predicted and realized volatility.
It is best to do this at several time points rather than just one. Higher correlations are better.
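The steps above can be sketched in R roughly as follows. This is a toy version under loud assumptions: the returns are simulated, the two "models" are just the sample estimate and a diagonal-only estimate, and the random portfolios are simply long-only and fully invested rather than obeying realistic constraints.

```r
set.seed(99)
n.assets <- 10
n.past <- 250
n.future <- 60
n.portfolios <- 200

# Hypothetical returns with a common component; rows are days, oldest first
true.vol <- runif(n.assets, 0.005, 0.03)
make.returns <- function(n) {
  market <- rnorm(n, sd = 0.01)
  outer(market, rep(1, n.assets)) +
    sweep(matrix(rnorm(n * n.assets), nrow = n), 2, true.vol, "*")
}
past <- make.returns(n.past)        # data before the chosen time point
future <- make.returns(n.future)    # data after the chosen time point

# Two candidate models: the sample estimate and a diagonal-only estimate
models <- list(sample = var(past), diagonal = diag(diag(var(past))))

# Random long-only, fully invested portfolios (a stand-in for real constraints)
weights <- matrix(rexp(n.portfolios * n.assets), nrow = n.portfolios)
weights <- weights / rowSums(weights)

# Predicted volatility: sqrt(w'Vw) for each portfolio under each model
predicted <- sapply(models, function(V) sqrt(rowSums((weights %*% V) * weights)))

# Realized volatility of each portfolio over the subsequent period
realized <- apply(future %*% t(weights), 2, sd)

# One correlation per model; higher is better
cor(predicted, realized)
```

To follow the advice about several time points, wrap the part after make.returns in a loop over split dates and compare the correlations across dates.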