[This article was first published on The R Trader » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When it comes to managing a portfolio of stocks versus a benchmark the problem is very different from defining an absolute return strategy. In the former one has to hold more stocks than in the later where no stocks at all can be held  if there is not good enough opportunity.  The reason for that is the tracking error. This is defined as the standard deviation of  the portfolio return minus the benchmark return. The less stocks is held  vs. a benchmark the higher the tracking error (e.g higher risk).

The analysis that follows is largely inspired by the book  “Active Portfolio Management” by Grinold & Kahn. This is the bible for anyone interested in running a portfolio against a benchmark. I strongly encourage anyone with an interest in the topic to read the book from the beginning to the end.  It’s very well written and lays the foundations of systematic active portfolio management (I have no affiliation to the editor or the authors).

1 – Factor Analysis

Here we’re trying to rank as accurately as possible the stocks in the investment universe on a forward return basis. Many people came up with many tools and countless variant of those tools have been developed to achieve this. In this post I focus on two simple and widely used metrics: Information Coefficient (IC) and Quantiles Return (QR).

1.1 – Information Coefficient

The IC gives an overview of the factor forecasting ability. More precisely, this is a measure of how well the factor ranks the stocks on a forward return basis. The IC is defined as the rank correlation (ρ) between the metric (e.g. factor) and the forward return. In statistical terms the rank correlation is a nonparametric measure of dependance between two variables. For a sample of size n, the n raw scores  are converted to ranks , and ρ is computed from:

The horizon for the forward return has to be defined by the analyst and it’s a function of  the strategy’s turnover and the alpha decay (this has been the subject of extensive research). Obviously ICs must be as high as possible in absolute terms.

For the keen reader, in the book by Grinold & Kahn a formula linking Information Ratio (IR) and IC is given: with breadth being the number of independent bets (trades).  This formula is known as the fundamental law of active management. The problem is that often, defining breadth accurately is not as easy as it sounds.

1.2 – Quantiles Return

In order to have a more accurate estimate of the factor predictive power it’s necessary to go a step further and group stocks by quantile of factor values then analyse the average forward return (or any other central tendency metric) of each of those quantiles. The usefulness of this tool is straightforward. A factor can have a good IC but its predictive power might be limited to a small number of stocks. This is not good as a portfolio manager will have to pick stocks within the entire universe in order to meet its tracking error constraint. Good quantiles return are characterised by a monotonous relationship between the individual quantiles and forward returns.

2 – Data and code

All the stocks in the S&P500 index (at the time of writing). Obviously there is a survival ship bias: the list of stocks in the index has changed significantly between the start and the end of the sample period, however it’s good enough for illustration purposes only.

The code below downloads individual stock prices in the S&P500 between Jan 2005 and today (it takes a while) and turns the raw prices into return over the last 12 months and the last month. The former is our factor, the latter will be used as the forward return measure.

#####################################################################
# Factor Evaluation in Quantitative Portfolio Management
#
# [email protected] - Mar. 2015
#####################################################################
library(tseries)
library(quantmod)
library(XML)

startDate <- "2005-01-01"
tickers <- as.matrix(tables[[1]]["Ticker symbol"])

instrumentRtn <- function(instrument=instrument,startDate=startDate,lag=lag){
price <- get.hist.quote(instrument, quote="Adj", start=startDate, retclass="zoo")
monthlyPrice <- aggregate(price, as.yearmon, tail, 1)
monthlyReturn <- diff(log(monthlyPrice),lag=lag)
monthlyReturn <- exp(monthlyReturn)-1
return(monthlyReturn)
}

dataFactor <- list()
dataRtn <- list()

for (i in 1:length(tickers)) {
print(tickers[i])
dataFactor[[i]] <- instrumentRtn(tickers[i],startDate,lag=12)
dataRtn[[i]] <- instrumentRtn(tickers[i],startDate,lag=1)
}



Below is the code to compute Information Coefficient and Quantiles Return. Note that I used quintiles in  this example but any other grouping method (terciles, deciles etc…) can be used. it really depends on the sample size, what you want to capture and wether you want to have a broad overview or focus on distribution tails.  For estimating returns within each quintile, median has been used as the central tendency estimator. This measure is much less sensitive to outliers than arithmetic mean.

theDates <- as.yearmon(seq(as.Date(startDate), to=Sys.Date(), by="month"))

findDateValue <- function(x=x,theDate=theDate){
pos <- match(as.yearmon(theDate),index(x))
return(x[pos])
}

factorStats <- NULL

for (i in 1:(length(theDates)-1)){
factorValue <- unlist(lapply(dataFactor,findDateValue,theDate=as.yearmon(ii)))
if (length(which(!is.na(factorValue))) > 10){
print(theDates[i])
bucket <- cut(factorValue,breaks=quantile(factorValue,probs=seq(0,1,0.2),na.rm=TRUE),labels=c(1:5),include.lowest = TRUE)
rtnValue <- unlist(lapply(dataRtn,findDateValue,theDate=as.yearmon(theDates[i+1])))

##IC
ic <- cor(factorValue,rtnValue,method="spearman",use="pairwise.complete.obs")

##QS
quantilesRtn <- NULL

for (j in sort(unique(bucket))){
pos <- which(bucket == j)
quantilesRtn <- cbind(quantilesRtn,median(rtnValue[pos],na.rm=TRUE))
}
factorStats <- rbind(factorStats,cbind(quantilesRtn,ic))
}
}

colnames(factorStats) <- c("Q1","Q2","Q3","Q4","Q5","IC")

factorStats <- xts(factorStats,order.by=theDates[1:(length(theDates)-1)])

qs <- apply(factorStats[,c("Q1","Q2","Q3","Q4","Q5")],2,median,na.rm=TRUE)
ic <- round(median(factorStats[,"IC"],na.rm=TRUE),4)


And finally the code to produce the Quantiles Return chart.

par(cex=0.8,mex=0.8)
bplot <- barplot(qs,
border=NA,
col="royal blue",
ylim=c(0,max(q)+0.005),
main="S&P500 Universe n 12 Months Momentum Return - IC and QS")

abline(h=0)

legend("topleft",
paste("Information Coefficient = ",ic,sep=""),
bty = "n")


3 – How to exploit the information above?

In the chart above Q1 is lowest past 12 months return and Q5 highest. There is an almost monotonic increase in the quantiles return between Q1 and Q5 which clearly indicates that stocks falling into Q5 outperform those falling into Q1 by about 1% per month. This is very significant and powerful for such a simple factor (not really a surprise though…). Therefore there are greater chances to beat the index by overweighting the stocks falling into Q5 and underweighting those falling into Q1 relative to the benchmark.

An IC of 0.0206 might not mean a great deal in itself but it’s significantly different from 0 and indicates a good predictive power of the past 12 months return overall. Formal significance tests can be evaluated but this is beyond the scope of this article.

4 – Practical limitations

The above framework is excellent for evaluating investments factor’s quality however there are a number of practical limitations that have to be addressed for real life implementation:

• Rebalancing: In the description above, it’s assumed that at the end of each month the portfolio is fully rebalanced. This means all stocks falling in Q1 are underweight and all stocks falling in Q5 are overweight relative to the benchmark. This is not always possible for practical reasons: some stocks might be excluded from the investment universe, there are constraints on industry or sector weight, there are constraints on turnover etc…
• Transaction Costs: This has not be taken into account in the analysis above and this is a serious brake to real life implementation. Turnover considerations are usually implemented in real life in a form of penalty on factor quality.
• Transfer coefficient: This is an extension of the fundamental law of active management and it relaxes the assumption of Grinold’s model that managers face no constraints which preclude them from translating their investments insights directly into portfolio bets.

And finally, I’m amazed by what can be achieved in less than 80 lines of code with R…

As usual any comments welcome