(This article was first published on Systematic Investor » R, and kindly contributed to R-bloggers)
I want to follow up and provide a bit more details to the excellent “A Visual of Current Major Market Clusters” post by David Varadi.
Let’s first load historical for the 10 major asset classes:
- Gold ( GLD )
- US Dollar ( UUP )
- S&P500 ( SPY )
- Nasdaq100 ( QQQ )
- Small Cap ( IWM )
- Emerging Markets ( EEM )
- International Equity ( EFA )
- Real Estate ( IYR )
- Oil ( USO )
- Treasurys ( TLT )
###############################################################################
# Load Systematic Investor Toolbox (SIT)
# http://systematicinvestor.wordpress.com/systematic-investor-toolbox/
###############################################################################
setInternet2(TRUE)
con = gzcon(url('http://www.systematicportfolio.com/sit.gz', 'rb'))
source(con)
close(con)
#*****************************************************************
# Load historical data for ETFs
#******************************************************************
load.packages('quantmod')
tickers = spl('GLD,UUP,SPY,QQQ,IWM,EEM,EFA,IYR,USO,TLT')
data <- new.env()
getSymbols(tickers, src = 'yahoo', from = '1900-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='remove.na')
Next let’s use the historical returns over the past year to compute correlations between all asset classes and group assets into 4 clusters:
#*****************************************************************
# Create Clusters
#******************************************************************
# compute returns
ret = data$prices / mlag(data$prices) - 1
ret = na.omit(ret)
# setup period and method to compute correlations
dates = '2012::2012'
method = 'pearson' # kendall, spearman
correlation = cor(ret[dates], method = method)
dissimilarity = 1 - (correlation)
distance = as.dist(dissimilarity)
# find 4 clusters
xy = cmdscale(distance)
fit = kmeans(xy, 4, iter.max=100, nstart=100)
#*****************************************************************
# Create Plot
#******************************************************************
load.packages('cluster')
clusplot(xy, fit$cluster, color=TRUE, shade=TRUE, labels=3, lines=0, plotchar=F,
main = paste('Major Market Clusters over', dates), sub='')
There are 4 clusters: TLT, GLD, UUP, and Equities / Oil / Real Estate. You can see assigned clusters by executing
fit$cluster
This works quite well, but we have a number of things to explore:
- how to select number of clusters
- what correlation measure to use i.e. pearson, kendall, spearman
- what look back to use i.e. 1 month / 6 months / 1 year
- what frequency of data to use i.e daily / weekly / monthly
In the next post I will provide some ideas how to select number of clusters.
To view the complete source code for this example, please have a look at the bt.cluster.visual.test() function in bt.test.r at github.
To leave a comment for the author, please follow the link and comment on his blog: Systematic Investor » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...


Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).