I studied ecology as an undergraduate, which meant I spent a lot of time gathering and analyzing field data. Two of the basic tools we used to look for relationships in a large set of variables were correlation matrices and scatterplot matrices. Each of these requires a single line of code in R.
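As a minimal sketch of those two one-liners, here they are applied to R's built-in iris measurements. The dataset is an assumption on my part, standing in for real field data; any data frame of numeric columns should work the same way.

```r
# Assumption: iris stands in for the field data; keep only the numeric columns.
field_data <- iris[, 1:4]

# Correlation matrix: one line.
cor_matrix <- cor(field_data)
print(round(cor_matrix, 2))

# Scatterplot matrix: one line.
pairs(field_data)
```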
The ‘pairs’ function in R takes a lot of additional options, which can be used to make very informative plots. These options can get a little cumbersome, but fortunately several package authors have written wrapper functions that automatically enable some extra magic. Two such packages are psych and PerformanceAnalytics. I happen to prefer the one-liner from PerformanceAnalytics, but it’s a matter of personal taste.
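For comparison, here is a sketch of both wrappers side by side: `pairs.panels()` from psych and `chart.Correlation()` from PerformanceAnalytics. The iris data is again an assumed stand-in, and both packages need to be installed first.

```r
library(psych)                # provides pairs.panels()
library(PerformanceAnalytics) # provides chart.Correlation()

# Assumption: iris stands in for the field data.
field_data <- iris[, 1:4]

# The psych one-liner.
pairs.panels(field_data)

# The PerformanceAnalytics one-liner, with histograms on the diagonal.
chart.Correlation(field_data, histogram = TRUE)
```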
This chart contains a LOT of information. On the diagonal are the univariate distributions, plotted as histograms and kernel density plots. Above and to the right of the diagonal are the pair-wise correlations, with red stars signifying significance levels. The stronger the correlation, the larger the font size of the coefficient. Below and to the left of the diagonal is the scatter-plot matrix, with loess smoothers in red to help illustrate the underlying relationship. This is one of my favorite plots in R, because it combines a large amount of information into one command and one easy-to-follow plot. In fact, this plot contains more information than is revealed by the first two commands in this post!
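To see why the wrapper is convenient, here is a rough sketch of a similar layout built with base `pairs()` and custom panel functions. It is adapted from the examples in `?pairs` and is not the PerformanceAnalytics implementation: the helper names `panel.hist` and `panel.cor`, the font scaling, and the star thresholds are my own assumptions, and the built-in `panel.smooth()` draws a lowess curve rather than loess.

```r
# Diagonal panel: histogram of one variable, scaled to fit the panel.
panel.hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  n_breaks <- length(h$breaks)
  heights <- h$counts / max(h$counts)
  rect(h$breaks[-n_breaks], 0, h$breaks[-1], heights, col = "grey80")
}

# Upper panel: correlation coefficient, larger font for stronger correlations,
# with red stars for significance (assumed thresholds: 0.001, 0.01, 0.05).
panel.cor <- function(x, y, digits = 2, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- cor(x, y, use = "pairwise.complete.obs")
  p <- cor.test(x, y)$p.value
  stars <- c("***", "**", "*", "")[findInterval(p, c(0.001, 0.01, 0.05)) + 1]
  text(0.5, 0.5, format(r, digits = digits), cex = 1 + 2 * abs(r))
  text(0.8, 0.8, stars, col = "red", cex = 1.5)
}

# Lower panel: scatterplots with the built-in smoother (red by default).
pairs(iris[, 1:4],
      diag.panel = panel.hist,
      upper.panel = panel.cor,
      lower.panel = panel.smooth)
```

That is about 25 lines to approximate what the wrapper does in a single call.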
Of course, you can use this command on data from other domains besides ecology. PerformanceAnalytics is intended for the analysis of financial data, so let’s put it through its paces. First we download some financial data (a stock index, a bond index, and a gold index) from Yahoo Finance using quantmod, and then combine the daily close series of those indexes into one data frame. I’m not 100% happy with the legend in the plot, but I wanted to show how the correlations between these indexes have changed over the years. I also skipped red (color #2) in the plots and in the legend, because the loess smoother is also red.
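Here is a hedged sketch of that workflow. The tickers (SPY, AGG, GLD) are ETF stand-ins that I am assuming for illustration, not necessarily the indexes used for the plot, and the helpers `combine_closes()`, `year_colours()`, and `plot_index_correlations()` are hypothetical names. The per-year coloring uses base `pairs()` rather than any particular PerformanceAnalytics option, and the legend is left out.

```r
library(quantmod)             # getSymbols(), Cl(); also attaches xts and zoo
library(PerformanceAnalytics) # chart.Correlation()

# Assumption: ETF proxies for a stock, a bond, and a gold index.
tickers <- c(stocks = "SPY", bonds = "AGG", gold = "GLD")

# Merge the daily close column of each series into one data frame,
# keeping only the dates on which every series has a value.
combine_closes <- function(series) {
  closes <- do.call(merge, unname(lapply(series, Cl)))
  colnames(closes) <- names(series)
  as.data.frame(na.omit(closes))
}

# One palette color per calendar year: 1, 3, 4, ... so that red (2) is
# never used. Integer colors recycle once the palette runs out.
year_colours <- function(dates) {
  years <- format(as.Date(dates), "%Y")
  ids <- match(years, unique(years))
  ifelse(ids >= 2L, ids + 1L, ids)
}

# Needs network access, so it is wrapped in a function and not run on load.
plot_index_correlations <- function() {
  series <- lapply(tickers, getSymbols, src = "yahoo", auto.assign = FALSE)
  closes <- combine_closes(series)
  chart.Correlation(closes, histogram = TRUE)
  # Base-R variant with points colored by year; the smoother stays red.
  pairs(closes, col = year_colours(rownames(closes)), lower.panel = panel.smooth)
  invisible(closes)
}
```

Calling `plot_index_correlations()` downloads the data and draws both plots. It returns the combined data frame invisibly in case you want to reuse it.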