[This article was first published on quandl blog » R quandl blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Image by Justin Reznick

“An economist is an expert who will know tomorrow why the things he predicted yesterday didn't happen today.” Laurence J. Peter (author and creator of the Peter Principle)

If you were paying attention to financial sites last month, you probably noticed a number of articles on “Dr. Copper”. Here is just a small sample of some of the headlines:

So what is all the fuss about this so-called Doctor Copper? Well, it turns out that at some point in time Copper price action was believed to be a leading indicator for the overall equities market. Thus, it was said that Copper had a PhD in Economics (one can forgive the puns linking Copper to MD).

This raises a number of questions. Was this actually the case? If so, when was it a valid indicator? When did the signal fall apart? As always, R gives us the ability to see if this mythology is confirmed, plausible or busted.

The process to explore the question will be as follows:

• examination of Copper's price action over time
• Econometric analysis of Copper and US equity index prices, subsetting historical periods
• Granger Causality

Using quandl.com, we can download the time series data we need to get started.

require(Quandl)

## Warning: package 'Quandl' was built under R version 3.0.3

# download monthly copper price per metric ton and subset from 1980 forward,
copper = Quandl("WORLDBANK/WLD_COPPER", type = "ts")
plot(copper)


cop = window(copper, start = c(1980, 1), end = c(2014, 3))
plot(cop)


We can see that something dramatic happened around 2004/5 to drive prices significantly higher, prior to a precipitous collapse and eventual recovery. I will cut this up into a number of subsets, but for now I will play around with changepoints.

require(changepoint)

##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
##
## Successfully loaded changepoint package version 1.1
## Created on 2013-06-19
##  Substantial changes to the structure of the package have occured between version 0.8 and 1.0.2.  Please see the package NEWS for details.

mean.cop = cpt.mean(cop, method = "BinSeg")

## Warning: The number of changepoints identified is Q, it is advised to
## increase Q to make sure changepoints have not been missed.

cpts(mean.cop)

## [1] 289 312 345 355 367

plot(mean.cop)


Here are a couple of points I think are noteworthy. The first changepoint at observation 289 equates to January 2004. The collapse of price, exemplified by observation 345, is September of '08. Perhaps a reason to observe elevated price levels in copper that last decade or so, is the fact of China's economic growth and that country's use of copper as collateral. Take this quote for example, “A Reuters report this week that Chinese companies have used something between 60 and 80 percent of the country's copper imports as collateral to finance projects has unsettled markets, triggered a fresh wave of selling.” “This is an amazing estimate; it would change the perception of 'Dr. Copper' as a gauge of the Chinese economy, as it's not being used for industrial production, but rather as a financing tool for whatever reason,” said Chris Weston, chief market strategist at IG.“ (http://www.cnbc.com/id/101486303)

Financing tool!? If that is the case then Dr. Copper is currently no better than a strip mall psychic at market prognostication. But, what about the past? In a prior financial epoch was the 'Red Metal' the 'Jimmy the Greek' of US equitites? A side-by-side comparison with US stocks is in order. Again, we can quickly get our data from quandl.com…

sandp500 = Quandl("YAHOO/INDEX_SPY", type = "ts", collapse = "monthly", column = 6)
SandP = log(window(sandp500, start = c(1995, 1), end = c(2014, 3)))
plot(SandP)


# match copper time series to S&P 500
cop2 = log(window(cop, start = c(1995, 1), end = c(2014, 3)))


Now, let's compare the charts together.

compare.ts = cbind(SandP, cop2)
plot(compare.ts)


Taking a look at this from '95 to '05, can anyone see anything here that would lead us to conclude that copper is a leading indicator? Maybe from 2003 until 2011 there is something. So, let's cut this down into that subset.

sp.sub = window(SandP, start = c(2003, 1), end = c(2011, 12))
cop.sub = window(cop2, start = c(2003, 1), end = c(2011, 12))
compare2.ts = cbind(sp.sub, cop.sub)
plot(compare2.ts)


Perhaps we are on to something here, a period of time where copper truly was a leading indicator.

In previous posts, I looked at vector autoregression and granger causality with differenced data to achieve stationarity. However, here I will apply VAR using "levels”, with the techniques spelled out in this two blog posts:

We first determine the maximum amount of integration.

require(forecast)

## This is forecast 5.0

ndiffs(sp.sub, alpha = 0.05, test = c("kpss"))

## [1] 1

ndiffs(cop.sub, alpha = 0.05, test = c("kpss"))

## [1] 1


Second, run a VAR model selection to determine the correct number of lags. This is done on levels not the difference.

require(vars)

VARselect(compare2.ts, lag = 24, type = "both")

## $selection ## AIC(n) HQ(n) SC(n) FPE(n) ## 2 2 2 2 ## ##$criteria
##                 1          2          3          4          5          6
## AIC(n) -1.096e+01 -1.126e+01 -1.122e+01 -1.123e+01 -1.119e+01 -1.111e+01
## HQ(n)  -1.086e+01 -1.112e+01 -1.104e+01 -1.100e+01 -1.091e+01 -1.079e+01
## SC(n)  -1.073e+01 -1.092e+01 -1.076e+01 -1.066e+01 -1.049e+01 -1.030e+01
## FPE(n)  1.743e-05  1.285e-05  1.338e-05  1.324e-05  1.390e-05  1.502e-05
##                 7          8          9         10         11         12
## AIC(n) -1.106e+01 -1.101e+01 -1.100e+01 -1.096e+01 -1.089e+01 -1.087e+01
## HQ(n)  -1.069e+01 -1.060e+01 -1.053e+01 -1.045e+01 -1.033e+01 -1.027e+01
## SC(n)  -1.014e+01 -9.972e+00 -9.842e+00 -9.692e+00 -9.500e+00 -9.369e+00
## FPE(n)  1.582e-05  1.669e-05  1.703e-05  1.774e-05  1.928e-05  1.977e-05
##                13         14         15         16         17         18
## AIC(n) -1.084e+01 -1.080e+01 -1.071e+01 -1.075e+01 -1.071e+01 -1.071e+01
## HQ(n)  -1.019e+01 -1.010e+01 -9.968e+00 -9.956e+00 -9.871e+00 -9.821e+00
## SC(n)  -9.218e+00 -9.061e+00 -8.861e+00 -8.779e+00 -8.625e+00 -8.506e+00
## FPE(n)  2.070e-05  2.184e-05  2.414e-05  2.373e-05  2.516e-05  2.582e-05
##                19         20         21         22         23         24
## AIC(n) -1.068e+01 -1.073e+01 -1.071e+01 -1.070e+01 -1.063e+01 -1.058e+01
## HQ(n)  -9.746e+00 -9.756e+00 -9.685e+00 -9.631e+00 -9.516e+00 -9.418e+00
## SC(n)  -8.362e+00 -8.303e+00 -8.162e+00 -8.039e+00 -7.854e+00 -7.687e+00
## FPE(n)  2.728e-05  2.656e-05  2.817e-05  2.947e-05  3.298e-05  3.647e-05

# lag2 is optimal for VAR


This is where it gets tricky. To get the linear models I will need to create the lagged variables. The max number of lags is equal to the VAR selection plus the maximum differences, so the magic number here is 3.

sp = zoo(sp.sub)
lag.sp = lag(sp, -(0:3), na.pad = T)
copper = zoo(cop.sub)
lag.copper = lag(copper, -(0:3), na.pad = T)


Here, the two linear models are built. Note that the index is used to account for the trend. You also need to sequence the variables the same in each lm, so that the Wald test makes sense.

lm1 = lm(sp ~ lag.sp[, 2:4] + lag.copper[, 2:4] + index(sp))
lm2 = lm(copper ~ lag.sp[, 2:4] + lag.copper[, 2:4] + index(sp))


With the linear models built it is time to concoct the Wald Tests and examine the chi-squared p-values.
Again, this is tricky, but follow along… We are testing the significance of the lagged coefficients of copper on S&P and vice versa. However, this method does not include the one lag greater than the optimal VAR lag (in this case 2). This model was built with 3 lags!

require(aod)

## Warning: package 'aod' was built under R version 3.0.3

wald.test(b = coef(lm1), Sigma = vcov(lm1), Terms = c(5:6), df = 2)

## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 9.7, df = 2, P(> X2) = 0.0079
##
## F test:
## W = 4.8, df1 = 2, df2 = 2, P(> W) = 0.17

wald.test(b = coef(lm2), Sigma = vcov(lm2), Terms = c(2:3), df = 2)

## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 13.2, df = 2, P(> X2) = 0.0014
##
## F test:
## W = 6.6, df1 = 2, df2 = 2, P(> W) = 0.13


Given the significant p-values for both tests, what can we conclude? For Granger-Causality when you have significant tests 'both ways', it is believed you are missing some important variable and casuality is rejected. If we accept that as the case here, then the prognostic value of copper even during this limited time frame is 'Busted'. However, before we throw the proverbial baby out with the bath water it is important to discuss an important limitation of this work. This analysis was done with monthly data and as a result may not be sensitive enough to capture a signal of causality. All things considered, I have to say that the Dr. Copper Myth is not totally Busted for the given timeframe we examined. It is certainly not Confirmed, so at best it is Plausible.