Data Exploration – Gold vs Gold Mining Stocks

[This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers.]

I have been looking into time series analysis with R.  I’m still climbing the learning curve, as I am very accustomed to SAS/ETS.  With ETS, everything is in a couple of procedures, and I know where and how to get things done.  In R, things are spread out, which seems to be a product of an open system.  That said, I’m getting there and becoming more proficient.

Yesterday I started looking at the spread between gold and gold mining stocks.  Lots of pundits are exclaiming that now is the time to buy gold miners, as they are grossly undervalued relative to the spot price of gold.  Others give more supply-and-demand explanations for why miners are cheap.  But everyone agrees they are cheap.  Are they?  And is there a tradable structure to the miners and gold?

To start, I pulled the brief history of the GDX ETF, the ETF that tracks the large, non-junior gold miners.  I also pulled the same history for the GLD ETF, which more or less tracks the price of gold.  I took the log of their historical closing prices and plotted them:


library(PerformanceAnalytics)  # provides chart.TimeSeries()

# importSeries() is not defined in this post; it is assumed to be a helper
# that returns a time series with <symbol>.Open/.Close/.Return columns.
to   = "2012-01-14"
from = "2006-06-01"

gld = importSeries("gld",to=to,from=from)
gdx = importSeries("gdx",to=to,from=from)

series = merge(gld,gdx)[,c("gld.Open","gld.Close","gld.Return",
                           "gdx.Open","gdx.Close","gdx.Return")]

#Create the log of the close and merge back onto the series
x = log(series[,c("gld.Close","gdx.Close")])
colnames(x) = c("gldLogClose","gdxLogClose")
series = merge(series,x)

chart.TimeSeries(series[,c("gldLogClose","gdxLogClose")],
              main="Log Price Closes",legend.loc="topleft")




So there is definitely a correlation between the two prices.  We can also see that sometime in late 2010 the miners turn flat while gold continues its rise.  So now let’s plot the actual spread.

# Spread = log(GLD close) minus log(GDX close)
x = series[,"gldLogClose"] - series[,"gdxLogClose"]
colnames(x) = c("diffLog")

chart.TimeSeries(x,main="GLD-GDX")
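
As a side note on what this spread measures, here is a minimal sketch in base R. The prices are made-up numbers for illustration only, not real GLD or GDX quotes. It checks that the difference of log prices is the log of the price ratio, and that the day-to-day change in the spread is GLD's log return minus GDX's log return.

```r
# Hypothetical closing prices, for illustration only (not real quotes)
gld_px <- c(120, 122, 125, 123)
gdx_px <- c(50, 52, 51, 49)

# The spread of log prices ...
log_spread <- log(gld_px) - log(gdx_px)

# ... is the same thing as the log of the price ratio
log_ratio <- log(gld_px / gdx_px)
print(all.equal(log_spread, log_ratio))

# Changes in the spread equal GLD's log return minus GDX's log return
spread_change <- diff(log_spread)
return_gap    <- diff(log(gld_px)) - diff(log(gdx_px))
print(all.equal(spread_change, return_gap))
```

So a rising spread means GLD is outperforming GDX in percentage terms, whichever of the two is actually moving.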



This shows us that the spread has actually been increasing from the “get-go,” or maybe since late 2007 / early 2008.  It looks like a nice linear trend if you take out the spike caused by the 2008 crash.  Let’s see what we can make of it.  First, drop the data prior to what looks like the regime change in 2008.  Then regress the spread on time.  Next, drop the crash spike and rerun the regression.

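# Keep only the data after the apparent regime change in 2008
x = window(x,start="2008-01-01",end=to)

# Regress the spread on time
reg = lm(x[,c("diffLog")] ~ time(x)@Data)

plot(x,main="GLD-GDX")
abline(reg)

# Drop the crash spike (2008-08-01 to 2009-06-01) and keep the rest
x1 = window(x,start="2008-01-01",end="2008-08-01")
x2 = window(x,start="2009-06-01",end=to)
x3 = rbind(x1,x2)
lines(x3,col="red")

# Rerun the regression without the spike
reg2 = lm(x3~time(x3)@Data)

abline(reg2)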

So we have a decent-looking trend line, but what about the residuals?

resid = residuals(reg2)
plot(resid,type="l")
abline(h=0)

There is still structure there.  In fact, to my eye it looks like there is a unit root involved and this process might just be randomly walking around the 0 mean we forced on it via the regression.  If that is true, then we have a case where the spread is itself a random walk with a drift.  Let’s run a few tests.
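
To show what that would look like, here is a minimal simulation in base R. The drift and shock sizes are arbitrary assumptions, not estimates from the GLD/GDX data. It builds a random walk with drift, fits a linear time trend to it, and then looks at two things: the residual mean, which OLS with an intercept forces to zero, and the lag-1 autocorrelation of the residuals.

```r
set.seed(1)
n     <- 1000
drift <- 0.001                        # assumed per-step drift
shock <- rnorm(n, sd = 0.02)          # assumed shock size
spread_sim <- cumsum(drift + shock)   # random walk with drift

# Fit a linear time trend, in the same spirit as the regression on time above
t_index <- seq_len(n)
fit     <- lm(spread_sim ~ t_index)
res     <- residuals(fit)

# Residual mean is zero up to rounding error, by construction of OLS
mean_res <- mean(res)

# Lag-1 autocorrelation of the residuals (element 1 of $acf is lag 0)
rho1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]

print(c(mean_resid = mean_res, lag1_autocorr = rho1))
```

In a simulation like this the detrended series still wanders, so the lag-1 autocorrelation should come out close to 1 even though the mean is zero.  That is the pattern the residual plot reminds me of.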

library(tseries)  # provides adf.test() and kpss.test()
adf.test(resid)
Augmented Dickey-Fuller Test

data:  resid
Dickey-Fuller = -3.011, Lag order = 9, p-value = 0.1503
alternative hypothesis: stationary


kpss.test(resid,null="Level",lshort=FALSE)

KPSS Test for Level Stationarity

data:  resid
KPSS Level = 0.4078, Truncation lag parameter = 20, p-value = 0.07377


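Note that the two tests have opposite null hypotheses.  The null of the ADF test is a unit root, and with p = 0.15 we cannot reject it.  The null of the KPSS test is level stationarity, and with p = 0.074 we cannot reject it at the 5% level, though we would at 10%.  So the evidence is mixed: the ADF test gives no support for a stationary spread, and the KPSS result is borderline.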
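
For reference, here is a minimal sketch of how the two tests behave on simulated series, keeping in mind that adf.test() takes a unit root as its null while kpss.test() takes stationarity as its null.  The series are simulated, not the GLD/GDX data, and the seed and sample size are arbitrary choices.

```r
library(tseries)  # provides adf.test() and kpss.test()

set.seed(42)
n <- 500
random_walk <- cumsum(rnorm(n))   # has a unit root by construction
white_noise <- rnorm(n)           # stationary by construction

# Both functions interpolate p-values from tables and warn when the
# statistic falls outside the tabulated range, so warnings are silenced here.

# ADF: null = unit root. A large p-value means a random walk cannot be ruled out.
adf_rw <- suppressWarnings(adf.test(random_walk))
adf_wn <- suppressWarnings(adf.test(white_noise))

# KPSS: null = level stationarity. A small p-value is evidence against stationarity.
kpss_rw <- suppressWarnings(kpss.test(random_walk, null = "Level"))
kpss_wn <- suppressWarnings(kpss.test(white_noise, null = "Level"))

print(c(adf_rw  = adf_rw$p.value,  adf_wn  = adf_wn$p.value,
        kpss_rw = kpss_rw$p.value, kpss_wn = kpss_wn$p.value))
```

Reading the two p-values side by side is a useful habit: a series that looks stationary should reject under ADF and not under KPSS, and a random walk should do the reverse.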

What I conclude from this is that while you can make a very good economic case for why GDX should catch up to GLD, the data do not support it.  At least not in the limited sample I have.  I don’t see a long-term, data-based tradable structure here.  There might be something in the short term, but I haven’t looked that deeply.

The economist in me says that at some point the GDX companies will benefit from the higher price of gold and their equity values should increase.  

The contrarian in me says that while this spread could narrow and revert back to some long-term mean, that doesn’t mean GDX skyrockets.  It could mean that GLD falls, a lot.  Maybe that is what the market is telling us?  Maybe it is discounting GLD’s rise as unsustainable, and the long-term profitability of GDX is in question?

Or maybe the market is saying that it takes lots of resources to get gold out of the ground, and gold has risen along with other resources.  Maybe the market discounts that possibility and believes GDX’s margins will be compressed?  Does it think that profitability is not a function of the price of gold, but a function of the spread between gold and some other basket of resources?




