Investigating the relationship between gold and bitcoin prices with R.

[This article was first published on quandl blog » R quandl blog, and kindly contributed to R-bloggers.]

Image: Reine by Ennio Pozzetti

In this post I will explore some of the movements in markets in recent years. These movements have caught many by surprise, resulting in some people unexpectedly striking it rich while others have lost a great deal. I am no financial advisor, nor do I have a background in financial analysis, so please take everything with a grain of salt. If anything is true about financial markets, it is that they are inherently unpredictable.

I will investigate the relationship between the price of gold and the price of bitcoin (as quoted on the Bitstamp exchange), with two competing hypotheses. One hypothesis is that both goods represent investments that people seek because they are “safe” and “risk minimal”. The mantras “gold always has value” and “bitcoin does not rely upon government support” both seem to imply this. If this is true, then both markets will move together: when the economy as a whole seems uncertain, both will gain in price, and when the economy does well, both will lose value as investors shift from safe investments to investments that provide a higher expected return. The alternative hypothesis investigated here is that they are seen as competing investments. Thus when the price of gold goes down, investors will move into bitcoin, which will drive the price of bitcoin up. Likewise, if the price of bitcoin goes down, investors will shift to gold, which will drive the price of gold up.
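As a rough illustration of what each hypothesis would look like in data, here is a minimal sketch using purely simulated numbers (nothing here is real market data, and the variable names are made up for the example). Under the first hypothesis a shared shock pushes both series the same way; under the second, money leaving one asset enters the other.

```r
set.seed(1)
n <- 1000

# Hypothesis 1 (both are safe havens): a shared "uncertainty" shock
# moves both simulated price changes in the same direction.
shock   <- rnorm(n)
gold_h1 <- shock + rnorm(n)
btc_h1  <- shock + rnorm(n)

# Hypothesis 2 (competing investments): a flow out of one asset
# is a flow into the other, so the changes have opposite signs.
flow    <- rnorm(n)
gold_h2 <-  flow + rnorm(n)
btc_h2  <- -flow + rnorm(n)

cor_h1 <- cor(gold_h1, btc_h1)  # positive under hypothesis 1
cor_h2 <- cor(gold_h2, btc_h2)  # negative under hypothesis 2
print(c(safe_haven = cor_h1, competing = cor_h2))
```

The sign of the correlation between the two series of price changes is therefore the quantity to watch in the real data.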

To assist me in my exploration I will use the R packages Quandl and ggplot2. Quandl gathers, organizes, and supplies free data from upwards of 9 million data sets (currently). ggplot2 can be installed and run through the normal means (install.packages(c("ggplot2", "reshape"))), though in order to use Quandl you will need to install it (install.packages("Quandl")) as well as register and validate a free account. You will be prompted as necessary when you attempt to run the Quandl function below.
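A minimal setup sketch follows. It assumes a version of the Quandl package that exposes Quandl.api_key() for registering your account key; "YOUR_API_KEY" is a placeholder, not a real key, and the guard variables are only there so the script degrades gracefully if a package is missing.

```r
# install.packages(c("Quandl", "ggplot2"))  # run once, if not yet installed

# Check which packages are available before attaching them.
have_quandl <- requireNamespace("Quandl", quietly = TRUE)
have_ggplot <- requireNamespace("ggplot2", quietly = TRUE)

if (have_quandl) {
  library(Quandl)
  # Placeholder key: substitute the key from your own free account.
  Quandl.api_key("YOUR_API_KEY")
}
if (have_ggplot) library(ggplot2)
```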

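Data used in this article: the Quandl series BUNDESBANK/BBK01_WT5511 (gold price) and BITCOIN/BITSTAMPUSD (bitcoin price on Bitstamp).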


# Load the packages used throughout this post
library(Quandl)
library(ggplot2)

# Let's start by taking a look at the price of gold since 2010
pGold <- Quandl("BUNDESBANK/BBK01_WT5511", start_date="2010-01-01")

# Plot Data (a line geometry is assumed; the original layer is missing)
p1 <- ggplot(pGold, aes(x=Date,y=Value)) +
  geom_line() +
  ggtitle("Price of Gold")
p1

# Find Bitstamp Permalink Data at
bitstamp <- Quandl("BITCOIN/BITSTAMPUSD")

# bitstamp data has four different prices.
# high - highest price of day
# low - lowest price of day
# close - last price of day
# Weighted.Price- I believe this is calculated as:
#   sum(price*volume at price)/total volume
# I will be using the weighted price

# Drop implausibly large values (Weighted.Price of 10^6 or more)
# and reduce the data to just the Date and Weighted.Price columns
bitstamp2 <- bitstamp[bitstamp$Weighted.Price<10^6,c("Date", "Weighted.Price")]

names(bitstamp2) <- c("Date", "Price")

# Plot the Bitstamp price (a line geometry is assumed here)
p2 <- ggplot(bitstamp2, aes(x=Date,y=Price)) +
  geom_line() +
  ggtitle("Price of BitStamp")
p2

# In order to compare our two time series let's combine their data
names(pGold) <- c("Date", "Price")

PriceData <- rbind(cbind(pGold, Good="Gold"), cbind(bitstamp2, Good="BitStamp"))

ggplot(PriceData, aes(x=Date,y=Price, colour=Good)) +
  geom_line() +
  ggtitle("Price Currency Options (USD)")

Price Currency Options (USD)
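The code comments above guess that the weighted price is sum(price * volume) / total volume. As a small worked example of that formula only, here it is applied to three hypothetical trades (the numbers are invented for illustration and are not Bitstamp data):

```r
# Hypothetical trades for a single day: price in USD, volume in bitcoin
trades <- data.frame(
  price  = c(100, 110, 120),
  volume = c(5, 3, 2)
)

# Volume-weighted price: each price counts in proportion to its volume
weighted_price <- sum(trades$price * trades$volume) / sum(trades$volume)

# Unweighted mean of the three prices, for comparison
simple_mean <- mean(trades$price)

print(c(weighted = weighted_price, simple = simple_mean))
```

Because most of the volume traded at the lowest price, the weighted price (107) sits below the simple mean (110). Base R's weighted.mean(trades$price, trades$volume) gives the same weighted figure.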

We see immediately that there are many difficulties with comparing the two series. A major one is that the price of bitcoin on Bitstamp started very low and then rose very high very suddenly. Let’s try to transform our data by looking at day-to-day price changes instead of price levels.

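# Sort by good and date so each row follows the previous observation
# (Quandl can return the newest observation first)
PriceData <- PriceData[order(PriceData$Good, PriceData$Date), ]

# First difference: price minus the previous observation's price
PriceData$P1d <- c(NA, PriceData$Price[-1]-PriceData$Price[-nrow(PriceData)])

# The first row of each good has no previous observation, so set it to NA
PriceData$P1d[c(FALSE, PriceData$Good[-1]!=PriceData$Good[-nrow(PriceData)])] <- NA

ggplot(subset(PriceData, Date>as.Date("2013-04-01")&Date<as.Date("2014-01-01")), 
  aes(x=Date,y=P1d, colour=Good)) +
  geom_line() +
  ggtitle("Price Daily Change (USD)")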

Price Daily Change (USD)
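The differencing step is easy to get wrong at the boundary between the two goods, so here is a minimal sketch on a toy data frame with invented prices. Note that it uses ave() with diff(), an alternative to the index arithmetic in the post, and it assumes the rows are sorted by date within each good.

```r
# Toy long-format data: three days for each of two goods (made-up prices)
toy <- data.frame(
  Date  = as.Date(c("2013-04-01", "2013-04-02", "2013-04-03",
                    "2013-04-01", "2013-04-02", "2013-04-03")),
  Price = c(1600, 1590, 1605, 100, 110, 105),
  Good  = rep(c("Gold", "BitStamp"), each = 3)
)

# Sort by good, then chronologically within each good
toy <- toy[order(toy$Good, toy$Date), ]

# ave() applies the function separately within each good, so the first
# observation of every good gets NA rather than a cross-series difference
toy$P1d <- ave(toy$Price, toy$Good, FUN = function(p) c(NA, diff(p)))

print(toy)
```

After sorting, the BitStamp rows come first, and the daily changes are NA, 10, -5 for BitStamp and NA, -10, 15 for Gold.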

To convert our data from long to wide format I will follow the answer to my question posed on Stack Overflow.
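As a minimal illustration of what the long-to-wide step does, here is base R's reshape() applied to a tiny invented data frame. The column naming it produces (value name, a dot, then the good) is what lets later code refer to columns such as P1d.Gold.

```r
# Toy long-format data: one row per (Date, Good) pair, made-up prices
long <- data.frame(
  Date  = as.Date(c("2013-04-01", "2013-04-02", "2013-04-01", "2013-04-02")),
  Price = c(1600, 1590, 100, 110),
  Good  = c("Gold", "Gold", "BitStamp", "BitStamp")
)

# One row per Date, one Price column per Good
wide <- reshape(long, direction = "wide", idvar = "Date", timevar = "Good")

print(names(wide))
print(wide)
```

The result has one row per date and the columns Date, Price.Gold and Price.BitStamp.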


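PriceData.wide <- 
  reshape(PriceData, direction="wide", idvar = "Date", timevar = "Good")

# Put the rows in date order, which the lagged regressions below rely on
PriceData.wide <- PriceData.wide[order(PriceData.wide$Date), ]

# Correlation between the daily changes of the two series
cor(PriceData.wide[,c("P1d.Gold","P1d.BitStamp")], use="pairwise.complete.obs")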

# From the correlation matrix we see that the daily changes in the price of
# gold and of bitcoin move slightly in opposite directions (-4.3% correlation).
# The negative sign is in line with the hypothesis that the two are competing
# investments, with gold seen as safe and bitcoin as risky, though a
# correlation this small is weak evidence.

# However, if we were unwise enough to try to make investment decisions
# from this information we might want to know something more, such as:
# does the previous day's change in the price of gold predict the change
# in bitcoin prices today, or vice versa?

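# Does yesterday's change in gold predict today's change in bitcoin?
summary(
  lm(P1d.BitStamp[-1]~P1d.Gold[-nrow(PriceData.wide)]-1, data=PriceData.wide))

# Call:
# lm(formula = P1d.BitStamp[-1] ~ P1d.Gold[-nrow(PriceData.wide)] - 
#      1, data = PriceData.wide)

# Coefficients:
#          Estimate Std. Error t value Pr(>|t|)
# P1d.Gold  0.05478    0.03247   1.687    0.092

# Multiple R-squared: 0.00326, Adjusted R-squared: 0.002115

# Does yesterday's change in bitcoin predict today's change in gold?
# (call reconstructed by symmetry with the one above)
summary(
  lm(P1d.Gold[-1]~P1d.BitStamp[-nrow(PriceData.wide)]-1, data=PriceData.wide))

# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# P1d.BitStamp -0.005051   0.035219  -0.143    0.886

# Multiple R-squared: 2.359e-05, Adjusted R-squared: -0.001123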

We can see therefore that the previous day’s price change in the Bitstamp market has no significant predictive power for today’s change in the price of gold, while the previous day’s change in the price of gold does seem to have some predictive power for the change in the price of bitcoin.
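The lag in these regressions comes entirely from dropping the first element of the outcome and the last element of the predictor. A minimal sketch on simulated data (the slope of 0.5 and the noise level are arbitrary choices for the example, not estimates from the real series) shows that this construction recovers a known one-day-lagged effect:

```r
set.seed(42)
n <- 500

# x stands in for a daily change series, in chronological order
x <- rnorm(n)

# y today depends on x yesterday with a known (hypothetical) slope of 0.5;
# the first day has no "yesterday", so it is NA
y <- c(NA, 0.5 * x[-n]) + rnorm(n, sd = 0.1)

# y[-1] is days 2..n and x[-n] is days 1..(n-1): x is lagged by one row.
# The "- 1" removes the intercept, as in the regressions above.
fit   <- lm(y[-1] ~ x[-n] - 1)
slope <- unname(coef(fit))
print(slope)
```

The estimate lands close to 0.5. The trick only measures a one-day lag if the rows are in chronological order and consecutive rows are consecutive days.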

That said, the predictive ability of gold is extremely tenuous: it seems to explain only about 0.3% of the variation in the daily change in the price of bitcoin. In addition, the coefficient is statistically significant only at a p-value of 9.2%, meaning a result at least this strong would occur in roughly 1 out of 11 cases by pure random chance even if there were no true relationship. And this is ignoring the likely serial (time dependent) nature of the unobserved variation, which, when ignored, often has the effect of shrinking the estimated standard errors and so overstating significance.
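The figures quoted above can be cross-checked from the reported estimate and standard error alone. This sketch uses a normal approximation to the t distribution, which is an assumption on my part because the residual degrees of freedom are not reported (with several hundred observations the difference is small).

```r
# Values taken from the regression output for P1d.Gold
estimate  <- 0.05478
std_error <- 0.03247

# The t value is the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value under a normal approximation (large-sample assumption)
p_value <- 2 * pnorm(-abs(t_value))

# "1 out of how many" interpretation of the p-value
one_in <- 1 / p_value

print(round(c(t = t_value, p = p_value, one_in = one_in), 3))
```

The t value reproduces the reported 1.687, the approximate p-value is close to the reported 0.092, and its reciprocal is about 11, which is where the "1 out of 11" figure comes from.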

Therefore, relying upon this very weak explanatory model to make any kind of serious investment decision is unlikely to result in a significant return on investment.

I would like to thank the community on Stack Overflow for their generous assistance in patiently answering the many questions that came up in the process of writing this post.

  1. Reshape data from Long to Wide.
  2. Select Data After Specific Date.
