Drivers of Gold Returns in R

[This article was first published on R Codes – Light Finance, and kindly contributed to R-bloggers.]

Often regarded as the ultimate store of value, gold has appreciated substantially over the past 18 months, rallying from ~$1,300 in March 2019 to $1,954 on September 18th, an increase of roughly 50%. In fact, gold as an asset class has performed very well over the past 20 years and has markedly outperformed stocks, though this has been accompanied by long periods of underperformance.

The recent swing in gold price began with a decline in real interest rates but accelerated with COVID-19 and rapidly deteriorating economic conditions. For now, the worst fears for the economy seem to have been avoided, but it has required unprecedented monetary and fiscal support. As central banks around the world rush to expand their balance sheets to backstop asset values and consumer prices, a lot of risk could end up being socialized.

The Federal Reserve’s balance sheet has been relatively stable over the last 10 years at 25% of nominal GDP, but in recent months it has swelled to 40%. Given the expectation of a protracted recovery and multi-trillion-dollar deficit spending in the years to come, this percentage is expected to grow, as the Fed will increasingly need to monetize the federal debt in order to keep nominal rates low. This will likely come in conjunction with additional quantitative easing measures designed to further stimulate the economy.

In short, the era of financial repression is back at an extraordinary scale, and this has left many investors wondering what this means for the future of inflation, the US dollar, and gold.

In this post, I aim to investigate the main drivers for the price of gold. Specifically, I will examine the role of gold as a financial asset. The focus will be on which financial variables principally explain gold price and gold returns.

The analysis will be conducted in R using the extensive library of packages available therein, including PerformanceAnalytics and quantmod. All of the data can be obtained freely from Yahoo! Finance and the Federal Reserve Bank of St. Louis FRED Database.

The Task and Set Up

For this case study, we will be investigating the use of gold as a financial asset. In financial parlance, gold is postulated to serve four main purposes in a portfolio:

  1. Store of Value: Since the supply of gold is fixed, the supply of fiat currency is (more or less) arbitrary, and inflation is generally positive, gold is thought to be more effective at preserving wealth than its fiat counterparts.
  2. Hedge Against USD Weakness: The US Dollar is the world’s de facto reserve currency and carries with it the implicit guarantee of being “worth” something. Much of international trade (specifically, oil) is conducted in Dollars, Dollar-based financing is important for emerging economies and corporate multinationals, and, obviously, Dollars are required to access US capital and goods markets. As such, exchange rates represent a complex dynamic between the demand for and supply of USD. When global economic conditions are good or interest rates in the US are low, the dollar is thought to weaken against other major world currencies.
  3. Hedge Against Deteriorating Economic Conditions: When global economic conditions are weak, as during 2008 or more recently with COVID-19, gold is viewed as the ultimate safe haven asset.
  4. Hedge Against Market Volatility: Using gold as a hedge against stock market volatility is arguably one of the more speculative uses of gold. Nevertheless, the tactical use of gold to hedge market risk is a possible use case.

The goal of this study will be to assess the impact on gold price with respect to each of the proposed use cases and to demonstrate if/when the effects are observed most strongly.

There are a couple of different ways to approach the analysis and different methods upon which to draw. In order to gain comprehensive insight into the nature of gold prices I will employ four techniques in tandem. Specifically,

  1. Linear Regression
  2. Attribution Analysis
  3. Cluster Analysis
  4. Data Visualization

The hope is that by using these methods in conjunction we will reveal a deeper and more comprehensive model for pricing gold.

Data Acquisition, Clean Up and Processing

I will be using financial and economic variables aimed at measuring different sources of risk and return for gold. We will be using data spanning January 29, 2003 through September 2, 2020. As mentioned, all of the data can be accessed freely from Yahoo! Finance and FRED.

We’ll begin with the FRED data. Next to each variable I have placed the unique identifier that you can query from the database.

 FRED Data:

  • Gold Price: Gold fixing price, London Bullion Market. (GOLDAMGBD228NLBM)
  • Real Interest Rate: Yield on 10-Year TIPS. (DFII10)
  • Inflation Expectations: Difference between the nominal yield on 10-Year Treasury and 10-Year TIPS. (T10YIE)
  • USD-Euro Exchange Rate. (DEXUSEU)

The model will be based on bi-weekly data. However, FRED retrieves data at the highest available frequency so daily data always comes in as daily. Furthermore, the data is retrieved from the beginning of the series, so you end up getting a lot of NAs. As such, we will need to do a little clean up before we proceed.

The following segments of R code show loading the identifiers into variables and separate queries to FRED. The data is re-indexed and converted to biweekly. I’ve tried to comment the code as much as possible so you can see what’s happening.

R Code: FRED Data
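A minimal sketch of the steps described above, not the author's original listing. It assumes the series are pulled with quantmod's `getSymbols(..., src = "FRED")`; the helper names `fetch_fred` and `to_biweekly`, and the choice of 14-day blocks anchored at the start date, are illustrative assumptions. The demonstration runs on synthetic data so that no network access is needed.

```r
# Hypothetical helper: download one FRED series as an xts object.
# Needs the quantmod package and a network connection; it is defined
# here for reference but not called in this sketch.
fetch_fred <- function(id) {
  quantmod::getSymbols(id, src = "FRED", auto.assign = FALSE)
}

# Hypothetical helper: restrict a daily series to a date window, drop NAs,
# and keep the last observation in each 14-day block (assumes sorted dates).
to_biweekly <- function(dates, values, from, to) {
  keep   <- !is.na(values) & dates >= from & dates <= to
  dates  <- dates[keep]
  values <- values[keep]
  block  <- as.integer(dates - from) %/% 14L     # block index 0, 1, 2, ...
  last   <- !duplicated(block, fromLast = TRUE)  # last row of each block
  data.frame(date = dates[last], value = values[last])
}

# Synthetic daily series standing in for a FRED download
set.seed(1)
d <- seq(as.Date("2003-01-01"), as.Date("2003-03-31"), by = "day")
v <- cumsum(rnorm(length(d)))
v[c(5, 40)] <- NA   # mimic the gaps that appear in the raw data

bw <- to_biweekly(d, v, from = as.Date("2003-01-29"), to = as.Date("2003-03-31"))
print(bw)
```

With real data, the same function would be applied to each of the four identifiers listed above before merging the series by date.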

If use case #4 is true, then we would expect the price of gold to respond to changes in the general capital market indices and volatility. We will obtain data for these two variables from the venerable Yahoo! Finance:

  • Market: S&P 500 Index (^GSPC)
  • Volatility: VIX (^VIX)

R Code: Yahoo! Finance Data


Below is the correlation matrix for gold and our proposed set of variables:

We observe that the correlations between gold and the variables of interest are not particularly strong, with the real interest rate and USD-Euro exchange rate being the most prominent.

The graph below shows the price of gold plotted against the negative 10-Year TIPS yield (recall that we are using the yield on TIPS as our proxy for the real interest rate). I have elected to display the yield on TIPS as negative: since the correlation is negative, the two series move opposite one another, and displaying the yield as negative helps to demonstrate the trend.

We observe that in the early part of the series the correlation was more muted but has been very tight over the past 10 years and particularly recently.

We can see the relationship between gold price and the USD/Euro exchange rate in the graph below. From 2003 to 2010 gold and the Euro appreciated concurrently; both generally trended down for most of the 2010s but have begun to rise again recently. The USD/Euro has lagged the upswing in gold, but as of this writing it is at the highest level seen in over 2 years.

The correlation between the two is a bit more evident in the earlier part of the series than after 2010, but this was probably to be expected. Europe has been dealing with rolling crises for the past 10 years, including the 2011 sovereign debt crisis and Brexit in 2016. The EU has been plagued by sluggish economic growth, so it’s unsurprising that the Euro and gold have experienced periods of decoupling.

I’ve run a simple regression on the gold price and explanatory variables to evaluate the separate impacts and provide clues of where to dive deeper. The proposed model is as follows:
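Written out from the variable definitions that follow, the model presumably has the linear form below. The coefficient symbols β and the error term ε are notation assumed here, not taken from the original.

```latex
r_{Gold,t} = \beta_0
  + \beta_1\,\text{Chg.Real.Rate}_t
  + \beta_2\,\text{Chg.Inflation}_t
  + \beta_3\,r_{Euro\text{-}USD,t}
  + \beta_4\,r_{S\&P,t}
  + \beta_5\,\text{Chg.VIX}_t
  + \varepsilon_t
```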


  • r_{Gold,t} = Price return of gold at time t
  • Chg.Real.Rate_t = Level change in the 10-year real interest rate
  • Chg.Inflation_t = Level change in 10-year inflation expectations
  • r_{Euro-USD,t} = Return of the Euro-USD exchange rate
  • r_{S&P,t} = Price return of the S&P 500
  • Chg.VIX_t = Level change in the VIX
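The distinction between a return and a level change can be made concrete with a small sketch. The helper names and the toy numbers are illustrative assumptions, and simple (not log) returns are assumed because the article does not say which were used.

```r
# Simple period-over-period return: (P_t - P_{t-1}) / P_{t-1}
simple_return <- function(p) diff(p) / head(p, -1)

# Level change: X_t - X_{t-1}, kept in the variable's own units
level_change <- function(x) diff(x)

gold_price <- c(1500, 1530, 1499.4)  # toy prices in USD
tips_yield <- c(0.50, 0.30, 0.60)    # toy yields in percent

r_gold        <- simple_return(gold_price)  # 0.02, -0.02 (i.e. +2%, -2%)
chg_real_rate <- level_change(tips_yield)   # -0.20, 0.30 percentage points
```

Prices and index levels are converted to returns, while series already quoted in percent (yields, breakeven inflation, the VIX) are differenced.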

The strictly return variables (Euro-USD and S&P) produce elasticities. Elasticity is the measurement of the percentage change of an economic variable in response to a percentage change in another. Mathematically, elasticity is defined as follows:
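In the standard textbook form (the symbols y for the dependent variable and x for the explanatory variable are notation assumed here):

```latex
E_{y,x} = \frac{\partial y / y}{\partial x / x}
        = \frac{\partial \ln y}{\partial \ln x}
```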

By contrast, the variables that represent level changes produce semi-elasticities. A semi-elasticity measures the percentage change of an economic variable in response to a unit change in another. Semi-elasticity is calculated as follows:
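Using the same assumed notation (y for the dependent variable, x for the explanatory variable), the usual definition is:

```latex
SE_{y,x} = \frac{\partial y / y}{\partial x}
         = \frac{\partial \ln y}{\partial x}
```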

Since the real rate, inflation expectations, and VIX are already defined as percentages, their respective coefficients are interpreted as: the percentage change in the price of gold (i.e. the return) given a 1-percentage point change in the explanatory variable. If you are familiar with bond trading, this is the same interpretation as that of modified duration where the modified duration gives the expected percentage change in the price of a bond given a 1-percentage point change in the yield.

R Code: Regression and Attribution Analysis
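A minimal sketch of the regression step, not the author's original listing. The data are synthetic: the column names are hypothetical, and the "true" coefficients used to simulate gold returns are borrowed from the estimates quoted in the article only to make the toy data plausible.

```r
# Synthetic stand-in for the biweekly data set (NOT the author's data)
set.seed(42)
n <- 460
dat <- data.frame(
  chg_real_rate = rnorm(n, 0, 0.15),   # level change, percentage points
  chg_inflation = rnorm(n, 0, 0.10),   # level change, percentage points
  r_euro        = rnorm(n, 0, 0.015),  # return, in decimals
  r_sp          = rnorm(n, 0, 0.03),   # return, in decimals
  chg_vix       = rnorm(n, 0, 3)       # level change, index points
)
dat$r_gold <- 0.002 - 0.045 * dat$chg_real_rate + 0.71 * dat$r_euro +
  rnorm(n, 0, 0.02)

# Ordinary least squares on all five candidate drivers
fit <- lm(r_gold ~ chg_real_rate + chg_inflation + r_euro + r_sp + chg_vix,
          data = dat)

coefs <- coef(summary(fit))  # estimates, std. errors, t values, p values
print(round(coefs, 4))
print(summary(fit)$adj.r.squared)
```

With gold returns in decimals and rate changes in percentage points, a coefficient of -0.045 reads as a 4.5% fall in gold for a 1-percentage point rise in the real rate.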

 The output from the regression is as follows:

Immediately we see that the level change in the real interest rate and the percentage change (i.e. return) of the USD-Euro exchange rate are highly statistically significant, while the level change in inflation expectations, the return of the S&P 500, and the level change of the VIX are decidedly not. The coefficient of -0.045 on the change in the real interest rate can be interpreted as follows: given a 1-percentage point increase/decrease in the real rate, we expect the price of gold to decrease/increase by 4.5%; this confirms what we had expected a priori. The coefficient for the Euro is interpreted similarly: given a 1% increase/decrease in the USD-Euro exchange rate, we expect the price of gold to increase/decrease by 0.71% (71 bps).

The Adjusted-R² of 0.23 indicates that considerable variability remains unexplained by the model. Still, considering that only 2 of the proposed 5 variables are statistically meaningful, this provides us with quite a bit of useful information.

Having focused our attention on the real interest rate and the Euro-Dollar exchange rate as our variables of interest, we can dig a little deeper to assess the relative importance of these two variables in the model. To do this I use the relaimpo package available in R. relaimpo provides a suite of functions for decomposing the variance of a regression into the relative contributions made by the model’s regressors. I’ll be using the method that averages sequential sums of squares over orderings to perform the decomposition. This technique considers every ordering of the variables in the model, records how R² changes each time a variable is added, and averages each variable’s contribution across the orderings.

For convenience, we’ll drop the insignificant variables from our model so there will only be two orderings to consider:

  1. Real Interest Rate >>> Euro-Dollar
  2. Euro-Dollar >>> Real Interest Rate

Results from the sequential sum of squares decomposition are below:

The R² obtained from the decomposition is ~23%, which matches the model results from earlier. Of this 23%, 7% of the variance is explained by the change in real interest rates and 16% is explained by the change in the exchange rate. Put another way, 30% of the explained variance is attributable to interest rates and the remaining 70% is attributable to the exchange rate. Thus the USD-Euro exchange rate is the dominant variable for explaining the price of gold.
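With only two regressors, the averaging over orderings is easy to reproduce by hand. The sketch below does so in base R on synthetic data (variable names and simulated coefficients are illustrative assumptions), then repeats the share arithmetic using the figures quoted above.

```r
# Synthetic data (NOT the author's data)
set.seed(7)
n <- 460
chg_real_rate <- rnorm(n, 0, 0.15)
r_euro        <- rnorm(n, 0, 0.015)
r_gold <- -0.045 * chg_real_rate + 0.71 * r_euro + rnorm(n, 0, 0.02)

# R-squared of a regression given its formula
r2 <- function(formula) summary(lm(formula))$r.squared

r2_rate <- r2(r_gold ~ chg_real_rate)
r2_euro <- r2(r_gold ~ r_euro)
r2_full <- r2(r_gold ~ chg_real_rate + r_euro)

# Ordering 1: rate enters first, euro second.
# Ordering 2: euro enters first, rate second.
# Each variable's contribution is its average R-squared gain across orderings.
contrib_rate <- mean(c(r2_rate, r2_full - r2_euro))
contrib_euro <- mean(c(r2_euro, r2_full - r2_rate))
shares <- c(rate = contrib_rate, euro = contrib_euro) / r2_full

# With relaimpo installed, the comparable call would be
# relaimpo::calc.relimp(lm(r_gold ~ chg_real_rate + r_euro), type = "lmg")

# Share arithmetic using the figures reported in the article
article_shares <- c(rate = 0.07, euro = 0.16) / 0.23
print(round(article_shares, 2))  # rate 0.30, euro 0.70
```

The two contributions always sum to the full-model R², which is why the decomposition can be read as shares of explained variance.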

If we refer to the beginning of this article, we proposed that two of gold’s functions were to serve as 1) a hedge against changes in the Dollar (specifically, USD weakness), and 2) a hedge against broadly deteriorating economic conditions. Our results seem to confirm these views. A decline in the real interest rate is typically associated with declining economic conditions; under this scenario we would expect gold to appreciate. Similarly, Dollar weakness tends to be associated with improving economic conditions in the rest of the global economy; under such conditions we would also expect gold to appreciate.

To visualize these impacts, I’ve plotted the return of gold against the change in the real yield and return of the USD/Euro exchange rate, respectively, along with confidence and prediction intervals. For the graph of gold v. USD/Euro we see a pretty consistent upward trend and a tight fit. Movements in the exchange rate in either direction are associated with a commensurate response from gold. For gold v. real yield, we observe a downward sloping line as we would expect, but the slope appears to be accentuated by extreme movements in the yield.  

R Code: Plots with Confidence and Prediction Intervals
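A minimal sketch of how such bands can be produced in base R with `predict()`, not the author's original listing. The data are synthetic and the variable names are illustrative assumptions; the plotting calls run only in an interactive session.

```r
# Synthetic data (NOT the author's data)
set.seed(3)
n <- 460
r_euro <- rnorm(n, 0, 0.015)
r_gold <- 0.71 * r_euro + rnorm(n, 0, 0.02)
fit <- lm(r_gold ~ r_euro)

# Evaluate the fitted line and both interval types on a regular grid
grid <- data.frame(r_euro = seq(min(r_euro), max(r_euro), length.out = 50))
conf_band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
pred_band <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

if (interactive()) {
  plot(r_euro, r_gold, pch = 16, col = "grey60",
       xlab = "USD/Euro return", ylab = "Gold return")
  abline(fit, lwd = 2)
  matlines(grid$r_euro, conf_band[, c("lwr", "upr")], lty = 2, col = "blue")
  matlines(grid$r_euro, pred_band[, c("lwr", "upr")], lty = 3, col = "red")
}
```

The confidence band describes uncertainty about the fitted mean, while the wider prediction band describes where an individual new observation is expected to fall.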

k-Means Cluster Analysis

Thus far we have examined the effect on gold of changes in the real interest rate and the exchange rate in isolation. It’s important to note that the relationship between interest and exchange rates is complex. Theory suggests that as real interest rates decline/rise, US-based assets become relatively less/more attractive, which would drive a decline/rise in Dollar-denominated exchange rates. However, during times of significant global economic stress (like ’08-’09 and more recently with COVID), US-based assets (in particular, Treasuries) are viewed as “safe haven” assets by investors, which (counterintuitively) drives down rates and drives up the Dollar. Furthermore, the real interest rate is a function of nominal rates and inflation expectations. If inflation expectations fall quicker than nominal interest rates, we can see the real rate actually go up even as the economy crashes; this is equivalent to tightening monetary policy in a recession, the opposite of what you want to do.

To try and disentangle these confounding effects, I’ve employed cluster analysis. Cluster analysis is a broad set of techniques for finding subgroups of observations within a dataset. The specific method that I have chosen to use is k-means clustering. The basic idea behind k-means clustering consists of defining clusters so that the total intra-cluster variation (known as total within-cluster variation) is minimized. There are several k-means algorithms available. The specific implementation of the k-means method that I will use defines the within-cluster variation as the sum of squared Euclidean distances between observations and the corresponding centroid:
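In the usual notation, which is assumed here (W for the within-cluster variation, C_k for the k-th cluster):

```latex
W(C_k) = \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```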


  • x_i = a data point belonging to the cluster C_k
  • μ_k = the mean value of the points assigned to the cluster C_k

We define the total within-cluster variation as follows:
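Summing over all K clusters, and again using assumed notation, this is commonly written as:

```latex
\text{tot.withinss} = \sum_{k=1}^{K} W(C_k)
  = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```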

The total within-cluster sum of squares measures the compactness (i.e. goodness) of the clustering, and we want it to be as small as possible.

Choosing the number of centroids (i.e. means) is slightly tricky and requires a bit of data mining; however, we can employ some diagnostic tools to help. The code below shows my process. Note that all of the functions required to run a k-means clustering are available in base R.

It’s important to make clear that we do not know the “true” number of centroids ahead of time. I use three in order to initialize the model but propose that there could be as many as five centroids. For iterations 1 through 5, I record the ratio of the Between Sum of Squares (BSS) to the Total Sum of Squares (TSS). This ratio tells us what proportion of the total variance is explained by the number of clusters. As you increase the number of clusters this ratio will begin to approach 1; it will equal 1 if the number of clusters, C, equals the number of observations. The goal is to strike a balance between variance explained and the number of clusters, as this gives us more general results.

To do this effectively, I employ an elbow plot, which plots the number of clusters v. the BSS/TSS ratio. In general, you want to pick the number of clusters at the point where you begin to see the plot level off. This implies that substantial variance is explained by a small number of clusters and that adding more clusters does not contribute meaningfully to the explanatory power of the model. Taking a look at the graph below, we see that for clusters 1-3 the BSS/TSS ratio rises precipitously; the gains begin to decelerate at cluster 4 and level off at cluster 5.

R Code: k-Means Cluster Analysis
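A minimal sketch of the procedure using base R's `kmeans()`, not the author's original listing. The data are synthetic, the variable names are illustrative, and standardizing the inputs with `scale()` is an assumption made here because the article does not say whether the inputs were standardized.

```r
# Synthetic data (NOT the author's data)
set.seed(123)
n <- 460
chg_real_rate <- rnorm(n, 0, 0.15)
r_euro        <- rnorm(n, 0, 0.015)
r_gold <- -0.045 * chg_real_rate + 0.71 * r_euro + rnorm(n, 0, 0.02)

# Standardize so no single variable dominates the Euclidean distance
X <- scale(cbind(r_gold, r_euro, chg_real_rate))

# BSS/TSS ratio for k = 1..5; nstart repeats the random initialization
ks <- 1:5
ratio <- sapply(ks, function(k) {
  km <- kmeans(X, centers = k, nstart = 25, iter.max = 50)
  km$betweenss / km$totss
})
gain <- c(NA, diff(ratio))  # improvement from adding one more cluster
print(data.frame(k = ks, bss_tss = round(ratio, 3), gain = round(gain, 3)))

# Fit the chosen model and inspect cluster sizes and centroids
km4 <- kmeans(X, centers = 4, nstart = 25, iter.max = 50)
print(km4$size)
print(km4$centers)

if (interactive()) {
  plot(ks, ratio, type = "b", xlab = "Number of clusters", ylab = "BSS / TSS")
}
```

With a single cluster the between-cluster sum of squares is zero, so the ratio starts at 0 and climbs toward 1 as clusters are added.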

The below table shows the BSS/TSS Ratio for the different cluster sizes and the increase as we move to higher cluster sizes:

The cluster analysis suggests that the majority of the data is explained by 4 clusters; more than that and the gains are small. Admittedly, it is a bit difficult to tell if 3 or 4 clusters is more appropriate, but to me 4 is more comfortable.

The size (i.e. number of observations) and centroid for each cluster are as follows:

The results fit quite nicely and align with what we would expect. For clusters 1 & 2, a decline in the price of gold is commensurate with Dollar appreciation and a rise in the real yield. For clusters 3 & 4, an increase in the price of gold is coupled with Dollar depreciation and a decline in the real rate. Clusters 2 & 3 are the most interesting (to me) as they contain the fewest observations and represent the extreme movements in the explanatory variables and the price of gold.

With these results in hand we can plot the data, color coded by cluster, in order to visualize the interaction. To do this we will leverage the package scatterplot3d. In the following graphs the cluster colors are as follows:

  • Cluster 1 = Black
  • Cluster 2 = Green
  • Cluster 3 = Blue
  • Cluster 4 = Light Pink

I present two different angles for the plot. The first plots the exchange rate on the x-axis and the second plots the real yield on the x-axis. I have also overlaid the respective graphs with a regression plane so we can see the trends in the data.

R Code: 3D Scatter Plots


In this post we have taken a deep dive into gold and the economic variables which impact it most. We used regression analysis and dominance attribution to determine that gold is principally impacted by the relative strength of the Dollar (as proxied by the USD/Euro exchange rate) and the real rate of interest. Furthermore, we found that gold is relatively unimpacted by other financial and economic variables such as the S&P 500, volatility, or inflation.

Secondly, we used k-means cluster analysis to examine the joint relationship between rates, the Dollar and gold. Based on the results from the cluster analysis we determined that, indeed, a decline/rise in real rates is associated with weakness/strengthening of the US Dollar, which impacts the price of gold significantly, particularly when the moves are extreme.

Along the way I demonstrated the R code and packages that I used so that you may apply them for your own projects.

Hopefully, this post has taught you a little (a lot!) more about gold and its use in the context of your portfolio.

Until next time, thanks for reading!

-Aric Lux.

The post Drivers of Gold Returns in R appeared first on Light Finance.
