Spurious correlations and random walks

June 29, 2019
By

(This article was first published on Fabian Dablander, and kindly contributed to R-bloggers)

The number of storks and the number of human babies delivered are positively correlated (Matthews, 2000). This is a classic example of a spurious correlation which has a causal explanation: a third variable, say economic development, is likely to cause both an increase in storks and an increase in the number of human babies, hence the correlation.1 In this blog post, I discuss a more subtle case of spurious correlation, one that is not of causal but of statistical nature: completely independent processes can be correlated substantially.

AR(1) processes and random walks

Moods, stockmarkets, the weather: everything changes, everything is in flux. The simplest model to describe change is an auto-regressive (AR) process of order one. Let $Y_t$ be a random variable where $t = [1, \ldots T]$ indexes discrete time. We write an AR(1) process as:

where $\phi$ gives the correlation with the previous observation, and where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. For $\phi = 1$ the process is called a random walk. We can simulate from these using the following code:

1. There are, of course, many more

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...