Dependence and Correlation

[This article was first published on mickeymousemodels, and kindly contributed to R-bloggers].

In everyday life I hear the word “correlation” thrown around far more often than “dependence.” What’s the difference? Correlation, in its most common form (Pearson's coefficient), is a measure of linear dependence; the catch is that not all dependencies are linear. The set of correlated random variables lies entirely within the larger set of dependent random variables: correlation implies dependence, but not the other way around. Here are some silly (but hopefully interesting) examples to illustrate that point:

n <- 5000
df <- data.frame(x=rnorm(n), y=rnorm(n, mean=5, sd=2))
plot(df, xlim=c(-6, 6), ylim=c(-2, 12), main="A Beehive")
mtext("X and Y are independent (and therefore uncorrelated)")
savePlot("beehive.png")


n <- 2500
df <- data.frame(x=rexp(n), y=rexp(n, rate=2))
plot(df, xlim=c(-0.05, 10), ylim=c(-0.05, 5), main="A B-2 Bomber")
mtext("X and Y are independent (and therefore uncorrelated)")
savePlot("bomber.png")


n <- 10000
df <- data.frame(x=runif(n))
# Y is uniform on an interval whose half-width, |0.5 - x|, depends on X
df$y <- runif(n, -abs(0.5 - df$x), abs(0.5 - df$x))
plot(df, xlim=c(-0.05, 1.05), ylim=c(-0.55, 0.55), main="A Bowtie / Butterfly")
mtext("X and Y are dependent but uncorrelated")
savePlot("bowtie.png")


n <- 20000
df <- data.frame(x=runif(n, -1, 1), y=runif(n, -1, 1))
df <- subset(df, (x^2 + y^2 <= 1 & x^2 + y^2 >= 0.5) | x^2 + y^2 <= 0.25)
plot(df, main="Saturn")
mtext("X and Y are dependent but uncorrelated")
savePlot("saturn.png")


n <- 5000
df <- data.frame(x=rnorm(n))
# Y = X when |X| > 1.54 and Y = -X otherwise; the cutoff makes Cov(X, Y) approximately zero
df$y <- with(df, x * (2 * as.integer(abs(x) > 1.54) - 1))
plot(df, xlim=c(-4, 4), ylim=c(-4, 4), main="A Swing Bridge")
mtext("X and Y are dependent but uncorrelated")
savePlot("bridge.png")
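
The cutoff of 1.54 in the swing bridge is not arbitrary. Because X has mean zero, Cov(X, Y) = E[XY] = 2 * E[X^2 * 1{|X| > c}] - 1, which vanishes when the tail contribution E[X^2 * 1{|X| > c}] equals 1/2. Here is a minimal sketch of solving for c numerically; the names tail_second_moment and threshold, and the search interval, are introduced here for illustration and are not part of the original post:

```r
# E[X^2 * 1{|X| > cutoff}] for a standard normal X; by symmetry this is
# twice the integral of x^2 * dnorm(x) over the upper tail.
tail_second_moment <- function(cutoff) {
  2 * integrate(function(x) x^2 * dnorm(x), lower = cutoff, upper = Inf)$value
}

# Cov(X, Y) = 2 * tail_second_moment(cutoff) - 1, so look for its zero.
# The interval c(0.5, 3) is an assumption: the covariance changes sign on it.
threshold <- uniroot(function(cutoff) 2 * tail_second_moment(cutoff) - 1,
                     interval = c(0.5, 3))$root
print(threshold)
```

The root should land close to the 1.54 used above; any other cutoff would leave X and Y with a nonzero correlation.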


n <- 1000
df <- data.frame(x=rnorm(n), z=sample(c(-1, 1), size=n, replace=TRUE))
df$y <- with(df, z * x)
df <- df[ , c("x", "y")]
plot(df, xlim=c(-4, 4), ylim=c(-4, 4), main="A Treasure Map")
mtext("X and Y are dependent but uncorrelated")
savePlot("treasure.png")


The last two are classic examples: X and Y are each normally distributed, but the pair (X, Y) is not bivariate normal.
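
One way to see this for the treasure map: every linear combination of a bivariate normal pair is itself normal, yet here X + Y is exactly zero whenever z = -1, which happens about half the time, and a normal variable with positive variance cannot put that much probability on a single point. A rough check follows; the seed and the larger sample size are choices made here for stability, not taken from the original code:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100000
x <- rnorm(n)
z <- sample(c(-1, 1), size = n, replace = TRUE)
y <- z * x

linear_cor  <- cor(x, y)         # near 0: no linear relationship
squared_cor <- cor(x^2, y^2)     # 1 up to rounding, since y^2 always equals x^2
zero_share  <- mean(x + y == 0)  # near 0.5: X + Y has a point mass at zero

print(c(linear_cor = linear_cor, squared_cor = squared_cor, zero_share = zero_share))
```

The near-zero linear correlation alongside a perfect correlation between the squares is the dependence the scatterplot shows as an "X".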

I'll admit that the two exponentials are a bit counterintuitive to me, at least visually. (They're in the second plot from the top, which looks vaguely like a B-2.) The variables are independent; if you regressed Y on X you'd end up with an essentially flat line. Yet, somehow, if I were to look at that plot without knowing how the variables were generated, I'd want to draw a diagonal line pointing up and to the right. If anything, it goes to show that I should probably not run regressions "by inspection."
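
To check the flat-line claim with numbers rather than by inspection, here is a quick sketch that fits the regression on a larger simulated sample. The seed, the sample size, and the names fit, slope, and intercept are choices made here:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100000
df <- data.frame(x = rexp(n), y = rexp(n, rate = 2))

fit <- lm(y ~ x, data = df)
slope <- unname(coef(fit)["x"])
intercept <- unname(coef(fit)["(Intercept)"])

# With independent X and Y, the slope should be close to 0 and the
# intercept close to E[Y] = 1 / rate = 0.5.
print(c(intercept = intercept, slope = slope, correlation = cor(df$x, df$y)))
```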
