# Dependence and Correlation

June 13, 2011
(This article was first published on mickeymousemodels, and kindly contributed to R-bloggers)

In everyday life I hear the word “correlation” thrown around far more often than “dependence.” What’s the difference? Correlation, in its most common form, is a measure of linear dependence; the catch is that not all dependencies are linear. The set of correlated random variables lies entirely within the larger set of dependent random variables: correlation implies dependence, but not the other way around. Here are some silly (but hopefully interesting) examples to illustrate that point:

```r
n <- 5000
df <- data.frame(x=rnorm(n), y=rnorm(n, mean=5, sd=2))
plot(df, xlim=c(-6, 6), ylim=c(-2, 12), main="A Beehive")
mtext("X and Y are independent (and therefore uncorrelated)")
savePlot("beehive.png")
```

```r
n <- 2500
df <- data.frame(x=rexp(n), y=rexp(n, rate=2))
plot(df, xlim=c(-0.05, 10), ylim=c(-0.05, 5), main="A B-2 Bomber")
mtext("X and Y are independent (and therefore uncorrelated)")
savePlot("bomber.png")
```

```r
n <- 5000
df <- data.frame(x=runif(n))  # was hard-coded as 10000, ignoring n
df$y <- runif(n, -abs(0.5 - df$x), abs(0.5 - df$x))
plot(df, xlim=c(-0.05, 1.05), ylim=c(-0.55, 0.55), main="A Bowtie / Butterfly")
mtext("X and Y are dependent but uncorrelated")
savePlot("bowtie.png")
```

```r
n <- 20000
df <- data.frame(x=runif(n, -1, 1), y=runif(n, -1, 1))
df <- subset(df, (x^2 + y^2 <= 1 & x^2 + y^2 >= 0.5) | x^2 + y^2 <= 0.25)
plot(df, main="Saturn")
mtext("X and Y are dependent but uncorrelated")
savePlot("saturn.png")
```

```r
n <- 5000
df <- data.frame(x=rnorm(n))
df$y <- with(df, x * (2 * as.integer(abs(x) > 1.54) - 1))
plot(df, xlim=c(-4, 4), ylim=c(-4, 4), main="A Swing Bridge")
mtext("X and Y are dependent but uncorrelated")
savePlot("bridge.png")
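The cutoff 1.54 in the swing bridge isn't arbitrary: it is (approximately) the value c at which E[X²; |X| > c] equals E[X²; |X| ≤ c] for a standard normal X, so the two halves of E[XY] cancel and the covariance is zero. A quick numerical sanity check (a sketch; the exact zero-correlation cutoff is c ≈ 1.538):

```r
# Flipping the sign of X whenever |X| > c leaves Y perfectly determined by X,
# yet the sample correlation comes out (essentially) zero.
set.seed(1)
n <- 1e6
x <- rnorm(n)
c <- 1.54  # threshold from the post; exact zero-correlation value is ~1.538
y <- x * (2 * as.integer(abs(x) > c) - 1)
cor(x, y)  # very close to 0
```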

```r
n <- 1000
df <- data.frame(x=rnorm(n), z=sample(c(-1, 1), size=n, replace=TRUE))
df$y <- with(df, z * x)
df <- df[ , c("x", "y")]
plot(df, xlim=c(-4, 4), ylim=c(-4, 4), main="A Treasure Map")
mtext("X and Y are dependent but uncorrelated")
savePlot("treasure.png")
```

The last two are classic examples: X and Y are normally distributed, but (X, Y) is not a bivariate normal.
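For the treasure map in particular, the dependence is easy to verify directly: since Y = ZX with Z = ±1 independent of X, we have E[XY] = E[Z]E[X²] = 0, yet |Y| = |X| exactly. A minimal check:

```r
# Y = Z * X: uncorrelated with X, but |Y| pins down |X| exactly
set.seed(1)
n <- 1e5
x <- rnorm(n)
z <- sample(c(-1, 1), size=n, replace=TRUE)
y <- z * x
cor(x, y)              # near zero: E[XY] = E[Z] * E[X^2] = 0
all(abs(y) == abs(x))  # TRUE: clearly dependent
```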

I’ll admit that the two exponentials are a bit counterintuitive to me, at least visually. (They’re in the second plot from the top, which looks vaguely like a B-2.) The variables are independent; if you regressed Y on X you’d end up with a flat line. Yet, somehow, if I were to look at that plot without knowing how the variables were generated, I’d want to draw a diagonal line pointing up and to the right. If anything, it goes to show that I should probably not run regressions “by inspection.”
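The "flat line" claim is easy to confirm numerically: fitting a linear model to two independent exponentials gives a slope statistically indistinguishable from zero, whatever the scatterplot seems to suggest. A quick sketch:

```r
# Regressing Y on X for independent exponentials: the fitted slope is ~0
set.seed(1)
n <- 1e5
df <- data.frame(x=rexp(n), y=rexp(n, rate=2))
fit <- lm(y ~ x, data=df)
coef(fit)["x"]  # close to 0: no linear relationship to find
```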
