# Easily generate correlated variables from any distribution

February 27, 2014
By

(This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers)

In this post I will demonstrate in R how to draw correlated random variables from any distribution

The idea is simple.
1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive probabilities for each variable. 3. Finally apply the inverse CDF of any distribution to simulate draws from that distribution.
The results is that the final variables are correlated in a similar manner to that of the original variables.  This is because the rank order of the variables in maintained and thus correlations are approximately the same though not exact. This methods follows a method I presented in a previous post coded in Stata.  I am not aware of anybody else proposing this method previously.

For example:

`library(MASS) # We will use the command mvrnorm to draw a matrix of variables # Let's keep it simple, mu <- rep(0,4)Sigma <- matrix(.7, nrow=4, ncol=4) + diag(4)*.3 rawvars <- mvrnorm(n=10000, mu=mu, Sigma=Sigma) cov(rawvars); cor(rawvars)# We can see our normal sample produces results very similar to our `
`#specified covariance levels. # No lets transform some variablespvars <- pnorm(rawvars) # Through this process we already have cov(pvars); cor(pvars)# We can see that while the covariances have dropped significantly, `
`# the simply correlations are largely the same. plot(rawvars[,1], pvars[,2], main="Normal of Var 1 with probabilities of Var 2") `
` `
`# Things are looking pretty well so far.  Let's see what happens when we invert `
`#different CDFs. # To generate correlated poissonpoisvars <- qpois(pvars, 5)cor(poisvars, rawvars) # This matrix presents the correlation between the original values generated# and the tranformed poisson variables.  We can see that the correlation matrix# is very similar though somewhat "downward biased".  This is because any# transformation away from the original will reduce the correlation between the# variables. plot(poisvars,rawvars, main="Poisson Transformation Against Normal Values") `
` `
`# Perhaps the poisson count distribution is not exotic enough. # Perhaps a binomial distribution with 3 draws at 25% eachbinomvars <- qpois(1-pvars, 3, .25)   # Note, I did 1-p because p is defined differently for the qpois for some `
`#reasoncor(binomvars, rawvars)  # Or the exponential distributionexpvars <- qexp(pvars)cor(expvars, rawvars)# We can see that the correlations after the exponential tranformations are# significantly weaker (from .7 to .63) but still good representations if# we are interested in simulating correlations between a normal and exponential# variables. plot(expvars,rawvars, main="Exponential Transformation Against Normal Values") `

`# To make things a little more interesting, let's now transform our probabilities# into skewed normal distributions.library(sn)sknormvars <- qsn(pvars, 5, 2, 5)cor(expvars, rawvars) hist(sknormvars, breaks=20) `
`# Finally in order to demonstrate what we can do let's combine our variables into# a single matrix. combvar <- dataframe(sknormvars[,1], poisvars[,2], binomvars[,3], expvars[,4]) cor(combvar)#          [,1]      [,2]      [,3]      [,4]#[1,] 1.0000000 0.6853314 0.6826398 0.6256086#[2,] 0.6853314 1.0000000 0.6748458 0.6402233#[3,] 0.6826398 0.6748458 1.0000000 0.6325102#[4,] 0.6256086 0.6402233 0.6325102 1.0000000 # I am going to try to get all of these dirstributions on the same graphstdcombvar <- t(t(combvar)-apply(combvar,2,min))stdcombvar <- t(t(stdcombvar)/apply(stdcombvar,2,max))summary(stdcombvar) plotter <- data.frame(  values = c(stdcombvar),  rawnorm = rep(rawvars[,1], 4),  type = rep(c("skewed normal",              "poisson",              "binomial",              "exponential"),              each=10000)) library(ggplot2)  `
`  ggplot(plotter, aes(x=rawnorm ,y=values, color=type)) +  geom_point(shape=1) +     # Use hollow circles  geom_smooth(method=lm,    # Add linear regression line              se=FALSE)  `
` `

` `

Formatted by Pretty R at inside-R.org

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...