In this post I will demonstrate in R how to draw correlated random variables from any distribution
The idea is simple.
1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive probabilities for each variable. 3. Finally apply the inverse CDF of any distribution to simulate draws from that distribution.
The results is that the final variables are correlated in a similar manner to that of the original variables. This is because the rank order of the variables in maintained and thus correlations are approximately the same though not exact. This methods follows a method I presented in a previous post coded in Stata. I am not aware of anybody else proposing this method previously.
# We will use the command mvrnorm to draw a matrix of variables
# Let's keep it simple,
mu <- rep(0,4)
Sigma <- matrix(.7, nrow=4, ncol=4) + diag(4)*.3
rawvars <- mvrnorm(n=10000, mu=mu, Sigma=Sigma)
# We can see our normal sample produces results very similar to our
#specified covariance levels.
# No lets transform some variables
pvars <- pnorm(rawvars)
# Through this process we already have
# We can see that while the covariances have dropped significantly,
# the simply correlations are largely the same.
plot(rawvars[,1], pvars[,2], main="Normal of Var 1 with probabilities of Var 2")
# Things are looking pretty well so far. Let's see what happens when we invert
# To generate correlated poisson
poisvars <- qpois(pvars, 5)
# This matrix presents the correlation between the original values generated
# and the tranformed poisson variables. We can see that the correlation matrix
# is very similar though somewhat "downward biased". This is because any
# transformation away from the original will reduce the correlation between the
plot(poisvars,rawvars, main="Poisson Transformation Against Normal Values")
# Perhaps the poisson count distribution is not exotic enough.
# Perhaps a binomial distribution with 3 draws at 25% each
binomvars <- qpois(1-pvars, 3, .25)
# Note, I did 1-p because p is defined differently for the qpois for some
# Or the exponential distribution
expvars <- qexp(pvars)
# We can see that the correlations after the exponential tranformations are
# significantly weaker (from .7 to .63) but still good representations if
# we are interested in simulating correlations between a normal and exponential
plot(expvars,rawvars, main="Exponential Transformation Against Normal Values")
# To make things a little more interesting, let's now transform our probabilities
# into skewed normal distributions.
sknormvars <- qsn(pvars, 5, 2, 5)
# Finally in order to demonstrate what we can do let's combine our variables into
# a single matrix.
combvar <- dataframe(sknormvars[,1], poisvars[,2], binomvars[,3], expvars[,4])
# [,1] [,2] [,3] [,4]
#[1,] 1.0000000 0.6853314 0.6826398 0.6256086
#[2,] 0.6853314 1.0000000 0.6748458 0.6402233
#[3,] 0.6826398 0.6748458 1.0000000 0.6325102
#[4,] 0.6256086 0.6402233 0.6325102 1.0000000
# I am going to try to get all of these dirstributions on the same graph
stdcombvar <- t(t(combvar)-apply(combvar,2,min))
stdcombvar <- t(t(stdcombvar)/apply(stdcombvar,2,max))
plotter <- data.frame(
values = c(stdcombvar),
rawnorm = rep(rawvars[,1], 4),
type = rep(c("skewed normal",
ggplot(plotter, aes(x=rawnorm ,y=values, color=type)) +
geom_point(shape=1) + # Use hollow circles
geom_smooth(method=lm, # Add linear regression line