# arbitrary distributions with set correlation

May 10, 2015
By

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

A question recently posted on X Validated by Antoni Parrelada: given two arbitrary cdfs F and G, how can we simulate a pair (X,Y) with marginals  F and G, and with set correlation ρ? The answer posted by Antoni Parrelada was to reproduce the Gaussian copula solution: produce (X’,Y’) as a Gaussian bivariate vector with correlation ρ and then turn it into (X,Y)=(F⁻¹(Φ(X’)),G⁻¹(Φ(Y’))). Unfortunately, this does not work, because the correlation does not keep under the double transform. The graph above is part of my answer for a χ² and a log-Normal cdf for F amd G: while corr(X’,Y’)=ρ, corr(X,Y) drifts quite a  lot from the diagonal! Actually, by playing long enough with my function

```tacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm)
{
x1=rnorm(nsim);x2=rnorm(nsim)
coeur=rho
rho2=sqrt(1-rho^2)
for (t in 1:length(rho)){
y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2))
coeur[t]=cor(fx(y[,1]),fy(y[,2]))}
return(coeur)
}
```

Playing further, I managed to get an almost flat correlation graph for the admittedly convoluted call

```tacor(seq(-1,1,.01),
fx=function(x) qchisq(x^59,df=.01),
fy=function(x) qlogis(x^59))
```

Now, the most interesting question is how to produce correlated simulations. A pedestrian way is to start with a copula, e.g. the above Gaussian copula, and to twist the correlation coefficient ρ of the copula until the desired correlation is attained for the transformed pair. That is, to draw the above curve and invert it. (Note that, as clearly exhibited by the graph just above, all desired correlations cannot be achieved for arbitrary cdfs F and G.) This is however very pedestrian and I wonder whether or not there is a generic and somewhat automated solution…

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...