# Modeling the Marginals and the Dependence separately

April 1, 2014
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

When introducing copulas, it is commonly admitted that copulas are interesting because they allow to model the marginals and the dependence structure separately. The motivation is probably Sklar’s theorem, which says that given some marginal cumulative distribution functions (say $F$ and $G$, in dimension 2), and a copula (denoted $C$), then we can generate a multivariate cumulative distribution function with marginals the one specified previously, using

$H(x,y)=C(F(x),G(y))$

But this separability might be misleading. Consider the case of a fully parametric model,

$\left\{\begin{array}{l}F=F_{\alpha_0}\in\{F_\alpha;\alpha\in A\}\\G=G_{\beta_0}\in\{G_\beta;\beta\in B\} \\ C=C_{\gamma_0}\in\{C_\gamma;\gamma\in \Gamma\}\end{array}\right.$

Assume that those distributions are continuous, so that we can write the likelihood using densities,

$\mathcal{L}=\prod_{i=1}^n f_{\alpha}(X_i)\cdot g_{\beta}(Y_i)\cdot c_\gamma(F_{\alpha}(X_i),G_{\beta}(Y_i))$and the log-likelihood is

$\log\mathcal{L}=\underbrace{\sum_{i=1}^n \log f_{\alpha}(X_i)}_{\log\mathcal{L}(\alpha)}+\underbrace{\sum_{i=1}^n \log g_{\beta}(Y_i)}_{\log\mathcal{L}(\beta)}{\color{blue}+\sum_{i=1}^n \log c_\gamma(F_{\alpha}(X_i),G_{\beta}(Y_i))}$

The first part is the log-likelihood if we consider the first marginal (only). The second part is the log-likelihood if we consider the second marginal (only). If the two components are not independent (i.e. the copula density $c$ is not equal to 1 everywhere) the third part cannot be considered as null, and so, in a general context,

$(\alpha^\star,\beta^\star)\neq (\widehat{\alpha},\widehat{\beta})$

where

$(\alpha^\star,\beta^\star,\gamma^\star)=\text{argmax}\{\log \mathcal{L}\}$

while

$\left\{\begin{array}{l}\widehat{\alpha}=\text{argmax}\{\log \mathcal{L}(\alpha)\}\\ \widehat{\beta}=\text{argmax}\{\log \mathcal{L}(\beta)\}\end{array}\right.$

In order to illustrate this point, consider a bivariate lognormal distribution (obtained by taking the exponential of a Gaussian vector)

> mu1=1
> mu2=2
> MU=c(mu1,mu2)
> s1=1
> s2=sqrt(2)
> r=.8
> SIGMA=matrix(c(s1^2,r*s1*s2,r*s1*s2,s2^2),2,2)
> library(mnormt)
> set.seed(1)
> Z=exp(rmnorm(25,MU,SIGMA))

If we believe that marginals and correlations can be treated separately, we can start with marginal distributions.

> library(MASS)
> (p1=fitdistr(Z[,1],"lognormal"))
meanlog      sdlog
1.1686652   0.9309119
(0.1861824) (0.1316508)
> (p2=fitdistr(Z[,2],"lognormal"))
meanlog      sdlog
2.2181721   1.1684049
(0.2336810) (0.1652374)

Based on those marginal distributions, define $\tilde{U}_i=F_{\widehat{\alpha}}(X_i)$ and $\tilde{V}_i=G_{\widehat{\beta}}(Y_i)$, and consider the maximum likelihood estimator $\widehat{\gamma}}$ of the copula parameter, obtained from this pseudo sample,

$\widehat{\gamma}}=\text{argmax}\left\{\sum_{i=1}^n \log c_\gamma(\tilde U_i,\tilde V_i) \right\}$

Numerically, we get (since we consider a Gaussian copula, which is the true copula generated here)

> library(copula)
> Gcop=normalCopula(.3,dim=2)
> U=cbind(plnorm(Z[,1],p1$estimate[1],p1$estimate[2]),
+ plnorm(Z[,2],p2$estimate[1],p2$estimate[2]))
> fitCopula(Gcop,data=U,method="ml")
fitCopula() estimation based on 'maximum likelihood'
and a sample of size 25.
Estimate Std. Error z value Pr(>|z|)
rho.1  0.86530    0.03799   22.77

But clearly, we did not treat the dependence structure separately, since it was a function of marginal distributions,

$\widehat{\gamma}}=\text{argmax}\left\{\sum_{i=1}^n \log c_\gamma(F_{\widehat{\alpha}}(X_i),G_{\widehat{\beta}}(Y_i)) \right\}$

If we consider a global optimization problem, then results are different. The joint density can be derived (see e.g. Mostafa & Mahmoud (1964))

> dbivlognorm=function(x,theta){
+ mu1=theta[1]
+ mu2=theta[2]
+ s1=theta[3]
+ s2=theta[4]
+ r=theta[5]
+ a1=(log(x[,1])-mu1)/s1
+ a2=(log(x[,2])-mu2)/s2
+ d=1/(2*pi*s1*s2*sqrt(1-r^2))*1/(x[,1]*x[,2])*
+ exp(-(a1^2-2*r*a1*a2+a2^2)/(2*(1-r^2)))
+ return(d)
+ }
> LogLik=function(theta){
+ return(-sum(log(dbivlognorm(Z,theta))))}
> optim(par=c(0,0,1,1,0),fn=LogLik)$par [1] 1.1655359 2.2159767 0.9237853 1.1610132 0.8645052 The difference is not huge, but still. The estimators are not identical. From a statistical point of view, we can hardly treat the marginals and the dependence structure separately. Another point we should keep in mind is that the estimation of the copula parameter depends on the margins, not only through the parameters, but more deeply, through the choice of the marginal distributions (that might be misspecified). For instance, if we assume that margins are exponentially distributed, > (p1=fitdistr(Z[,1],"exponential")) rate 0.22288362 (0.04457672) > (p2=fitdistr(Z[,2],"exponential")) rate 0.06543665 (0.01308733) the estimation of the parameter of the Gaussian copula yields > U=cbind(pexp(Z[,1],p1$estimate[1]),
+ pexp(Z[,2],p2$estimate[1])) > fitCopula(Gcop,data=U,method="ml") fitCopula() estimation based on 'maximum likelihood' and a sample of size 25. Estimate Std. Error z value Pr(>|z|) rho.1 0.87421 0.03617 24.17 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The maximized loglikelihood is 15.4 Optimization converged The problem is that since we misspecify marginal distribution, our pseudo sample is defined on the unit-interval, but there is no chance that we get uniform margins. If we generate a sample of size 500 with the code above, > x <- U[,1]; y <- U[,2] > xhist <- hist(x, plot=FALSE) ; yhist <- hist(y, plot=FALSE) > top <- max(c(xhist$counts, yhist$counts)) > nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(3,1), c(1,3), TRUE) > par(mar=c(3,3,1,1)) > plot(x, y, xlab="", ylab="",col="red",xlim=0:1,ylim=0:1) > par(mar=c(0,3,1,1)) > barplot(xhist$counts, axes=FALSE, ylim=c(0, top),
+ space=0,col="light green")
> par(mar=c(3,0,1,1))
> barplot(yhist\$counts, axes=FALSE, xlim=c(0, top),
+ space=0, horiz=TRUE,col="light blue")

If we compare with the previous case, when marginal distribution were well-specified, we can clearly see that the dependence structure depends on marginal distributions,