# (nonparametric) Copula density estimation

September 20, 2012
By

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Today, we will go further on the inference of copula functions. Some codes (and references) can be found on a previous post, on nonparametric estimators of copula densities (among other related things).  Consider (as before) the loss-ALAE dataset (since we’ve been working a lot on that dataset)

> library(MASS)
> library(evd)
> X=lossalae
> U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))

The standard tool to plot nonparametric estimators of densities is to use multivariate kernels. We can look at the density using

> mat1=kde2d(U[,1],U[,2],n=35)
> persp(mat1$x,mat1$y,mat1$z,col="green", + shade=TRUE,theta=s*5, + xlab="",ylab="",zlab="",zlim=c(0,7)) or level curves (isodensity curves) with more detailed estimators (on grids with shorter steps) > mat1=kde2d(U[,1],U[,2],n=101) > image(mat1$x,mat1$y,mat1$z,col=
+ rev(heat.colors(100)),xlab="",ylab="")
> contour(mat1$x,mat1$y,mat1$z,add= + TRUE,levels = pretty(c(0,4), 11)) Kernels are nice, but we clearly observe some border bias, extremely strong in corners (the estimator is 1/4th of what it should be, see another post for more details). Instead of working on sample $(U_i,V_i)$ on the unit square, consider some transformed sample $(Q(U_i),Q(V_i))$, where $Q:(0,1)\rightarrow\mathbb{R}$ is a given function. E.g. a quantile function of an unbounded distribution, for instance the quantile function of the $\mathcal{N}(0,1)$ distribution. Then, we can estimate the density of the transformed sample, and using the inversion technique, derive an estimator of the density of the initial sample. Since the inverse of a (general) function is not that simple to compute, the code might be a bit slow. But it does work, > gaussian.kernel.copula.surface <- function (u,v,n) { + s=seq(1/(n+1), length=n, by=1/(n+1)) + mat=matrix(NA,nrow = n, ncol = n) + sur=kde2d(qnorm(u),qnorm(v),n=1000, + lims = c(-4, 4, -4, 4)) + su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qnorm(s[i])+4)*1000/8)+1;
+ 	Yj<-round((qnorm(s[j])+4)*1000/8)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dnorm(qnorm(s[i]))*
+ 	dnorm(qnorm(s[j])))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Here, we get

Note that it is possible to consider another transformation, e.g. the quantile function of a Student-t distribution.

> student.kernel.copula.surface =
+  function (u,v,n,d=4) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(NA,nrow = n, ncol = n)
+ sur<-kde2d(qt(u,df=d),qt(v,df=d),n=5000,
+ lims = c(-8, 8, -8, 8))
+ su<-sur\$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qt(s[i],df=d)+8)*5000/16)+1;
+ 	Yj<-round((qt(s[j],df=d)+8)*5000/16)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dt(qt(s[i],df=d),df=d)*
+ 	dt(qt(s[j],df=d),df=d))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Another strategy is to consider kernel that have precisely the unit interval as support. The idea is here to consider the product of Beta kernels, where parameters depend on the location

> beta.kernel.copula.surface=
+  function (u,v,bx=.025,by=.025,n) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(0,nrow = n, ncol = n)
+ for (i in 1:n) {
+     a <- s[i]
+     for (j in 1:n) {
+     b <- s[j]
+ 	mat[i,j] <- sum(dbeta(a,u/bx,(1-u)/bx) *
+     dbeta(b,v/by,(1-v)/by)) / length(u)
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

On those two graphs, we can clearly observe strong tail dependence in the upper (right) corner, that cannot be intuited using a standard kernel estimator…

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Tags: , , , , , , , , , , , , ,