(nonparametric) Copula density estimation

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Today, we will go further on the inference of copula functions. Some codes (and references) can be found on a previous post, on nonparametric estimators of copula densities (among other related things).  Consider (as before) the loss-ALAE dataset (since we’ve been working a lot on that dataset)

> library(MASS)
> library(evd)
> X=lossalae
> U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))

The standard tool to plot nonparametric estimators of densities is to use multivariate kernels. We can look at the density using

> mat1=kde2d(U[,1],U[,2],n=35)
> persp(mat1$x,mat1$y,mat1$z,col="green",
+ shade=TRUE,theta=s*5,
+ xlab="",ylab="",zlab="",zlim=c(0,7))

or level curves (isodensity curves) with more detailed estimators (on grids with shorter steps)

> mat1=kde2d(U[,1],U[,2],n=101)
> image(mat1$x,mat1$y,mat1$z,col=
+ rev(heat.colors(100)),xlab="",ylab="")
> contour(mat1$x,mat1$y,mat1$z,add=
+ TRUE,levels = pretty(c(0,4), 11))

https://i2.wp.com/freakonometrics.blog.free.fr/public/perso6/3dcop-est1.gif?w=578

Kernels are nice, but we clearly observe some border bias, extremely strong in corners (the estimator is 1/4th of what it should be, see another post for more details). Instead of working on sample http://latex.codecogs.com/gif.latex?(U_i,V_i) on the unit square, consider some transformed sample http://latex.codecogs.com/gif.latex?(Q(U_i),Q(V_i)), where http://latex.codecogs.com/gif.latex?Q:(0,1)\rightarrow\mathbb{R} is a given function. E.g. a quantile function of an unbounded distribution, for instance the quantile function of the http://latex.codecogs.com/gif.latex?\mathcal{N}(0,1) distribution. Then, we can estimate the density of the transformed sample, and using the inversion technique, derive an estimator of the density of the initial sample. Since the inverse of a (general) function is not that simple to compute, the code might be a bit slow. But it does work,

> gaussian.kernel.copula.surface <- function (u,v,n) {
+   s=seq(1/(n+1), length=n, by=1/(n+1))
+   mat=matrix(NA,nrow = n, ncol = n)
+ sur=kde2d(qnorm(u),qnorm(v),n=1000,
+ lims = c(-4, 4, -4, 4))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qnorm(s[i])+4)*1000/8)+1;
+ 	Yj<-round((qnorm(s[j])+4)*1000/8)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dnorm(qnorm(s[i]))*
+ 	dnorm(qnorm(s[j])))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Here, we get

https://i2.wp.com/freakonometrics.blog.free.fr/public/perso6/3dcop-est2.gif?w=578

Note that it is possible to consider another transformation, e.g. the quantile function of a Student-t distribution.

> student.kernel.copula.surface =
+  function (u,v,n,d=4) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(NA,nrow = n, ncol = n)
+ sur<-kde2d(qt(u,df=d),qt(v,df=d),n=5000,
+ lims = c(-8, 8, -8, 8))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qt(s[i],df=d)+8)*5000/16)+1;
+ 	Yj<-round((qt(s[j],df=d)+8)*5000/16)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dt(qt(s[i],df=d),df=d)*
+ 	dt(qt(s[j],df=d),df=d))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Another strategy is to consider kernel that have precisely the unit interval as support. The idea is here to consider the product of Beta kernels, where parameters depend on the location

> beta.kernel.copula.surface=
+  function (u,v,bx=.025,by=.025,n) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(0,nrow = n, ncol = n)
+ for (i in 1:n) {
+     a <- s[i]
+     for (j in 1:n) {
+     b <- s[j]
+ 	mat[i,j] <- sum(dbeta(a,u/bx,(1-u)/bx) *
+     dbeta(b,v/by,(1-v)/by)) / length(u)
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

https://i2.wp.com/freakonometrics.blog.free.fr/public/perso6/3dcop-est3.gif?w=578

On those two graphs, we can clearly observe strong tail dependence in the upper (right) corner, that cannot be intuited using a standard kernel estimator...

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)