(nonparametric) Copula density estimation
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today, we will go further on the inference of copula functions. Some codes (and references) can be found on a previous post, on nonparametric estimators of copula densities (among other related things). Consider (as before) the loss-ALAE dataset (since we’ve been working a lot on that dataset)
> library(MASS) > library(evd) > X=lossalae > U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))
The standard tool to plot nonparametric estimators of densities is to use multivariate kernels. We can look at the density using
> mat1=kde2d(U[,1],U[,2],n=35) > persp(mat1$x,mat1$y,mat1$z,col="green", + shade=TRUE,theta=s*5, + xlab="",ylab="",zlab="",zlim=c(0,7))
or level curves (isodensity curves) with more detailed estimators (on grids with shorter steps)
> mat1=kde2d(U[,1],U[,2],n=101) > image(mat1$x,mat1$y,mat1$z,col= + rev(heat.colors(100)),xlab="",ylab="") > contour(mat1$x,mat1$y,mat1$z,add= + TRUE,levels = pretty(c(0,4), 11))
Kernels are nice, but we clearly observe some border bias, extremely strong in corners (the estimator is 1/4th of what it should be, see another post for more details). Instead of working on sample on the unit square, consider some transformed sample , where is a given function. E.g. a quantile function of an unbounded distribution, for instance the quantile function of the distribution. Then, we can estimate the density of the transformed sample, and using the inversion technique, derive an estimator of the density of the initial sample. Since the inverse of a (general) function is not that simple to compute, the code might be a bit slow. But it does work,
> gaussian.kernel.copula.surface <- function (u,v,n) { + s=seq(1/(n+1), length=n, by=1/(n+1)) + mat=matrix(NA,nrow = n, ncol = n) + sur=kde2d(qnorm(u),qnorm(v),n=1000, + lims = c(-4, 4, -4, 4)) + su<-sur$z + for (i in 1:n) { + for (j in 1:n) { + Xi<-round((qnorm(s[i])+4)*1000/8)+1; + Yj<-round((qnorm(s[j])+4)*1000/8)+1 + mat[i,j]<-su[Xi,Yj]/(dnorm(qnorm(s[i]))* + dnorm(qnorm(s[j]))) + } + } + return(list(x=s,y=s,z=data.matrix(mat))) + }
Here, we get
Note that it is possible to consider another transformation, e.g. the quantile function of a Student-t distribution.
> student.kernel.copula.surface = + function (u,v,n,d=4) { + s <- seq(1/(n+1), length=n, by=1/(n+1)) + mat <- matrix(NA,nrow = n, ncol = n) + sur<-kde2d(qt(u,df=d),qt(v,df=d),n=5000, + lims = c(-8, 8, -8, 8)) + su<-sur$z + for (i in 1:n) { + for (j in 1:n) { + Xi<-round((qt(s[i],df=d)+8)*5000/16)+1; + Yj<-round((qt(s[j],df=d)+8)*5000/16)+1 + mat[i,j]<-su[Xi,Yj]/(dt(qt(s[i],df=d),df=d)* + dt(qt(s[j],df=d),df=d)) + } + } + return(list(x=s,y=s,z=data.matrix(mat))) + }
Another strategy is to consider kernel that have precisely the unit interval as support. The idea is here to consider the product of Beta kernels, where parameters depend on the location
> beta.kernel.copula.surface= + function (u,v,bx=.025,by=.025,n) { + s <- seq(1/(n+1), length=n, by=1/(n+1)) + mat <- matrix(0,nrow = n, ncol = n) + for (i in 1:n) { + a <- s[i] + for (j in 1:n) { + b <- s[j] + mat[i,j] <- sum(dbeta(a,u/bx,(1-u)/bx) * + dbeta(b,v/by,(1-v)/by)) / length(u) + } + } + return(list(x=s,y=s,z=data.matrix(mat))) + }
On those two graphs, we can clearly observe strong tail dependence in the upper (right) corner, that cannot be intuited using a standard kernel estimator...
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.