(nonparametric) Copula density estimation

September 20, 2012

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Today, we will go further on the inference of copula functions. Some codes (and references) can be found on a previous post, on nonparametric estimators of copula densities (among other related things).  Consider (as before) the loss-ALAE dataset (since we’ve been working a lot on that dataset)

> library(MASS)
> library(evd)
> X=lossalae
> U=cbind(rank(X[,1])/(nrow(X)+1),rank(X[,2])/(nrow(X)+1))

The standard tool to plot nonparametric estimators of densities is to use multivariate kernels. We can look at the density using

> mat1=kde2d(U[,1],U[,2],n=35)
> persp(mat1$x,mat1$y,mat1$z,col="green",
+ shade=TRUE,theta=s*5,
+ xlab="",ylab="",zlab="",zlim=c(0,7))

or level curves (isodensity curves) with more detailed estimators (on grids with shorter steps)

> mat1=kde2d(U[,1],U[,2],n=101)
> image(mat1$x,mat1$y,mat1$z,col=
+ rev(heat.colors(100)),xlab="",ylab="")
> contour(mat1$x,mat1$y,mat1$z,add=
+ TRUE,levels = pretty(c(0,4), 11))


Kernels are nice, but we clearly observe some border bias, extremely strong in corners (the estimator is 1/4th of what it should be, see another post for more details). Instead of working on sample http://latex.codecogs.com/gif.latex?(U_i,V_i) on the unit square, consider some transformed sample http://latex.codecogs.com/gif.latex?(Q(U_i),Q(V_i)), where http://latex.codecogs.com/gif.latex?Q:(0,1)\rightarrow\mathbb{R} is a given function. E.g. a quantile function of an unbounded distribution, for instance the quantile function of the http://latex.codecogs.com/gif.latex?\mathcal{N}(0,1) distribution. Then, we can estimate the density of the transformed sample, and using the inversion technique, derive an estimator of the density of the initial sample. Since the inverse of a (general) function is not that simple to compute, the code might be a bit slow. But it does work,

> gaussian.kernel.copula.surface <- function (u,v,n) {
+   s=seq(1/(n+1), length=n, by=1/(n+1))
+   mat=matrix(NA,nrow = n, ncol = n)
+ sur=kde2d(qnorm(u),qnorm(v),n=1000,
+ lims = c(-4, 4, -4, 4))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qnorm(s[i])+4)*1000/8)+1;
+ 	Yj<-round((qnorm(s[j])+4)*1000/8)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dnorm(qnorm(s[i]))*
+ 	dnorm(qnorm(s[j])))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Here, we get


Note that it is possible to consider another transformation, e.g. the quantile function of a Student-t distribution.

> student.kernel.copula.surface =
+  function (u,v,n,d=4) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(NA,nrow = n, ncol = n)
+ sur<-kde2d(qt(u,df=d),qt(v,df=d),n=5000,
+ lims = c(-8, 8, -8, 8))
+ su<-sur$z
+ for (i in 1:n) {
+     for (j in 1:n) {
+ 	Xi<-round((qt(s[i],df=d)+8)*5000/16)+1;
+ 	Yj<-round((qt(s[j],df=d)+8)*5000/16)+1
+ 	mat[i,j]<-su[Xi,Yj]/(dt(qt(s[i],df=d),df=d)*
+ 	dt(qt(s[j],df=d),df=d))
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }

Another strategy is to consider kernel that have precisely the unit interval as support. The idea is here to consider the product of Beta kernels, where parameters depend on the location

> beta.kernel.copula.surface=
+  function (u,v,bx=.025,by=.025,n) {
+  s <- seq(1/(n+1), length=n, by=1/(n+1))
+  mat <- matrix(0,nrow = n, ncol = n)
+ for (i in 1:n) {
+     a <- s[i]
+     for (j in 1:n) {
+     b <- s[j]
+ 	mat[i,j] <- sum(dbeta(a,u/bx,(1-u)/bx) *
+     dbeta(b,v/by,(1-v)/by)) / length(u)
+     }
+ }
+ return(list(x=s,y=s,z=data.matrix(mat)))
+ }


On those two graphs, we can clearly observe strong tail dependence in the upper (right) corner, that cannot be intuited using a standard kernel estimator…

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , , , , , , , , ,

Comments are closed.


Mango solutions

plotly webpage

dominolab webpage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training





CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)