Conditional dependence measures

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This week, I spend some time at the Workshop on Nonparametric Curve Smoothing conference at Concordia. Yesterday afternoon, Noël Veraverbeke show an interesting graph, to illustrate conditional copulas (and the derivation of conditional dependence measures, such as Kendall’s tau, or Spearman’s rho). A long time ago, in my PhD thesis (mainly on conditional copulas) I did try to derive conditional dependence measures (in a dedicated chapter). In my PhD, I was interested to describe the dependence of a pair http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given http://latex.codecogs.com/gif.latex?(Y_1,Y_2)\in\mathcal{V}, where http://latex.codecogs.com/gif.latex?\mathcal%20V is a region of interest, such has tails. So I wanted to study the behavior of http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given http://latex.codecogs.com/gif.latex?\{Y_1%3Et,Y_2%3Et\}. This has interpretation when studying large risks, but also in joint life mortality.

In the paper Noël mentioned, they want to describe the dependence of a pair http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given a covariate http://latex.codecogs.com/gif.latex?X=x. And he came up with this very nice example: consider expected lifetimes, for male and female, in various countries. You can get zipped files with data for male, female and we can use the GPD per capita as our covariate. Here is the code to visualize life expectancies,

b1=read.table("sp.dyn.le00.fe.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b2=read.table("sp.dyn.le00.ma.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b3=read.table("ny.gdp.pcap.cd_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b1b=b1[,c(1,2,55)]
b2b=b2[,c(1,2,55)]
b3b=b3[,c(1,2,55)]
names(b1b)[3]="LEF"
names(b2b)[3]="LEM"
names(b3b)[3]="GPD"
b=merge(b1b,b2b)
b=merge(b,b3b)
plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)")

With this graph, we cannot visualize the link with the covariate,

b$cgpd=cut(b$GPD,quantile(b$GPD,seq(0,1,by=1/6),na.rm=TRUE))
levels(b$cgpd)=as.character(1:6)
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")	
plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)",pch=19,col=CL[as.numeric(b$cgpd)])

Here, poor countries are in red, and rich countries in blue,

Clearly, life expectancy is connected to the wealth of the country,

plot(b$GPD,b$LEF,xlab="(Female) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x")
plot(b$GPD,b$LEM,xlab="(Male) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x")

The idea here is to consider the conditional dependence structure, given the wealth. If we want something smooth (this is actually the goal of the workshop, but I’d like to make that quickly) consider some weighted version of Kendall’s tau, based on the idea mentioned in a post on http://stackoverflow.com/

The idea is to use concordance and discordance counts, with replications of the data, based on the weights

P = function(t) {   
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])},
    r = r_ndx, c = c_ndx))}
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply( function(r, c){
      sum(t[(r_ndx > r) & (c_ndx < c)])
  },
    r = r_ndx, c = c_ndx) )
}
kendall_tau_c = function(t){
    t = as.matrix(t) 
    m = min(dim(t))
    n = sum(t)
    ks_tauc = (m*2*(P(t)-Q(t)))/((n*n)*(m-1))
}
I=is.na(b$GPD)
bw=density(log(b$GPD[!I]))$bw
kendall.weight=function(x){
df=data.frame(Y1=b$LEF, Y2=b$LEM, freq=trunc(dnorm(log(b$GPD)-log(x),sd=bw)*100))
df=df[!is.na(df$freq),]
dfrep=data.frame( lapply(df, function(x){rep(x, df$freq)}))
t=xtabs(~ Y1+Y2, dfrep)
return(kendall_tau_c(t))}

Here, I use weights using some Gaussian kernel on the logarithm of the GPD per capita (my standard deviation for the Gaussian weight being equal to the bandwidth of the Gaussian kernel of the density of the log of the GPD per capita), then, we can compute various conditional Kendall’s tau,

T=exp(seq(6,11.5,length=50))
K=Vectorize(kendall.weight)(T)

and plot them,

plot(T,K,type="l",xlab="Conditional Kendall's tau vs. GPD (log scale)")

There is more “correlation” between lifetimes of men and women in poor countries than rich country (which is also what Noël observed). Now, we can also play with time, because we have those statistics for several years.

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)