Conditional dependence measures

December 17, 2013
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

This week, I spend some time at the Workshop on Nonparametric Curve Smoothing conference at Concordia. Yesterday afternoon, Noël Veraverbeke show an interesting graph, to illustrate conditional copulas (and the derivation of conditional dependence measures, such as Kendall’s tau, or Spearman’s rho). A long time ago, in my PhD thesis (mainly on conditional copulas) I did try to derive conditional dependence measures (in a dedicated chapter). In my PhD, I was interested to describe the dependence of a pair http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given http://latex.codecogs.com/gif.latex?(Y_1,Y_2)\in\mathcal{V}, where http://latex.codecogs.com/gif.latex?\mathcal%20V is a region of interest, such has tails. So I wanted to study the behavior of http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given http://latex.codecogs.com/gif.latex?\{Y_1%3Et,Y_2%3Et\}. This has interpretation when studying large risks, but also in joint life mortality.

In the paper Noël mentioned, they want to describe the dependence of a pair http://latex.codecogs.com/gif.latex?(Y_1,Y_2) given a covariate http://latex.codecogs.com/gif.latex?X=x. And he came up with this very nice example: consider expected lifetimes, for male and female, in various countries. You can get zipped files with data for male, female and we can use the GPD per capita as our covariate. Here is the code to visualize life expectancies,

b1=read.table("sp.dyn.le00.fe.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b2=read.table("sp.dyn.le00.ma.in_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b3=read.table("ny.gdp.pcap.cd_Indicator_en_csv_v2.csv",header=TRUE,sep=",",skip=2)
b1b=b1[,c(1,2,55)]
b2b=b2[,c(1,2,55)]
b3b=b3[,c(1,2,55)]
names(b1b)[3]="LEF"
names(b2b)[3]="LEM"
names(b3b)[3]="GPD"
b=merge(b1b,b2b)
b=merge(b,b3b)
plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)")

With this graph, we cannot visualize the link with the covariate,

b$cgpd=cut(b$GPD,quantile(b$GPD,seq(0,1,by=1/6),na.rm=TRUE))
levels(b$cgpd)=as.character(1:6)
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")	
plot(b$LEM,b$LEF,xlab="Life Expectancy (male vs. female)",pch=19,col=CL[as.numeric(b$cgpd)])

Here, poor countries are in red, and rich countries in blue,

Clearly, life expectancy is connected to the wealth of the country,

plot(b$GPD,b$LEF,xlab="(Female) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x")
plot(b$GPD,b$LEM,xlab="(Male) Life Expectancy vs. GPD (log scale)",pch=19,col=CL[as.numeric(b$cgpd)],log="x")

The idea here is to consider the conditional dependence structure, given the wealth. If we want something smooth (this is actually the goal of the workshop, but I’d like to make that quickly) consider some weighted version of Kendall’s tau, based on the idea mentioned in a post on http://stackoverflow.com/

The idea is to use concordance and discordance counts, with replications of the data, based on the weights

P = function(t) {   
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])},
    r = r_ndx, c = c_ndx))}
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply( function(r, c){
      sum(t[(r_ndx > r) & (c_ndx < c)])
  },
    r = r_ndx, c = c_ndx) )
}
kendall_tau_c = function(t){
    t = as.matrix(t) 
    m = min(dim(t))
    n = sum(t)
    ks_tauc = (m*2*(P(t)-Q(t)))/((n*n)*(m-1))
}
I=is.na(b$GPD)
bw=density(log(b$GPD[!I]))$bw
kendall.weight=function(x){
df=data.frame(Y1=b$LEF, Y2=b$LEM, freq=trunc(dnorm(log(b$GPD)-log(x),sd=bw)*100))
df=df[!is.na(df$freq),]
dfrep=data.frame( lapply(df, function(x){rep(x, df$freq)}))
t=xtabs(~ Y1+Y2, dfrep)
return(kendall_tau_c(t))}

Here, I use weights using some Gaussian kernel on the logarithm of the GPD per capita (my standard deviation for the Gaussian weight being equal to the bandwidth of the Gaussian kernel of the density of the log of the GPD per capita), then, we can compute various conditional Kendall’s tau,

T=exp(seq(6,11.5,length=50))
K=Vectorize(kendall.weight)(T)

and plot them,

plot(T,K,type="l",xlab="Conditional Kendall's tau vs. GPD (log scale)")

There is more “correlation” between lifetimes of men and women in poor countries than rich country (which is also what Noël observed). Now, we can also play with time, because we have those statistics for several years.

To leave a comment for the author, please follow the link and comment on his blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.