Confidence Regions for Parameters in the Simplex

January 18, 2016
By

(This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers)

Consider here the case where, in some parametric inference problem, parameter  is a point in the Simplex,

For instance, consider some regression, on compositional data,

> library(compositions)
>  data(DiagnosticProb)
>  Y=DiagnosticProb[,"type"]-1
>  X=DiagnosticProb[,c("A","B","C")]
>  model = glm(Y~ilr(X),family=binomial)
>  b = ilrInv(coef(model)[-1],orig=X)
>  as.numeric(b)
[1] 0.3447106 0.2374977 0.4177917

We can visualize that estimator on the simplex, using

>  tripoint=function(s){
+    p=s/sum(s)
+    abc2xy(matrix(p,1,3))
+  }

>  lab=LETTERS[1:3]
>  xl=c(-.1,1.25)
>  yl=c(-.1,1.15)
>  library(trifield)
>  A=abc2xy(matrix(c(1,0,0),1,3)) 
>  B=abc2xy(matrix(c(0,1,0),1,3))
>  C=abc2xy(matrix(c(0,0,1),1,3)) 
>  plot(0:1,0:1,col="white",
+  xlim=xl,ylim=yl,xlab="",ylab="",axes=FALSE)
>  polygon(rbind(A,B,C),col="light yellow")
>  text(B[1],-.05,lab[2])
>  text(A[1],1.05,lab[1])
>  text(C[1],-.05,lab[3])
>  segments((A[1]+C[1])/2,(A[2]+C[2])/2,B[1],B[2],col="grey",lty=2)
>  segments((A[1]+B[1])/2,(A[2]+B[2])/2,C[1],C[2],col="grey",lty=2)
>  segments((B[1]+C[1])/2,(B[2]+C[2])/2,A[1],A[2],col="grey",lty=2)
>  points(tripoint(b),pch=19,cex=2,col="red")

If we want to compute a ‘confidence region’, we can either use Bayesian models (with a Dirichlet distribution as prior distribution), or use bootstrap. We will use here the second idea

>  MB=matrix(NA,1e4,2)
>  for(sim in 1:1e4){
+    idx=sample(1:nrow(DiagnosticProb),
+    size=nrow(DiagnosticProb),replace=TRUE)
+  Y=DiagnosticProb[idx,"type"]-1
+  X=DiagnosticProb[idx,c("A","B","C")]
+  model = glm(Y~ilr(X),family=binomial)
+  MB[sim,]=tripoint(as.numeric(
+    ilrInv(coef(model)[-1],orig=X)))}

To get some ‘confidence region’, we can then use the bagplot, to get either a region where 50% of the boostraped estimators are, or 95%,

>  library(aplpack)
> P1=bagplot(MB[,1],MB[,2], factor =1.96, cex=.9,
+ dkmethod=2,show.baghull=TRUE) 
> P2=bagplot(MB[,1],MB[,2], factor =0.67, cex=.9,
+ dkmethod=2,show.baghull=TRUE) 

Then we can easily plot those two regions,

>  plot(0:1,0:1,col="white")
>  polygon(rbind(A,B,C),col="light yellow")
>  text(B[1],-.05,lab[2])
>  text(A[1],1.05,lab[1])
>  text(C[1],-.05,lab[3])
>  polygon(P1$hull.loop,col="yellow",border=NA)
>  polygon(P2$hull.loop,col="orange",border=NA)
>  segments((A[1]+C[1])/2,(A[2]+C[2])/2,B[1],B[2],col="grey",lty=2)
>  segments((A[1]+B[1])/2,(A[2]+B[2])/2,C[1],C[2],col="grey",lty=2)
>  segments((B[1]+C[1])/2,(B[2]+C[2])/2,A[1],A[2],col="grey",lty=2)
>  points(tripoint(b),pch=19,cex=2,col="red")

 

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)