[This article was first published on Statistical Reflections of a Medical Doctor » R, and kindly contributed to R-bloggers].

Christian Robert reviewed online a paper that was critical of non-informative priors. Among the points discussed by him and other contributors (e.g. Keith O’Rourke) was the issue of induced priors, i.e. priors that arise from a transformation of the original parameters, or of the observables. I found this exchange interesting because I did something similar when revisiting an old research project that had been collecting digital dust on my hard disk. The specific problem had to do with the analysis of a biomarker that was measured with a qualitative technique, yielding a binary classification of measurements as present or absent in two experimental conditions (call them A and B). Ignoring some technical aspects of the study design, the goal was to calculate the odds ratio of the biomarker being expressed in condition B vs. A (the reference state, signifying absence of disease).

When writing the programs for the analysis, I defaulted to the N(0.0,1.0E-6) prior that epitomizes non-informativeness in BUGS. However, one of my co-authors asked the “What the @#$%& does this prior mean?” question. And then we stopped … and reflected on what we were about to do. You see, before the experiment started we had absolutely no prior information about the behaviour of the biomarker in either experimental state, so we did not want to commit one way or another. In other words, Laplace’s original uniform (or Beta(1,1)) prior would have been reasonable if the expression data for A and B were to be analyzed separately. However, we wanted to analyze the data with a logistic regression model, so was the ubiquitous N(0.0,1.0E-6) the prior we were after?
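(As an aside for readers less familiar with BUGS: the second argument of N(0.0,1.0E-6) is a precision, so this prior has a standard deviation of 1000 on the log-odds scale. A minimal R sketch, not part of the analysis code further below, shows what such a prior induces on the probability scale:)

```r
## What does the "non-informative" BUGS prior N(0.0, 1.0E-6) mean on the
## probability scale? The second argument is a precision, so the standard
## deviation on the log-odds scale is 1/sqrt(1e-6) = 1000.
set.seed(42)
beta <- rnorm(1e5, mean = 0, sd = 1000)  # draws from the "flat" prior
p <- plogis(beta)                        # induced prior on the probability
## nearly all of the mass piles up at 0 and 1 -- hardly non-committal!
mean(p < 0.01 | p > 0.99)
```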

The answer is a loud NO! According to Wikipedia, the mother of all knowledge, the logit (log-odds) transformation of a uniform variate follows the logistic distribution with location 0 and scale 1. Hence, the prior on the intercept of the logistic regression (interpretable as the log odds of the biomarker being expressed in state A) had to be a Logistic(0,1).
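This is easy to verify by simulation; a quick sketch using R's built-in qlogis (which is exactly the logit function):

```r
## logit of a U(0,1) variate vs. the Logistic(0,1) distribution
set.seed(123)
u <- runif(1e6)
z <- qlogis(u)  # same as log(u / (1 - u))
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
## empirical quantiles agree with the theoretical Logistic(0,1) quantiles
round(quantile(z, probs) - qlogis(probs), 3)
```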

Surprisingly, the prior on the log odds ratio of B vs. A was found (after trial and error and method-of-moments considerations) to be very well approximated by a 1:1 mixture of a logistic and a Gaussian, which clearly departs from the N(0.0,1.0E-6) prior we (almost) used:
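(For the record, the moment matching works out as follows: logit(B) − logit(A) is the difference of two independent Logistic(0,1) variates, with mean 0 and variance 2π²/3, and both approximating distributions used in the code further below are chosen to have exactly these first two moments:)

```r
## the target: difference of two independent Logistic(0,1) variates
var_target <- 2 * pi^2 / 3          # Var(X1) + Var(X2), each pi^2/3
var_logis  <- sqrt(2)^2 * pi^2 / 3  # Var of Logistic(0, sqrt(2))
var_norm   <- (pi * sqrt(2/3))^2    # Var of Normal(0, pi*sqrt(2/3))
c(var_target, var_logis, var_norm)  # all three coincide
```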

Bottom line: Even informative (in the BUGS sense!) priors can be pretty non-informative in some intuitively appropriate parameterization. Conversely, one could start with a non-informative prior in a parameterization that is easier to reason about and look for an induced prior (using analytic considerations or even simulations) to convert it to a parameterization that is more appropriate to the analytic plan at hand.

(R code for the plots and simulations is given below)

## approximating uniforms
logit <- function(x) log(x / (1 - x))
set.seed(1234)
N <- 10000000
u1 <- runif(N, 0, 1)
u2 <- runif(N, 0, 1)
y  <- logit(u1)
y2 <- logit(u2)
m <- mean(y)  # ~0
s <- sd(y)    # ~pi/sqrt(3), the sd of Logistic(0,1)
x <- seq(-10, 10, .1)
## logistic is the logit of a uniform
hist(y, prob = TRUE, breaks = 50, main = "intercept",
     xlab = "logit(A)")
lines(x, dnorm(x, m, s), col = "red")
lines(x, dlogis(x, 0, 1), col = "blue")
legend(-15, 0.20, legend = c("Normal (moment matched)",
       "Logistic(0,1)"), lty = 1, col = c("red", "blue"))

## approximating the difference of two uniforms (on the logit scale)
hist(y - y2, prob = TRUE, ylim = c(0, .25), breaks = 200,
     xlim = c(-10, 10), main = "OR between two U(0,1)",
     xlab = "logit(B)-logit(A)")
## logistic approximation
lines(x, dlogis(x, 0, sqrt(2)), col = "blue", lwd = 2)
## normal approximation
lines(x, dnorm(x, 0, pi * sqrt(2/3)), col = "red", lwd = 2)
## 1:1 mixture of the logistic and the normal approximations
lines(x, 0.5 * (dlogis(x, 0, sqrt(2)) +
      dnorm(x, 0, pi * sqrt(2/3))), col = "green", lwd = 2)
## legend (colours matched to the curves drawn above)
NL <- expression(paste("Normal(0,", pi * sqrt(2/3), ")"))
LL <- expression(paste("Logistic(0,", sqrt(2), ")"))
ML <- expression(paste("0.5 Normal(0,", pi * sqrt(2/3), ")+0.5 Logistic(0,", sqrt(2), ")"))
legend(-6.5, 0.25, legend = c(NL, LL, ML),
       lty = 1, col = c("red", "blue", "green"))

## does it extend to more general cases?
m1 <- -2; m2 <- 2; s1 <- 1; s2 <- 2.5
l1 <- rlogis(N, m1, s1)
l2 <- rlogis(N, m2, s2)
d  <- l1 - l2
hist(d, prob = TRUE, ylim = c(0, 0.25), breaks = 200)
plot(density(d))  # kernel density estimate (replaces the histogram)
## logistic approximation (moment matched)
lines(x, dlogis(x, m1 - m2, sqrt(s1^2 + s2^2)), col = "green", lwd = 2)
## normal approximation (moment matched)
lines(x, dnorm(x, m1 - m2, pi * sqrt((s1^2 + s2^2) / 3)), col = "red", lwd = 2)
## 1:1 mixture of the two approximations
lines(x, 0.5 * (dnorm(x, m1 - m2, pi * sqrt((s1^2 + s2^2) / 3)) +
      dlogis(x, m1 - m2, sqrt(s1^2 + s2^2))), col = "blue", lwd = 2)


Edit (29/11/2013):
Updated the first image due to an accidental reversal of the distribution labels