Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I got some comments on my priors of last week’s post where better priors were proposed. Hence this week’s post looks at them. After some minor adaptations, since I have nine beta parameters and beta needs to be fairly high, I can conclude that these priors provide significant improvements. Especially the model which has p(alpha,beta) prop to (alpha + beta)(-5/2) worked extremely well.

### Priors

There were two proposals. A common step was ‘parameterizing in terms of mean theta = (alpha / (alpha + beta)) and total count kappa = (alpha + beta)’. In my own data there are nine regions hence nine sets of theta, kappa, alpha, beta. As hyperprior over the nine regions’ incidence I chose a normal distribution for theta.  The difference between the proposals is in kappa. I made no hyperpriors over the nine values of kappa.

#### Prior: Uniform on log scale

‘Andrew (…) suggests taking the priors on alpha and beta to be uniform on the log scale.’ In code this means log kappa is uniform distributed.

#### Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)

Apparently in Bayesian Data Analysis 3 it is proposed to make p(alpha,beta) prop to (alpha + beta)(-5/2). This is handled via a Pareto distribution.

#### Sampling kappa priors

Via a small program I had a look what this means in terms of samples. The uniform log scale seemed to be reasonable even for the range of my beta parameters. On the other hand the p(alpha,beta) prop to (alpha + beta)(-5/2) proposal kept kappa way too low for the data at hand. This seems informative for a different range of kappa and hence beta rather than uninformative. There I made a different variation.
modelk <- '
parameters {
real kappa_log1;
real kappa2;
real kappa3;
}
transformed parameters {
real kappa1;
kappa1 <- exp(kappa_log1);
}
model {
kappa_log1 ~ uniform(0,18);
kappa2 ~ pareto(0.1,1.5);
kappa3 ~ pareto(10^6,1.5);
}

parametersk <- c('kappa1','kappa2','kappa3')
initsk <- function() {
list(
kappa_log1=runif(1,0,18)
, kappa2 = runif(1,.2,10)
, kappa3 = runif(1,2e6,1e7)
)
}

kfits <- stan(model_code = modelk,
pars=parametersk,
inits=initsk)
kfits
Inference for Stan model: modelk.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean   se_mean         sd       2.5%        25%        50%
kappa1 1994621.64 550416.10 7754182.44       1.57      82.85    4772.51
kappa2       0.24      0.02       0.27       0.10       0.12       0.16
kappa3 2602295.43 296845.16 4314974.09 1021790.77 1209898.43 1558773.41
lp__       -18.81      0.16       1.65     -23.11     -19.64     -18.41
75%       97.5% n_eff Rhat
kappa1  160223.83 25951765.02   198 1.03
kappa2       0.25        0.89   123 1.03
kappa3 2490623.45 11991112.38   211 1.02
lp__       -17.59      -16.83   112 1.03

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:12:51 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

### Effect in the whole model

#### Prior: Uniform on log scale

This works reasonable well. With significantly less samples than last week there are reasonable estimates. The only thing worrying is that incidence (it was named betamean in the previous models, but that was actually a bit of a misnomer) is a bit too high.
model1 <- '
data {
int n;
int nregion;
int count[n];
int Region[n];
int Population[n];
real scale;
}
parameters {
vector [nregion]  theta;
vector [nregion]  kappa_log;
vector [n] p1;
real thetam;
real thetasd;
}
transformed parameters {
vector [nregion] kappa;
vector [nregion] a;
vector [nregion] b;

kappa <- exp(kappa_log);
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}
}
model {
kappa_log ~ uniform(12,18);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}

inits1 <- function()
list(theta=rnorm(datain\$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa_log=runif(datain\$nregion,15,18),
p1=rnorm(datain\$n,1e-7,1e-8))

parameters1 <- c('a','b','incidence','kappa','kappa_log','meaninc') fits1 <- stan(model_code = model1,
data = datain,
pars=parameters1    ,
init=inits1)
print(fits1,probs=NA)
Inference for Stan model: model1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean    se_mean          sd    n_eff Rhat
a                7.95       1.32       13.67 NA   107 1.04
a                9.16       1.43       16.76 NA   138 1.02
a               50.52       2.61       32.13 NA   151 1.03
a               35.68       2.25       32.61 NA   210 1.01
a                9.10       1.32       13.99 NA   113 1.04
a               11.99       1.48       17.40 NA   139 1.01
a               13.64       1.83       20.56 NA   126 1.01
a               14.19       2.56       20.35 NA    63 1.05
a               12.14       2.51       21.39 NA    73 1.05
b          5956080.76 1213799.11 10903417.54 NA    81 1.04
b          5293475.42  844235.36  9984631.29 NA   140 1.01
b         28376424.43 1399820.31 17900739.23 NA   164 1.03
b         19757543.83 1195288.60 17665183.92 NA   218 1.02
b          5493903.33  897750.46  9136674.30 NA   104 1.05
b          6449567.62  787646.39  9190268.32 NA   136 1.01
b          8500594.37 1172609.55 12531293.24 NA   114 1.01
b          8901106.72 1698740.11 12644831.14 NA    55 1.06
b          7267575.15 1406258.66 12121859.82 NA    74 1.05
incidence        0.10       0.00        0.02 NA    84 1.07
incidence        0.11       0.00        0.02 NA   177 1.02
incidence        0.11       0.00        0.01 NA   153 1.01
incidence        0.11       0.00        0.01 NA   178 1.02
incidence        0.11       0.00        0.02 NA   216 1.01
incidence        0.12       0.00        0.02 NA   121 1.03
incidence        0.10       0.00        0.02 NA   161 1.02
incidence        0.10       0.00        0.02 NA    81 1.06
incidence        0.10       0.00        0.02 NA   119 1.04
kappa      5956088.71 1213800.44 10903430.55 NA    81 1.04
kappa      5293484.58  844236.78  9984647.90 NA   140 1.01
kappa     28376474.95 1399822.89 17900770.95 NA   164 1.03
kappa     19757579.51 1195290.81 17665216.05 NA   218 1.02
kappa      5493912.43  897751.76  9136688.06 NA   104 1.05
kappa      6449579.61  787647.86  9190285.60 NA   136 1.01
kappa      8500608.01 1172611.40 12531313.60 NA   114 1.01
kappa      8901120.91 1698742.67 12644851.22 NA    55 1.06
kappa      7267587.30 1406261.16 12121881.04 NA    74 1.05
kappa_log       14.56       0.18        1.40 NA    60 1.08
kappa_log       14.37       0.13        1.42 NA   122 1.01
kappa_log       16.88       0.07        0.87 NA   167 1.03
kappa_log       16.28       0.08        1.15 NA   210 1.01
kappa_log       14.76       0.11        1.17 NA   119 1.03
kappa_log       15.10       0.08        1.03 NA   153 1.01
kappa_log       15.08       0.17        1.36 NA    61 1.03
kappa_log       15.12       0.22        1.38 NA    41 1.08
kappa_log       14.76       0.17        1.45 NA    70 1.04
meaninc             0.11       0.00        0.01 NA   107 1.02
lp__            -7659.18       1.14       11.63 NA   105 1.01

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:24:52 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

#### Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)

This model did quite well. I wish all my models worked as well.
model2 <- '
data {
int n;
int nregion;
int count[n];
int Region[n];
int Population[n];
real scale;
}
parameters {
vector [nregion]  theta;
vector [nregion]  kappa;
vector [n] p1;
real thetam;
real thetasd;
}
transformed parameters {
vector [nregion] a;
vector [nregion] b;
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}

}
model {
kappa ~ pareto(10^6,1.5);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}

inits2 <- function()
list(theta=rnorm(datain\$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa=runif(datain\$nregion,1e6,1e7),
p1=rnorm(datain\$n,1e-7,1e-8))

parameters2 <- c('a','b','incidence','kappa','meaninc') fits2 <- stan(model_code = model2,
data = datain,
pars=parameters2,
init=inits2)

print(fits2,probs=NA)
Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean   se_mean          sd    n_eff Rhat
a                3.03      0.06        2.54 NA  1851    1
a                3.36      0.05        2.74 NA  2746    1
a               22.28      1.39       39.52 NA   807    1
a                7.27      0.27       12.39 NA  2173    1
a                3.61      0.08        3.50 NA  2025    1
a                4.38      0.09        3.70 NA  1743    1
a                3.58      0.13        4.14 NA   957    1
a                3.54      0.11        3.60 NA  1125    1
a                3.20      0.10        3.81 NA  1346    1
b          2132903.53  47586.20  2085059.70 NA  1920    1
b          1804395.74  29404.95  1540741.77 NA  2745    1
b         12482567.37 782922.81 22226255.17 NA   806    1
b          4057579.18 146369.50  6818624.58 NA  2170    1
b          2080275.79  44577.27  2020891.35 NA  2055    1
b          2368569.05  48762.30  1922950.01 NA  1555    1
b          2281906.54  81326.60  2583506.11 NA  1009    1
b          2319737.52  71109.80  2408437.85 NA  1147    1
b          2092476.48  61180.23  2313166.69 NA  1430    1
incidence        0.09      0.00        0.02 NA   746    1
incidence        0.12      0.00        0.02 NA  1173    1
incidence        0.11      0.00        0.02 NA  1209    1
incidence        0.11      0.00        0.02 NA  1387    1
incidence        0.11      0.00        0.02 NA  1406    1
incidence        0.12      0.00        0.02 NA  1248    1
incidence        0.10      0.00        0.02 NA   999    1
incidence        0.10      0.00        0.02 NA  1063    1
incidence        0.10      0.00        0.02 NA   935    1
kappa      2132906.56  47586.25  2085062.04 NA  1920    1
kappa      1804399.10  29405.01  1540744.41 NA  2745    1
kappa     12482589.65 782924.19 22226294.41 NA   806    1
kappa      4057586.45 146369.76  6818636.82 NA  2170    1
kappa      2080279.40  44577.35  2020894.78 NA  2055    1
kappa      2368573.43  48762.38  1922953.62 NA  1555    1
kappa      2281910.12  81326.73  2583510.17 NA  1009    1
kappa      2319741.06  71109.91  2408441.36 NA  1147    1
kappa      2092479.69  61180.33  2313170.41 NA  1430    1
meaninc             0.11      0.00        0.01 NA   901    1
lp__            -7881.32      0.42        8.66 NA   428    1

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:34:49 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).