[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I got some comments on my priors of last week’s post where better priors were proposed. Hence this week’s post looks at them. After some minor adaptations, since I have nine beta parameters and beta needs to be fairly high, I can conclude that these priors provide significant improvements. Especially the model which has p(alpha,beta) prop to (alpha + beta)(-5/2) worked extremely well.

### Priors

There were two proposals. A common step was ‘parameterizing in terms of mean theta = (alpha / (alpha + beta)) and total count kappa = (alpha + beta)’. In my own data there are nine regions hence nine sets of theta, kappa, alpha, beta. As hyperprior over the nine regions’ incidence I chose a normal distribution for theta.  The difference between the proposals is in kappa. I made no hyperpriors over the nine values of kappa.

#### Prior: Uniform on log scale

‘Andrew (…) suggests taking the priors on alpha and beta to be uniform on the log scale.’ In code this means log kappa is uniform distributed.

#### Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)

Apparently in Bayesian Data Analysis 3 it is proposed to make p(alpha,beta) prop to (alpha + beta)(-5/2). This is handled via a Pareto distribution.

#### Sampling kappa priors

Via a small program I had a look what this means in terms of samples. The uniform log scale seemed to be reasonable even for the range of my beta parameters. On the other hand the p(alpha,beta) prop to (alpha + beta)(-5/2) proposal kept kappa way too low for the data at hand. This seems informative for a different range of kappa and hence beta rather than uninformative. There I made a different variation.
modelk <- '
parameters {
real kappa_log1;
real kappa2;
real kappa3;
}
transformed parameters {
real kappa1;
kappa1 <- exp(kappa_log1);
}
model {
kappa_log1 ~ uniform(0,18);
kappa2 ~ pareto(0.1,1.5);
kappa3 ~ pareto(10^6,1.5);
}

parametersk <- c('kappa1','kappa2','kappa3')
initsk <- function() {
list(
kappa_log1=runif(1,0,18)
, kappa2 = runif(1,.2,10)
, kappa3 = runif(1,2e6,1e7)
)
}

kfits <- stan(model_code = modelk,
pars=parametersk,
inits=initsk)
kfits
Inference for Stan model: modelk.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean   se_mean         sd       2.5%        25%        50%
kappa1 1994621.64 550416.10 7754182.44       1.57      82.85    4772.51
kappa2       0.24      0.02       0.27       0.10       0.12       0.16
kappa3 2602295.43 296845.16 4314974.09 1021790.77 1209898.43 1558773.41
lp__       -18.81      0.16       1.65     -23.11     -19.64     -18.41
75%       97.5% n_eff Rhat
kappa1  160223.83 25951765.02   198 1.03
kappa2       0.25        0.89   123 1.03
kappa3 2490623.45 11991112.38   211 1.02
lp__       -17.59      -16.83   112 1.03

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:12:51 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

### Effect in the whole model

#### Prior: Uniform on log scale

This works reasonable well. With significantly less samples than last week there are reasonable estimates. The only thing worrying is that incidence[1] (it was named betamean[1] in the previous models, but that was actually a bit of a misnomer) is a bit too high.
model1 <- '
data {
int n;
int nregion;
int count[n];
int Region[n];
int Population[n];
real scale;
}
parameters {
vector [nregion]  theta;
vector [nregion]  kappa_log;
vector [n] p1;
real thetam;
real thetasd;
}
transformed parameters {
vector [nregion] kappa;
vector [nregion] a;
vector [nregion] b;

kappa <- exp(kappa_log);
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}
}
model {
kappa_log ~ uniform(12,18);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}

inits1 <- function()
list(theta=rnorm(datain\$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa_log=runif(datain\$nregion,15,18),
p1=rnorm(datain\$n,1e-7,1e-8))

parameters1 <- c('a','b','incidence','kappa','kappa_log','meaninc') fits1 <- stan(model_code = model1,
data = datain,
pars=parameters1    ,
init=inits1)
print(fits1,probs=NA)
Inference for Stan model: model1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean    se_mean          sd    n_eff Rhat
a[1]                7.95       1.32       13.67 NA   107 1.04
a[2]                9.16       1.43       16.76 NA   138 1.02
a[3]               50.52       2.61       32.13 NA   151 1.03
a[4]               35.68       2.25       32.61 NA   210 1.01
a[5]                9.10       1.32       13.99 NA   113 1.04
a[6]               11.99       1.48       17.40 NA   139 1.01
a[7]               13.64       1.83       20.56 NA   126 1.01
a[8]               14.19       2.56       20.35 NA    63 1.05
a[9]               12.14       2.51       21.39 NA    73 1.05
b[1]          5956080.76 1213799.11 10903417.54 NA    81 1.04
b[2]          5293475.42  844235.36  9984631.29 NA   140 1.01
b[3]         28376424.43 1399820.31 17900739.23 NA   164 1.03
b[4]         19757543.83 1195288.60 17665183.92 NA   218 1.02
b[5]          5493903.33  897750.46  9136674.30 NA   104 1.05
b[6]          6449567.62  787646.39  9190268.32 NA   136 1.01
b[7]          8500594.37 1172609.55 12531293.24 NA   114 1.01
b[8]          8901106.72 1698740.11 12644831.14 NA    55 1.06
b[9]          7267575.15 1406258.66 12121859.82 NA    74 1.05
incidence[1]        0.10       0.00        0.02 NA    84 1.07
incidence[2]        0.11       0.00        0.02 NA   177 1.02
incidence[3]        0.11       0.00        0.01 NA   153 1.01
incidence[4]        0.11       0.00        0.01 NA   178 1.02
incidence[5]        0.11       0.00        0.02 NA   216 1.01
incidence[6]        0.12       0.00        0.02 NA   121 1.03
incidence[7]        0.10       0.00        0.02 NA   161 1.02
incidence[8]        0.10       0.00        0.02 NA    81 1.06
incidence[9]        0.10       0.00        0.02 NA   119 1.04
kappa[1]      5956088.71 1213800.44 10903430.55 NA    81 1.04
kappa[2]      5293484.58  844236.78  9984647.90 NA   140 1.01
kappa[3]     28376474.95 1399822.89 17900770.95 NA   164 1.03
kappa[4]     19757579.51 1195290.81 17665216.05 NA   218 1.02
kappa[5]      5493912.43  897751.76  9136688.06 NA   104 1.05
kappa[6]      6449579.61  787647.86  9190285.60 NA   136 1.01
kappa[7]      8500608.01 1172611.40 12531313.60 NA   114 1.01
kappa[8]      8901120.91 1698742.67 12644851.22 NA    55 1.06
kappa[9]      7267587.30 1406261.16 12121881.04 NA    74 1.05
kappa_log[1]       14.56       0.18        1.40 NA    60 1.08
kappa_log[2]       14.37       0.13        1.42 NA   122 1.01
kappa_log[3]       16.88       0.07        0.87 NA   167 1.03
kappa_log[4]       16.28       0.08        1.15 NA   210 1.01
kappa_log[5]       14.76       0.11        1.17 NA   119 1.03
kappa_log[6]       15.10       0.08        1.03 NA   153 1.01
kappa_log[7]       15.08       0.17        1.36 NA    61 1.03
kappa_log[8]       15.12       0.22        1.38 NA    41 1.08
kappa_log[9]       14.76       0.17        1.45 NA    70 1.04
meaninc             0.11       0.00        0.01 NA   107 1.02
lp__            -7659.18       1.14       11.63 NA   105 1.01

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:24:52 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

#### Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)

This model did quite well. I wish all my models worked as well.
model2 <- '
data {
int n;
int nregion;
int count[n];
int Region[n];
int Population[n];
real scale;
}
parameters {
vector [nregion]  theta;
vector [nregion]  kappa;
vector [n] p1;
real thetam;
real thetasd;
}
transformed parameters {
vector [nregion] a;
vector [nregion] b;
for (i in 1:nregion) {
a[i] <- kappa[i] * theta[i];
b[i] <- kappa[i] * (1 - theta[i]);
}

}
model {
kappa ~ pareto(10^6,1.5);
for (i in 1:n) {
p1[i] ~ beta(a[Region[i]],b[Region[i]]);
}
count ~ binomial(Population,p1);
theta ~ normal(thetam,thetasd);
thetam ~ uniform(0,1);
thetasd ~ uniform(0,.1);
}
generated quantities {
vector [n] pp;
vector [nregion] incidence;
real meaninc;
incidence <- scale*theta;
pp <- p1 * scale;
meaninc <- scale*thetam;
}

inits2 <- function()
list(theta=rnorm(datain\$nregion,1e-6,1e-7),
thetam = rnorm(1,1e-6,1e-7),
thetasd = rnorm(1,1e-6,1e-7),
kappa=runif(datain\$nregion,1e6,1e7),
p1=rnorm(datain\$n,1e-7,1e-8))

parameters2 <- c('a','b','incidence','kappa','meaninc') fits2 <- stan(model_code = model2,
data = datain,
pars=parameters2,
init=inits2)

print(fits2,probs=NA)
Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean   se_mean          sd    n_eff Rhat
a[1]                3.03      0.06        2.54 NA  1851    1
a[2]                3.36      0.05        2.74 NA  2746    1
a[3]               22.28      1.39       39.52 NA   807    1
a[4]                7.27      0.27       12.39 NA  2173    1
a[5]                3.61      0.08        3.50 NA  2025    1
a[6]                4.38      0.09        3.70 NA  1743    1
a[7]                3.58      0.13        4.14 NA   957    1
a[8]                3.54      0.11        3.60 NA  1125    1
a[9]                3.20      0.10        3.81 NA  1346    1
b[1]          2132903.53  47586.20  2085059.70 NA  1920    1
b[2]          1804395.74  29404.95  1540741.77 NA  2745    1
b[3]         12482567.37 782922.81 22226255.17 NA   806    1
b[4]          4057579.18 146369.50  6818624.58 NA  2170    1
b[5]          2080275.79  44577.27  2020891.35 NA  2055    1
b[6]          2368569.05  48762.30  1922950.01 NA  1555    1
b[7]          2281906.54  81326.60  2583506.11 NA  1009    1
b[8]          2319737.52  71109.80  2408437.85 NA  1147    1
b[9]          2092476.48  61180.23  2313166.69 NA  1430    1
incidence[1]        0.09      0.00        0.02 NA   746    1
incidence[2]        0.12      0.00        0.02 NA  1173    1
incidence[3]        0.11      0.00        0.02 NA  1209    1
incidence[4]        0.11      0.00        0.02 NA  1387    1
incidence[5]        0.11      0.00        0.02 NA  1406    1
incidence[6]        0.12      0.00        0.02 NA  1248    1
incidence[7]        0.10      0.00        0.02 NA   999    1
incidence[8]        0.10      0.00        0.02 NA  1063    1
incidence[9]        0.10      0.00        0.02 NA   935    1
kappa[1]      2132906.56  47586.25  2085062.04 NA  1920    1
kappa[2]      1804399.10  29405.01  1540744.41 NA  2745    1
kappa[3]     12482589.65 782924.19 22226294.41 NA   806    1
kappa[4]      4057586.45 146369.76  6818636.82 NA  2170    1
kappa[5]      2080279.40  44577.35  2020894.78 NA  2055    1
kappa[6]      2368573.43  48762.38  1922953.62 NA  1555    1
kappa[7]      2281910.12  81326.73  2583510.17 NA  1009    1
kappa[8]      2319741.06  71109.91  2408441.36 NA  1147    1
kappa[9]      2092479.69  61180.33  2313170.41 NA  1430    1
meaninc             0.11      0.00        0.01 NA   901    1
lp__            -7881.32      0.42        8.66 NA   428    1

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:34:49 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)