Beta binomial revisited

August 31, 2014
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

I got some comments on my priors of last week's post where better priors were proposed. Hence this week's post looks at them. After some minor adaptations, since I have nine beta parameters and beta needs to be fairly high, I can conclude that these priors provide significant improvements. Especially the model which has p(alpha,beta) prop to (alpha + beta)(-5/2) worked extremely well.

Priors

There were two proposals. A common step was 'parameterizing in terms of mean theta = (alpha / (alpha + beta)) and total count kappa = (alpha + beta)'. In my own data there are nine regions hence nine sets of theta, kappa, alpha, beta. As hyperprior over the nine regions' incidence I chose a normal distribution for theta.  The difference between the proposals is in kappa. I made no hyperpriors over the nine values of kappa.

Prior: Uniform on log scale

'Andrew (...) suggests taking the priors on alpha and beta to be uniform on the log scale.' In code this means log kappa is uniform distributed.

Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)

Apparently in Bayesian Data Analysis 3 it is proposed to make p(alpha,beta) prop to (alpha + beta)(-5/2). This is handled via a Pareto distribution.

Sampling kappa priors

Via a small program I had a look what this means in terms of samples. The uniform log scale seemed to be reasonable even for the range of my beta parameters. On the other hand the p(alpha,beta) prop to (alpha + beta)(-5/2) proposal kept kappa way too low for the data at hand. This seems informative for a different range of kappa and hence beta rather than uninformative. There I made a different variation.
modelk <- '
parameters {
    real kappa_log1;
    real kappa2;
    real <lower=10^6> kappa3;
}
transformed parameters {
    real kappa1;
    kappa1 <- exp(kappa_log1);
}
model {      
    kappa_log1 ~ uniform(0,18);
    kappa2 ~ pareto(0.1,1.5);
    kappa3 ~ pareto(10^6,1.5);
}
'
parametersk <- c('kappa1','kappa2','kappa3')
initsk <- function() {
    list(
        kappa_log1=runif(1,0,18)
        , kappa2 = runif(1,.2,10)
        , kappa3 = runif(1,2e6,1e7)
    )
}

kfits <- stan(model_code = modelk,
        pars=parametersk,
        inits=initsk)
kfits

Inference for Stan model: modelk.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean   se_mean         sd       2.5%        25%        50%
kappa1 1994621.64 550416.10 7754182.44       1.57      82.85    4772.51
kappa2       0.24      0.02       0.27       0.10       0.12       0.16
kappa3 2602295.43 296845.16 4314974.09 1021790.77 1209898.43 1558773.41
lp__       -18.81      0.16       1.65     -23.11     -19.64     -18.41
              75%       97.5% n_eff Rhat
kappa1  160223.83 25951765.02   198 1.03
kappa2       0.25        0.89   123 1.03
kappa3 2490623.45 11991112.38   211 1.02
lp__       -17.59      -16.83   112 1.03

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:12:51 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Effect in the whole model

Prior: Uniform on log scale 

This works reasonable well. With significantly less samples than last week there are reasonable estimates. The only thing worrying is that incidence[1] (it was named betamean[1] in the previous models, but that was actually a bit of a misnomer) is a bit too high.
 model1 <- '
    data {
       int<lower=0> n;
       int<lower=0> nregion;
       int count[n];
       int<lower=1,upper=nregion> Region[n];
       int Population[n];
       real scale;
        }
    parameters {
       vector <lower=0,upper=1> [nregion]  theta;
       vector [nregion]  kappa_log;
       vector <lower=0,upper=1> [n] p1;
       real <lower=0,upper=1> thetam;
       real <lower=0> thetasd;
        }
  transformed parameters {
       vector [nregion] kappa;
       vector [nregion] a;
       vector [nregion] b;

       kappa <- exp(kappa_log);
       for (i in 1:nregion) {
          a[i] <- kappa[i] * theta[i];
          b[i] <- kappa[i] * (1 - theta[i]);
       }
  }
  model {     
        kappa_log ~ uniform(12,18);
        for (i in 1:n) {
            p1[i] ~ beta(a[Region[i]],b[Region[i]]);
        }
        count ~ binomial(Population,p1);
        theta ~ normal(thetam,thetasd);
        thetam ~ uniform(0,1);
        thetasd ~ uniform(0,.1);      
        }
  generated quantities {
        vector [n] pp;
        vector [nregion] incidence;
        real meaninc;
        incidence <- scale*theta;
        pp <- p1 * scale;
        meaninc <- scale*thetam;
        }
'

inits1 <- function()
    list(theta=rnorm(datain$nregion,1e-6,1e-7),
         thetam = rnorm(1,1e-6,1e-7),
         thetasd = rnorm(1,1e-6,1e-7),
         kappa_log=runif(datain$nregion,15,18),
         p1=rnorm(datain$n,1e-7,1e-8))

parameters1 <- c('a','b','incidence','kappa','kappa_log','meaninc')

fits1 <- stan(model_code = model1,
    data = datain,
    pars=parameters1    ,
    init=inits1)
print(fits1,probs=NA)

Inference for Stan model: model1. 
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                    mean    se_mean          sd    n_eff Rhat
a[1]                7.95       1.32       13.67 NA   107 1.04
a[2]                9.16       1.43       16.76 NA   138 1.02
a[3]               50.52       2.61       32.13 NA   151 1.03
a[4]               35.68       2.25       32.61 NA   210 1.01
a[5]                9.10       1.32       13.99 NA   113 1.04
a[6]               11.99       1.48       17.40 NA   139 1.01
a[7]               13.64       1.83       20.56 NA   126 1.01
a[8]               14.19       2.56       20.35 NA    63 1.05
a[9]               12.14       2.51       21.39 NA    73 1.05
b[1]          5956080.76 1213799.11 10903417.54 NA    81 1.04
b[2]          5293475.42  844235.36  9984631.29 NA   140 1.01
b[3]         28376424.43 1399820.31 17900739.23 NA   164 1.03
b[4]         19757543.83 1195288.60 17665183.92 NA   218 1.02
b[5]          5493903.33  897750.46  9136674.30 NA   104 1.05
b[6]          6449567.62  787646.39  9190268.32 NA   136 1.01
b[7]          8500594.37 1172609.55 12531293.24 NA   114 1.01
b[8]          8901106.72 1698740.11 12644831.14 NA    55 1.06
b[9]          7267575.15 1406258.66 12121859.82 NA    74 1.05
incidence[1]        0.10       0.00        0.02 NA    84 1.07
incidence[2]        0.11       0.00        0.02 NA   177 1.02
incidence[3]        0.11       0.00        0.01 NA   153 1.01
incidence[4]        0.11       0.00        0.01 NA   178 1.02
incidence[5]        0.11       0.00        0.02 NA   216 1.01
incidence[6]        0.12       0.00        0.02 NA   121 1.03
incidence[7]        0.10       0.00        0.02 NA   161 1.02
incidence[8]        0.10       0.00        0.02 NA    81 1.06
incidence[9]        0.10       0.00        0.02 NA   119 1.04
kappa[1]      5956088.71 1213800.44 10903430.55 NA    81 1.04
kappa[2]      5293484.58  844236.78  9984647.90 NA   140 1.01
kappa[3]     28376474.95 1399822.89 17900770.95 NA   164 1.03
kappa[4]     19757579.51 1195290.81 17665216.05 NA   218 1.02
kappa[5]      5493912.43  897751.76  9136688.06 NA   104 1.05
kappa[6]      6449579.61  787647.86  9190285.60 NA   136 1.01
kappa[7]      8500608.01 1172611.40 12531313.60 NA   114 1.01
kappa[8]      8901120.91 1698742.67 12644851.22 NA    55 1.06
kappa[9]      7267587.30 1406261.16 12121881.04 NA    74 1.05
kappa_log[1]       14.56       0.18        1.40 NA    60 1.08
kappa_log[2]       14.37       0.13        1.42 NA   122 1.01
kappa_log[3]       16.88       0.07        0.87 NA   167 1.03
kappa_log[4]       16.28       0.08        1.15 NA   210 1.01
kappa_log[5]       14.76       0.11        1.17 NA   119 1.03
kappa_log[6]       15.10       0.08        1.03 NA   153 1.01
kappa_log[7]       15.08       0.17        1.36 NA    61 1.03
kappa_log[8]       15.12       0.22        1.38 NA    41 1.08
kappa_log[9]       14.76       0.17        1.45 NA    70 1.04
meaninc             0.11       0.00        0.01 NA   107 1.02
lp__            -7659.18       1.14       11.63 NA   105 1.01

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:24:52 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Prior: p(alpha,beta) prop to (alpha + beta)(-5/2)


This model did quite well. I wish all my models worked as well.
model2 <- '
data {
    int<lower=0> n;
    int<lower=0> nregion;
    int count[n];
    int<lower=1,upper=nregion> Region[n];
    int Population[n];
    real scale;
}
parameters {
    vector <lower=0,upper=1> [nregion]  theta;
    vector <lower=1e6> [nregion]  kappa; 
    vector <lower=0,upper=1> [n] p1;
    real <lower=0,upper=1> thetam;
    real <lower=0> thetasd;
}
transformed parameters {
    vector [nregion] a;
    vector [nregion] b;
    for (i in 1:nregion) {
        a[i] <- kappa[i] * theta[i];
        b[i] <- kappa[i] * (1 - theta[i]);
    }
   
}
model {      
    kappa ~ pareto(10^6,1.5);
    for (i in 1:n) {
        p1[i] ~ beta(a[Region[i]],b[Region[i]]);
    }
    count ~ binomial(Population,p1);
    theta ~ normal(thetam,thetasd);
    thetam ~ uniform(0,1);
    thetasd ~ uniform(0,.1);       
}
generated quantities {
    vector [n] pp;
    vector [nregion] incidence;
    real meaninc;
    incidence <- scale*theta;
    pp <- p1 * scale;
    meaninc <- scale*thetam;
}
'

inits2 <- function()
    list(theta=rnorm(datain$nregion,1e-6,1e-7),
        thetam = rnorm(1,1e-6,1e-7),
        thetasd = rnorm(1,1e-6,1e-7),
        kappa=runif(datain$nregion,1e6,1e7),
        p1=rnorm(datain$n,1e-7,1e-8))

parameters2 <- c('a','b','incidence','kappa','meaninc')

fits2 <- stan(model_code = model2,
    data = datain,
    pars=parameters2,
    init=inits2)

print(fits2,probs=NA)

Inference for Stan model: model2.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                    mean   se_mean          sd    n_eff Rhat
a[1]                3.03      0.06        2.54 NA  1851    1
a[2]                3.36      0.05        2.74 NA  2746    1
a[3]               22.28      1.39       39.52 NA   807    1
a[4]                7.27      0.27       12.39 NA  2173    1
a[5]                3.61      0.08        3.50 NA  2025    1
a[6]                4.38      0.09        3.70 NA  1743    1
a[7]                3.58      0.13        4.14 NA   957    1
a[8]                3.54      0.11        3.60 NA  1125    1
a[9]                3.20      0.10        3.81 NA  1346    1
b[1]          2132903.53  47586.20  2085059.70 NA  1920    1
b[2]          1804395.74  29404.95  1540741.77 NA  2745    1
b[3]         12482567.37 782922.81 22226255.17 NA   806    1
b[4]          4057579.18 146369.50  6818624.58 NA  2170    1
b[5]          2080275.79  44577.27  2020891.35 NA  2055    1
b[6]          2368569.05  48762.30  1922950.01 NA  1555    1
b[7]          2281906.54  81326.60  2583506.11 NA  1009    1
b[8]          2319737.52  71109.80  2408437.85 NA  1147    1
b[9]          2092476.48  61180.23  2313166.69 NA  1430    1
incidence[1]        0.09      0.00        0.02 NA   746    1
incidence[2]        0.12      0.00        0.02 NA  1173    1
incidence[3]        0.11      0.00        0.02 NA  1209    1
incidence[4]        0.11      0.00        0.02 NA  1387    1
incidence[5]        0.11      0.00        0.02 NA  1406    1
incidence[6]        0.12      0.00        0.02 NA  1248    1
incidence[7]        0.10      0.00        0.02 NA   999    1
incidence[8]        0.10      0.00        0.02 NA  1063    1
incidence[9]        0.10      0.00        0.02 NA   935    1
kappa[1]      2132906.56  47586.25  2085062.04 NA  1920    1
kappa[2]      1804399.10  29405.01  1540744.41 NA  2745    1
kappa[3]     12482589.65 782924.19 22226294.41 NA   806    1
kappa[4]      4057586.45 146369.76  6818636.82 NA  2170    1
kappa[5]      2080279.40  44577.35  2020894.78 NA  2055    1
kappa[6]      2368573.43  48762.38  1922953.62 NA  1555    1
kappa[7]      2281910.12  81326.73  2583510.17 NA  1009    1
kappa[8]      2319741.06  71109.91  2408441.36 NA  1147    1
kappa[9]      2092479.69  61180.33  2313170.41 NA  1430    1
meaninc             0.11      0.00        0.01 NA   901    1
lp__            -7881.32      0.42        8.66 NA   428    1

Samples were drawn using NUTS(diag_e) at Sat Aug 30 14:34:49 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).


To leave a comment for the author, please follow the link and comment on his blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.