The Beta distribution of the third kind (or generalised Beta prime)
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
 Preliminaries: the (scaled) Beta prime distribution
 Beta distribution of the third kind
 Update 20190905: generalised Beta distribution
 Cumulative distribution function
 Sampling the Beta distribution of the third kind
 Application to the Bayesian binomial model
 Application to the Bayesian “two Poisson samples” model
We present the family of socalled Beta distributions of the third kind. In the context of Bayesian statistics, it is a conjugate family of prior distributions on the odds parameter of the binomial model. This distribution is known, but nobody provided a way to sample from it. We show how one can sample from this distribution in R.
Preliminaries: the (scaled) Beta prime distribution
The Beta distribution of the third kind generalizes the Beta distribution of the second kind, also known under the name Beta prime distribution.
The Beta prime distribution \(\mathcal{B}'(c,d,\lambda)\) is the distribution of the random variable \(\lambda\frac{U}{1U}\) where \(U \sim \mathcal{B}(c,d)\).
Its density function at \(x \geqslant 0\) is \[ \frac{1}{\lambda^c B(c,d)} \frac{x^{c1}}{\left(1+\frac{x}{\lambda}\right)^{c+d}}. \] Usually the definition does not include the scale parameter \(\lambda\) (that is, it is usually defined for \(\lambda=1\) only).
It is easy to implement a sampler for this distribution, the density function and the cumulative density function:
rbetaprime < function(n, c, d, lambda = 1){ stopifnot(lambda > 0) u < rbeta(n, c, d) lambda * u/(1u) } dbetaprime < function(x, c, d, lambda = 1){ stopifnot(lambda > 0) lambda/(lambda+x)^2 * dbeta(x/lambda/(1+x/lambda), c, d) } pbetaprime < function(q, c, d, lambda){ stopifnot(lambda > 0) pbeta(q/lambda/(1+q/lambda), c, d) }
Beta distribution of the third kind
The Beta distribution of the third kind \(\mathcal{B}_3\) was firstly introduced (as far as I know) in the paper Some Poisson mixtures with a hyperscale parameter, written by myself.
For parameters \(c>0\), \(d>0\), \(\kappa \in \mathbb{R}\), \(\tau>0\), the density function of \(\mathcal{B}_3(c,d,\kappa,\tau)\) is \[ f(\phi) = \frac{1} {B(c,d){}_2\!F_1\left(c, c+d\kappa, c+d; 1 – \frac{1}{\tau}\right)} \frac{\phi^{c1}(1+\phi)^{\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d\kappa}}, \quad \phi \geqslant 0. \] Thus, for \(\kappa=0\), the \(\mathcal{B}_3(c,d,\kappa,\tau)\) distribution equals \(\mathcal{B}'(c,d,\tau)\), and for \(\kappa = c+d\) or \(\tau=1\), it equals \(\mathcal{B}'(c,d,1)\). Note that in general, \(\tau\) is not a scale parameter.
Let’s write a R function computing this density:
library(gsl) Gauss2F1 < function(a,b,c,x){ if(x>=0 && x<1){ # hyperg_2F1 works fine in this situation hyperg_2F1(a, b, c, x) }else{ # transform to come down to the first situation hyperg_2F1(ca, b, c, 11/(1x)) / (1x)^b } } dB3 < function(x, c, d, kappa, tau){ stopifnot(c > 0, d > 0, tau > 0) if(kappa == 0){ dbetaprime(x, c, d, tau) }else if(kappa == c+d){ dbetaprime(x, c, d, 1) }else{ 1/beta(c,d) * x^(c1)*(1+x)^(kappa)/(1+x/tau)^(c+dkappa) / Gauss2F1(c, c+dkappa, c+d, 11/tau) } }
This distribution is related to the fourparameter generalized Beta distribution introduced by Chen & Novick in the paper Bayesian analysis for binomial models with generalized beta prior distributions (1984); this distribution takes its value in \([0,1]\). They are related by an elementary transformation: \[ \Theta \sim GB4(c, d, \kappa, \tau) \quad \iff\quad \frac{\Theta}{1\Theta} \sim \mathcal{B}_3\left(c, d, c+d\kappa, \frac{1}{\tau}\right). \]
Update 20190905: generalised Beta distribution
I’ve just discovered that the \(GB4\) distribution appears in the paper On Kummer’s distributions of type two and generalized Beta distributions written by Hamza & Vallois. It is named generalised Beta distribution in this paper, it is denoted by \(\beta_\delta(a,b,c)\) and its density function at \(u \in [0,1]\) is given by \[ \frac{1}{\beta(a,b){}_2\!F_1(c,a;a+b;1\delta)} u^{a1}(1u)^{b1}\bigl(1+(\delta1)u\bigr)^c \] for \(a,b,\delta>0\) and \(c \in \mathbb{R}\).
We have the following relation: if \(\Phi \sim \mathcal{B}_3(c, d, \kappa, \tau)\), then \[ \frac{\Phi}{1+\Phi} \sim \beta_{\frac{1}{\tau}}(c, d, \kappacd). \]
So, maybe a better name for \(\mathcal{B}_3\) would be generalised Beta prime distribution.
Cumulative distribution function
The cumulative distribution function of
\(\mathcal{B}_3\) involves the
Appell hypergeometric function
\(F_1\). A Fortran implementation of
this function is available in the R package appell
. This
package has been removed from CRAN, but you can still install it. If
\(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\), then, for \(q \geqslant 0\),
\[ \Pr(\Phi \leqslant q) = \frac{q^c F_1\left(c; \kappa, c+d\kappa;
c+1; q, \frac{q}{\tau}\right)} {cB(c,d){}_2\!F_1\left(c, c+d\kappa,
c+d; 1 – \frac{1}{\tau}\right)}. \]
Here is a R implementation. I found that it works well with the option
userflag = 0
of the appellf1
function.
pB3 < function(q, c, d, kappa, tau, userflag = 0){ stopifnot(c > 0, d > 0, tau > 0) if(kappa == 0){ pbetaprime(q, c, d, tau) }else if(kappa == c+d){ pbetaprime(q, c, d, 1) }else{ C < beta(c,d) * Gauss2F1(c, c+dkappa, c+d, 11/tau) Appell < appell::appellf1(c, kappa, c+dkappa, c+1, q, q/tau, userflag = userflag) q^c/c * Re(Appell$val) / C } }
Sampling the Beta distribution of the third kind
It is not very easy to sample the \(\mathcal{B}_3\) distribution. In her master thesis, Myriam Chabot proved that it can be represented as a discrete mixture of \(\mathcal{B}_2\) distributions, and we will use this result.
This result is the following one.

For \(\tau < 1\), let \(K\) be a random variable on \(\mathbb{N}\) whose probability mass at \(k\in\mathbb{N}\) is given by \[ \frac{1}{{}_2\!F_1(d, c+d\kappa, c+d, 1\tau)} \frac{(1\tau)^k}{k!} \frac{{(c+d\kappa)}_k{(d)}_k}{{(c+d)}_k} \] and let \(\Phi\) be a random variable such that \[ (\Phi \mid K=k) \sim \mathcal{B}'(c, d+k, 1). \] Then \(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\).

For \(\tau > 1\), let \(K\) be a random variable on \(\mathbb{N}\) whose probability mass at \(k\in\mathbb{N}\) is given by \[ \frac{1}{{}_2\!F_1\left(c, c+d\kappa, c+d, 1\frac{1}{\tau}\right)} \frac{\left(1\frac{1}{\tau}\right)^k}{k!} \frac{{(c+d\kappa)}_k{(c)}_k}{{(c+d)}_k} \] and let \(\Phi\) be a random variable such that \[ (\Phi \mid K=k) \sim \mathcal{B}'(c+k, d, 1). \] Then \(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\).
So we can sample
\(\mathcal{B}_3(c,d,\kappa,\tau)\) if
we are able to sample these discrete distributions. To do so, we use the
Runuran
package.
library(Runuran) pmf_unnormalized < function(k, c, d, kappa, tau){ out < numeric(length(k)) positive < k >= 0 k < k[positive] out[positive] < if(tau < 1){ exp(k*log(1tau)  lfactorial(k) + lnpoch(c+dkappa,k) + lnpoch(d,k)  lnpoch(c+d,k)) }else{ exp(k*log(11/tau)  lfactorial(k) + lnpoch(c+dkappa,k) + lnpoch(c,k)  lnpoch(c+d,k)) } out } NormalizingConstant < function(c, d, kappa, tau){ if(tau < 1){ hyperg_2F1(d, c+dkappa, c+d, 1tau) }else{ hyperg_2F1(c, c+dkappa, c+d, 11/tau) } } Ksampler < function(n, c, d, kappa, tau){ dist < unuran.discr.new( pmf = function(k) pmf_unnormalized(k, c, d, kappa, tau), lb = 0, ub= Inf, sum = NormalizingConstant(c, d, kappa, tau) ) unuran < unuran.new(dist, method="dgt") ur(unuran, n) } rB3 < function(n, c, d, kappa, tau){ stopifnot(c > 0, d > 0, tau > 0) if(tau == 1  kappa == c+d){ rbetaprime(n, c, d, 1) }else if(kappa == 0){ rbetaprime(n, c, d, tau) }else{ K < Ksampler(n, c, d, kappa, tau) if(tau < 1){ rbetaprime(n, c, d + K, 1) }else{ rbetaprime(n, c + K, d, 1) } } }
Let’s check. The density:
c < 2; d < 3; kappa < 4; tau < 5 nsims < 1000000 sims < rB3(nsims, c, d, kappa, tau) plot(density(sims, from = 0, to = quantile(sims, 0.95))) curve(dB3(x, c, d, kappa, tau), add = TRUE, col = "red", lty = "dashed", lwd = 2)
The cumulative distribution function:
q < seq(0.1, 4, length.out = 10)[6] cdfValues < sapply(q, function(x) pB3(x, c, d, kappa, tau)) empirical_cdf < ecdf(sims) plot(empirical_cdf, xlim = c(0,4)) points(q, cdfValues, pch=19)
I’ve removed the sixth value of the vector q
because there
is a crash of appellf1
for this value:
q < seq(0.1, 4, length.out = 10) pB3(q[6], c, d, kappa, tau) ## Error in appell::appellf1(c, kappa, c + d  kappa, c + 1, q, q/tau, : f1conv: Computation not possible
It works with the option userflag = 1
:
pB3(q[6], c, d, kappa, tau, userflag = 1) ## [1] 0.849321
Finally perhaps the option userflag = 1
is a better default
value:
cdfValues < sapply(q, function(x){ pB3(x, c, d, kappa, tau, userflag = 1) }) plot(empirical_cdf, xlim = c(0,4)) points(q, cdfValues, pch=19)
Application to the Bayesian binomial model
Consider the binomial statistical model parameterized by the odds ratio \(\phi\): \[ L(\phi \mid x) \propto \frac{\phi^x}{(1+\phi)^n}. \] Take a \(\mathcal{B}_3(c,d,\kappa,\tau)\) prior distribution on \(\phi\): \[ \pi(\phi) \propto \frac{\phi^{c1}(1+\phi)^{\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d\kappa}} \] Then the posterior distribution on \(\phi\) is \(\mathcal{B}_3(c+x,d+nx,\kappa+n,\tau)\).
Application to the Bayesian “two Poisson samples” model
Consider the statistical model given by two independent observations \[ x \sim \mathcal{P}(\lambda S), \quad y\sim\mathcal{P}(\mu T) \] where \(S\) and \(T\) are known “sample sizes”. We parametrize the model by \(\mu\) and \(\phi := \frac{\lambda}{\mu}\).
Assigning a Gamma prior distribution \(\mathcal{G}{B}(a,b)\) on \(\mu\), it is not difficult to get \[ (\mu \mid \phi,x,y) \sim\mathcal{G}(a+x+y, b + \phi S + T). \] In their paper A Bayesian framework for the ratio of two Poisson rates in the context of vaccine efficacy trials (2011), Laurent & Legrand derived the marginal likelihood on \(\phi\) in the case of the semiconjuguate family of prior distributions. Their result holds as long as \(\mu\) and \(\phi\) are independent under the prior distribution, and this result is \[ \widetilde{L}(\phi \mid x,y) \propto \frac{\phi^x}{{(1+\rho\phi)}^{a+x+y}} \] where \(\rho = \frac{S}{T+b}\).
Now, let’s assign a \(\mathcal{B}'(c,d,\tau)\) prior distribution on \(\phi\). Then the posterior distribution on \(\phi\) is given by \[ \pi(\phi \mid x,y) \propto \pi(\phi) \widetilde{L}(\phi \mid x,y) \propto \frac{\phi^{c+x1}(1+\rho\phi)^{(a+x+y)}} {\left(1+\frac{\phi}{\tau}\right)^{c+d}}, \] and by noting that \(\frac{\phi}{\tau} = \frac{\rho\phi}{\rho\tau}\), we recognize the scaled \(\mathcal{B}_3\) distribution \[ (\phi \mid x,y) \sim \frac{1}{\rho} \times \mathcal{B}_3(c+x, a+d+y, a+x+y, \rho\tau). \] In particular, if \(\tau = \frac{1}{\rho} = \frac{T+b}{S}\), then we find \[ (\phi \mid x,y) \sim \mathcal{B}_2(c+x, a+d+y, \tau), \] and this is the situation of the semiconjugate family studied by Laurent & Legrand.
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.