A Bayesian approach to modelling censored data

Posted on July 4, 2017 by Silent Spring Institute Developer Blog in R bloggers | 0 Comments

[This article was first published on Silent Spring Institute Developer Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For thise case, we can write Bayes formula as:

$p(\mu,\sigma|y^{\ast}) = \frac{p(y^{\ast}|\mu,\sigma)p(\mu,\sigma)}{p(y^{\ast})}$

The two components in the numerator are:

The probability of the data given a $ \mu $ and $ \sigma $, also called the likelihood function: $ p(y|\mu,\sigma) $
The probability of a given $ \mu $ and $ \sigma $, before seeing any data; also called the prior likelihood: $ p(\mu, \sigma) $

The denominator, $ p(y^{\ast}) $, is the total likelihood of the data integrated over all hypotheses (that is, over all values of $ \mu $ and $ \sigma $). This is the part of the formula that typically doesn’t have an analytical solution; in practice, numerical approximations need to be used.

Our job is to define the likelihood function and provide a prior distribution for $ \mu $ and $ \sigma $.

In the previous post, we derived the likelihood function for left-censored data:

$p\left(y^{\ast}|\mu,\sigma\right)=\prod\begin{cases} P\left(y^{\ast}_{i} | \mu,\sigma\right) & y^{\ast}_{i}>L\\ D\left(L | \mu,\sigma\right) & y^{\ast}_{i}=L \end{cases} %]]>$

where $ P $ is the log-normal probability density function and $ D $ is the log-normal cumulative density function.

The specification of a prior distribution depends on the application, and what is known about what reasonable values may be for $ \mu $ and $ \sigma $. In this case, let’s set a very weakly informative prior on both variables: $ \mu \sim Normal(0, 100) $ and $ \sigma \sim Normal(0, 100) $

A STAN program that encodes our likelihood function and prior specifications:

This program is written to look like the math, so it doesn’t use some special syntax that makes the code shorter, and it isn’t as computationally efficient as it code be.

data {
  int<lower=0> N;
  real L; // censoring limit
  real<lower=L> y[N];
  int cens[N]; // -1 if left-censored, 0 if not censored
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // priors
  mu ~ normal(0, 100);
  sigma ~ normal(0, 100);

  // likelihood
  for(i in 1:N) {
    if(cens[i] == -1) {
      // left-censored
      target += normal_cdf(L | mu, sigma);
    }
    else {
      // not censored
      target += normal_pdf(y[i] | mu, sigma);
    }
  }
}

To leave a comment for the author, please follow the link and comment on their blog: Silent Spring Institute Developer Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

A Bayesian approach to modelling censored data

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)