A Bayesian Model to Calculate Whether My Wife is Pregnant or Not

Posted on November 5, 2015 by Rasmus Bååth in R bloggers | 0 Comments

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

On the 21st of February, 2015, my wife had not had her period for 33 days, and as we were trying to conceive, this was good news! An average period is around a month, and if you are a couple trying to go triple, then a missing period is a good sign something is going on. But at 33 days, this was not yet a missing period, just a late one, so how good news was it? Pretty good, really good, or just meh?

To get at this I developed a simple Bayesian model that, given the number of days since your last period and your history of period onsets, calculates the probability that you are going to be pregnant this period cycle. In this post I will describe what data I used, the priors I used, the model assumptions, and how to fit it in R using importance sampling. And finally I show you why the result of the model really didn’t matter in the end. Also I’ll give you a handy script if you want to calculate this for yourself. 🙂

The data

During the last part of 2014 my wife kept a journal of her period onsets, which was good luck for me, else I would end up with tiny data again. In total we had the dates for eight period onsets, but the data I used was not the onsets but the number of days between the onsets:

period_onset <- as.Date(c("2014-07-02", "2014-08-02", "2014-08-29", "2014-09-25",
                          "2014-10-24", "2014-11-20", "2014-12-22", "2015-01-19"))
days_between_periods <- as.numeric(diff(period_onset))

So the onsets occur pretty regularly, hovering around a cycle of 28 days. The last onset was on the 19th of January, so on the 21st of February there had been 33 days since the last onset.

Conceiving a model

I was constructing a model covering period cycles, pregnancies and infertility, and as such it was obviously going to make huge simplifications. Some general assumptions I made were:

The woman and the man have no prior reasons for being infertile as a couple.
The woman has regular periods.
The couple trying to conceive are actively trying to conceive. Say, two three times a week as recommended by Wilcox et al. (2000).
If there is pregnancy, there are no more periods.

Now to the specific assumptions I made:

The number of days between periods (days_between_periods) is assumed to be normally distributed with unknown mean (mean_period) and standard deviation (sd_period).
The probability of getting pregnant during a cycle is assumed to be 0.19 (more about where this number comes from below) if you are fertile as a couple (is_fertile). Unfortunately not all couples are fertile, and if you are not then the probability of getting pregnant is 0. If fertility is coded as 0-1 then this can be compactly written as 0.19 * is_fertile.
The probability of failing to conceive for a certain number of periods (n_non_pregnant_periods) is then (1 - 0.19 * is_fertile)^n_non_pregnant_periods
Finally, if you are not going to be pregnant this cycle, the number of days from your last to your next period (next_period) is going to be more than the current number of days since the last period (days_since_last_period). That is, the probability of next_period < days_since_last_period is zero. This sounds strange because it is so obvious, but we’re going to need it in the model.

That was basically it! But in order to fit this I was going to need a likelihood function, a function that, given fixed parameters and some data, calculates the probability of the data given those parameters or, more commonly, something proportional to a probability, that is, a likelihood. And as this likelihood can be extremely tiny I needed to calculate it on the log scale to avoid numerical problems. When crafting a log likelihood function in R, the general pattern is this:

The function will take the data and the parameters as arguments.
You initialize the likelihood to 1.0, corresponding to 0.0 on the log scale (log_like <- 0.0).
Using the probability density functions in R (such as dnorm, dbinom and dpois) you calculate the likelihoods of the different the parts of the model. You then multiply these likelihoods together. On the log scale this corresponds to adding the log likelihoods to log_like.
To make the d* functions return log likelihoods just add the argument log = TRUE. Also remember that a likelihood of 0.0 corresponds to a log likelihood of -Inf.

So, a log likelihood function corresponding to the model above would then be:

calc_log_like <- function(days_since_last_period, days_between_periods,
                          mean_period, sd_period, next_period,
                          is_fertile, is_pregnant) {
  n_non_pregnant_periods <- length(days_between_periods)
  log_like <- 0
  if(n_non_pregnant_periods > 0) {
    log_like <- log_like + sum( dnorm(days_between_periods, mean_period, sd_period, log = TRUE) )
  }
  log_like <- log_like + log( (1 - 0.19 * is_fertile)^n_non_pregnant_periods )
  if(!is_pregnant && next_period < days_since_last_period) {
    log_like <- -Inf
  }
  log_like
}