**Gianluca Baio's blog**, and kindly contributed to R-bloggers)

Yesterday Gareth pointed me to this article on the BBC website. The underlying story has to do with Meredith Kercher’s murder and the subsequent trial involving mainly her flat-mate Amanda Knox, in Perugia (Italy).

As often in these gruesome cases, the evidence is pretty difficult to handle (eg because of contamination). Thus, even in the era of CSI, it is extremely difficult to prove guilt beyond reasonable doubt based on DNA evidence.

In Amanda Knox’s trial, the appeal court decided to dismiss DNA evidence altogether, on the grounds that the available sample (which was used to convict Knox and her alleged accomplice Sollecito, in the first appeal) was unreliable. The appeal judge then ruled that it was pointless to produce further samples.

But the BBC article reports a contribution by Coralie Colmez, who points out some inconsistencies in the use of the available evidence. Her main point is that “*the mathematics tells you that if you have an unreliable test, you do it again and you can be a lot more certain of that answer. And it’s not intuitive but a simple calculation will tell you that it’s true*“.

I suspect that her contribution may have been cut (too deeply) to fit the BBC article; but, it seems to me that in doing so the reported has not done a good job. The article says: “*You do a first test and obtain nine heads and one tail… The probability that the coin is fair given this outcome is about 8%, [and the probability] that it is biased, about 92%. Pretty convincing, but not enough to convict your coin of being biased beyond a reasonable doubt, Colmez says*“.

Now, what the BBC is not telling you is that there is a statistical model (and in fact a relatively complex one) behind this conclusion. I think this is what is going on in the background. Assume that $y=9$ heads are observed out of $n=10$ trials. We can reasonably model this as $y \mid \pi,n \sim \mbox{Binomial}(\pi,n)$, where $\pi$ is the probability that the coin gives a head. Also, assume that there are two possible data generating processes at play. The first one ($D=1$) is such that $\pi=0.5$ with probability $1$, which implies that the coin is unbiased. In the second one ($D=2$) we assume no particular knowledge about the true chance of observing a head, and thus model $\pi \sim \mbox{Uniform}(0,1)$.

Effectively, we are implying a mixture prior

$$ p(\pi) = w_1\times \pi_1 + w_2\times\pi_2 $$

where $w_1=\Pr(D=1)$, $w_2=(1-w_1)=\Pr(D=2)$, $\pi_1=0.5$, $\pi_2 \sim \mbox{Uniform}(0,1)$

Of course, there is nothing special about these two “hypotheses” and one can even argue that they are not necessarily the best possibilities! But, let’s assume that these are OK. You can use simple MCMC to obtain this (I think there is a very similar code in The BUGS Book, dealing with pretty much the same example). Here’s how I’ve coded it

model {

y ~ dbin(pi[D],n)

D ~ dcat(w[])

pi[1] <- 0.5

pi[2] ~ dunif(0,1)

p.biased <- D – 1

}

Assuming you save this to a file BBCcoin.txt, and that you have R, JAGS and R2jags installed in your computer, you can run this model directly from R, using the following code.

library(R2jags)

working.dir <- paste(getwd(),”/”,sep=””)

# Observed data

y <- 9 # number of heads

n <- 10 # number of trials

# Prior for the data generating process

w <- numeric()

w[1] <- .5 # probability that coin is unbiased

w[2] <- 1-w[1] # probability that coin is biased

filein <- “BBCcoin.txt”

dataJags <- list(y=y,n=n,w=w)

params <- c(“p.biased”,”pi”)

inits <- function(){

list(pi=c(NA,runif(1)),D=1+rbinom(1,1,.5))

}

n.iter <- 100000

n.burnin <- 9500

n.thin <- floor((n.iter-n.burnin)/500)

tic <- proc.time()

coin <- jags(dataJags, inits, params, model.file=filein,n.chains=2, n.iter, n.burnin, n.thin,

DIC=TRUE, working.directory=working.dir, progress.bar=”text”)

print(coin,digits=3,intervals=c(0.025, 0.975))

This gives the following results

Inference for Bugs model at “BBCcoin.txt”, fit using jags,

2 chains, each with 1e+05 iterations (first 9500 discarded),

n.thin = 181 n.sims = 1000 iterations saved

mu.vect sd.vect 2.5% 97.5% Rhat n.eff

p.biased 0.916 0.278 0.000 1.000 1.002 1000

pi[1] 0.500 0.000 0.500 0.500 1.000 1

pi[2] 0.802 0.160 0.308 0.974 1.002 1000

deviance 3.386 2.148 1.898 9.258 1.000 1000

For each parameter, n.eff is a crude measure of

effective sample size, and Rhat is the

potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)

pD = 2.3 and DIC = 5.7

DIC is an estimate of expected predictive error

(lower deviance is better).

**leave a comment**for the author, please follow the link and comment on his blog:

**Gianluca Baio's blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...