# ABC in Roma [R lab #2]

March 2, 2012
By

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

Here are the R codes of the second R lab organised by Serena Arima in supplement of my lectures (now completed!). This morning I covered ABC model choice and the following example is the benchmark used in the course (and in the paper) about the impact of summary statistics. (Warning! It takes a while to run…)

n.iter=10000
n=c(10,100,1000)
n.sims=100
prob.m1=matrix(0,nrow=n.sims,ncol=length(n))
prob.m2=prob.m1

### Simulation from Normal model
for(sims in 1:n.sims){

## True data generation from the Normal distribution and summary statistics
for(i in 1:length(n)){
y.true=rnorm(n[i],0,1)
stat.true=c(mean(y.true),median(y.true),var(y.true))

## ABC algorithm
# Sample from the prior
mu=rnorm(n.iter,0,2)
dist.m1=rep(NA,n.iter)
dist.m2=rep(NA,n.iter)

for(j in 1:n.iter){
# Data generation under both models
# Normal model
y.m1=rnorm(n[i],mu[j])
stat.m1=c(mean(y.m1),median(y.m1),var(y.m1))
# computing the distance
dist.m1[j]=sum((stat.m1-stat.true)^2)
# Laplace model
y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.m2=c(mean(y.m2),median(y.m2),var(y.m2))
# computing the distance
dist.m2[j]=sum((stat.m2-stat.true)^2)
}

# select epsilon as 1% distance quantile
epsilon=quantile(c(dist.m1,dist.m2),prob=0.01)

# compute the posterior probability that data come from
# the model
prob.m1[sims,i]=sum(dist.m1<epsilon)/(2*n.iter*0.01)
}}

### Simulation from the Laplace model
for(sims in 1:n.sims){

## True data generation from the Laplace distribution and summary statistics
for(i in 1:length(n)){
y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.true=c(mean(y.true),median(y.true),var(y.true))

## ABC algorithm
# Sample from the prior
mu=rnorm(n.iter,0,2)
dist.m1=rep(NA,n.iter)
dist.m2=rep(NA,n.iter)

for(j in 1:n.iter){
# Data generation under both models
# Normal model
y.m1=rnorm(n[i],mu[j])
stat.m1=c(mean(y.m1),median(y.m1),var(y.m1))
# computing the distance
dist.m1[j]=sum((stat.m1-stat.true)^2)
# Laplace model
y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.m2=c(mean(y.m2),median(y.m2),var(y.m2))
# computing the distance
dist.m2[j]=sum((stat.m2-stat.true)^2)
}

# select epsilon as 1% distance quantile
epsilon=quantile(c(dist.m1,dist.m2),prob=0.01)

# compute the posterior probability that data come from
# the model
prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01)
}
}

# Visualize the results
y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.true=c(mean(y.true),median(y.true),var(y.true))

## ABC algorithm
# Sample from the prior
mu=rnorm(n.iter,0,2)
dist.m1=rep(NA,n.iter)
dist.m2=rep(NA,n.iter)

for(j in 1:n.iter){
# Data generation under both models
# Normal model
y.m1=rnorm(n[i],mu[j])
stat.m1=c(mean(y.m1),median(y.m1),var(y.m1))
# computing the distance
dist.m1[j]=sum((stat.m1-stat.true)^2)
# Laplace model
y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.m2=c(mean(y.m2),median(y.m2),var(y.m2))
# computing the distance
dist.m2[j]=sum((stat.m2-stat.true)^2)
}

# select epsilon as 1% distance quantile
epsilon=quantile(c(dist.m1,dist.m2),prob=0.01)

# compute the posterior probability that data come from
# the model
prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01)
}
}

# Visualize the results
y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.true=c(mean(y.true),median(y.true),var(y.true))

## ABC algorithm
# Sample from the prior
mu=rnorm(n.iter,0,2)
dist.m1=rep(NA,n.iter)
dist.m2=rep(NA,n.iter)

for(j in 1:n.iter){
# Data generation under both models
# Normal model
y.m1=rnorm(n[i],mu[j])
stat.m1=c(mean(y.m1),median(y.m1),var(y.m1))
# computing the distance
dist.m1[j]=sum((stat.m1-stat.true)^2)
# Laplace model
y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2))
stat.m2=c(mean(y.m2),median(y.m2),var(y.m2))
# computing the distance
dist.m2[j]=sum((stat.m2-stat.true)^2)
}

# select epsilon as 1% distance quantile
epsilon=quantile(c(dist.m1,dist.m2),prob=0.01)

# compute the posterior probability that data come from
# the model
prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01)
}}

# Visualize the results
boxplot(data.frame(prob.m1[,1],prob.m2[,1]),
names=c("M1","M2"),main="N=10")
boxplot(data.frame(prob.m1[,2],prob.m2[,2]),
names=c("M1","M2"),main="N=100")
boxplot(data.frame(prob.m1[,3],prob.m2[,3]),
names=c("M1","M2"),main="N=1000")

Once again, I had a terrific time teaching this “ABC in Roma” course, and not only for the immense side benefit of enjoy Roma in a particularly pleasant weather (for late February).

Filed under: R, Statistics, University life Tagged: ABC, La Sapienza, PhD course, R, Roma

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Tags: , , , , , ,