Statistical Tests: Asymptotic, Exact, or Based on Simulations?

Posted on October 20, 2015 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].


This morning, in our mathematical statistics course, we discussed the ‘proportion test’: given a sample of Bernoulli trials $\boldsymbol{x}=\{x_1,\cdots,x_n\}$, with $X_i\sim\mathcal{B}(p)$, we want to test

$H_0:p=p_0$ against $H_1:p\neq p_0$

A natural test (which can be related to the maximum likelihood ratio test) is based on the statistic

$T(\boldsymbol{x})=\sqrt n\frac{\widehat{p} - p_0}{\sqrt{p_0 (1-p_0)}}$

The test function is

$\psi(\boldsymbol{x})=\boldsymbol{1}(T(\boldsymbol{x})\notin [c_{1,\alpha},c_{2,\alpha}])$

To get the bounds of the acceptance region, we need the distribution of $T(\boldsymbol{X})$ under $H_0$. Consider the following numerical application,

n=20
p=.5
set.seed(1)
echantillon=sample(0:1,size=n,
            prob=c(1-p,p),
            replace=TRUE)

the asymptotic distribution

The first (and standard) idea is to use the central limit theorem, since

$\sqrt n\frac{\hat{p} - p}{\sqrt{p (1-p)}}\overset{\mathcal{L}}{\rightarrow}\mathcal{N}(0,1)$

So, under $H_0$ ,

$\sqrt n\frac{\hat{p} - p_0}{\sqrt{p_0 (1-p_0)}}\overset{\mathcal{L}}{\rightarrow}\mathcal{N}(0,1)$

Then $c_{1,\alpha}=\Phi^{-1}(\alpha/2)$ while $c_{2,\alpha}=\Phi^{-1}(1-\alpha/2)$. The acceptance region then lies between the two red lines below, and the blue line marks the observed value of the statistic,

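# observed statistic, with the null variance p0*(1-p0) as in the definition of T
T=sqrt(n)*(mean(echantillon)-.5)/
  sqrt(.5*(1-.5))
u=seq(-3,3,by=.01)
v=dnorm(u)   # standard normal density
plot(u,v,type="l",lwd=2)
# bounds of the asymptotic acceptance region
abline(v=qnorm(.025),col="red")
abline(v=qnorm(.975),col="red")
abline(v=T,col="blue")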
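
As a quick check of the asymptotic test, here is a minimal sketch that computes the statistic with the null variance $p_0(1-p_0)$, the decision at level 5%, and the two-sided p-value. The names p0, alpha, T_obs, and p_value are introduced here for illustration only, and the sample is regenerated so that the block runs on its own. Under these assumptions, the square of the statistic should agree with the chi-squared statistic returned by prop.test when the continuity correction is switched off.

```r
# Sketch: asymptotic two-sided proportion test, assuming p0 = 0.5 and alpha = 0.05
n <- 20
p0 <- 0.5
alpha <- 0.05
set.seed(1)
echantillon <- sample(0:1, size = n, prob = c(1 - p0, p0), replace = TRUE)

# Observed statistic, using the null variance p0 * (1 - p0)
p_hat <- mean(echantillon)
T_obs <- sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))

# Critical values from the standard normal distribution
c1 <- qnorm(alpha / 2)
c2 <- qnorm(1 - alpha / 2)
reject <- (T_obs < c1) || (T_obs > c2)

# Two-sided asymptotic p-value
p_value <- 2 * pnorm(-abs(T_obs))

# Cross-check against prop.test without continuity correction
check <- prop.test(sum(echantillon), n, p = p0, correct = FALSE)
print(c(T = T_obs, p_value = p_value, reject = reject))
print(c(T_squared = T_obs^2, prop_test = unname(check$statistic)))
```

Rejecting when $T$ falls outside $[c_{1,\alpha},c_{2,\alpha}]$ is the same as rejecting when the p-value is below $\alpha$.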

the exact distribution

Here we use the fact that

$\sum_{i=1}^n X_i \sim \mathcal{B}(n,p)$

Using transformation of the ‘density’, we can (at least numerically) compute the (exact) distribution of

$T(\boldsymbol{x})=\sqrt n\frac{\widehat{p} - p_0}{\sqrt{p_0 (1-p_0)}}$
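
To spell out the transformation (a short derivation added for clarity; the notation $t_k$ and $\Delta$ is not in the original text): under $H_0$, the statistic only takes the $n+1$ values

$t_k=\frac{k-np_0}{\sqrt{np_0(1-p_0)}},\quad k=0,1,\cdots,n$

with probabilities

$\mathbb{P}(T=t_k)=\binom{n}{k}p_0^k(1-p_0)^{n-k}$

Consecutive values are separated by $\Delta=1/\sqrt{np_0(1-p_0)}$, so dividing each probability by $\Delta$ puts the probability function on the same scale as the Gaussian density,

$\frac{\mathbb{P}(T=t_k)}{\Delta}=\sqrt{np_0(1-p_0)}\binom{n}{k}p_0^k(1-p_0)^{n-k}$

This is the rescaling used in the code that follows, with $p_0=0.5$.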

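# exact distribution of T under H0, on the density scale:
# each binomial probability is multiplied by sqrt(n*p0*(1-p0))
u=seq(-3,3,by=.01)
v=sqrt(.5*(1-.5))*n*dbinom(round(
  (sqrt(.5*(1-.5))*u/sqrt(n)+.5)*n),
  size=n,prob=.5)/sqrt(n)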

Here I used a rounded value. I guess it would be better with a floor function, but with rounding the graph looks symmetric (which is something I like)

abline(v=sqrt(n)*(qbinom(.025,size=n,prob=.5)/n-.5)/sqrt(.5*(1-.5)),col="red")
abline(v=sqrt(n)*(qbinom(.975,size=n,prob=.5)/n-.5)/sqrt(.5*(1-.5)),col="red")
lines(u,v,type="s")
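
Because $T$ is discrete, the actual level of the test is not exactly 5%. The sketch below computes, under $H_0$ with $n=20$ and $p_0=0.5$, the exact probability of falling outside the Gaussian bounds. It then compares this with the p-value of R's built-in binom.test for a hypothetical observed count x_obs = 5 (an illustrative value, not the sample used in this post). The names t_k, prob_k, and exact_size are assumptions made for the example.

```r
# Sketch: exact size of the normal-based acceptance region, assuming p0 = 0.5
n <- 20
p0 <- 0.5
alpha <- 0.05

k <- 0:n                                        # possible values of sum(X_i)
t_k <- (k - n * p0) / sqrt(n * p0 * (1 - p0))   # corresponding values of T
prob_k <- dbinom(k, size = n, prob = p0)        # exact probabilities under H0

# Exact probability that T falls outside the normal-based bounds
outside <- (t_k < qnorm(alpha / 2)) | (t_k > qnorm(1 - alpha / 2))
exact_size <- sum(prob_k[outside])
print(exact_size)

# Exact two-sided p-value for a hypothetical observed count (illustrative only)
x_obs <- 5
exact_test <- binom.test(x_obs, n, p = p0)
print(exact_test$p.value)
```

With a sample this small, only a handful of counts in each tail lie outside the bounds, so the exact size can only move in jumps as the bounds change.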

distribution based on Monte Carlo simulations

Probably more interesting: here we do not use the fact that we might know the distribution of the mean. We just generate random samples under $H_0$, and then compute $T(\boldsymbol{X})$ for each of them,

T=rep(NA,1000)
for(i in 1:1000){
  # simulate a sample of size n under H0
  x=sample(0:1,size=n,
           prob=c(1-.5,.5),
           replace=TRUE)
  m=mean(x)
  # null variance p0*(1-p0), as in the definition of T
  T[i]=(m-.5)/sqrt(.5*(1-.5))*sqrt(n)}
lines(density(T),lwd=2)
abline(v=quantile(T,.025),col="red")
abline(v=quantile(T,.975),col="red")
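
The simulation can also be written without a loop. Here is a minimal sketch under the same assumptions ($n=20$, $p_0=0.5$), with the null variance $p_0(1-p_0)$ in the denominator. The names n_sim, T_sim, and p_mc, as well as the hypothetical observed count x_obs = 5, are illustrative and not part of the original code. It also shows one way a Monte Carlo p-value could be obtained from the simulated statistics.

```r
# Sketch: Monte Carlo distribution of T under H0, assuming p0 = 0.5
set.seed(1)
n <- 20
p0 <- 0.5
n_sim <- 10000   # number of simulated samples (illustrative choice)

# One simulated Bernoulli sample per row
x <- matrix(rbinom(n_sim * n, size = 1, prob = p0), nrow = n_sim)
p_hat <- rowMeans(x)
T_sim <- sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))

# Simulated bounds of the acceptance region
bounds <- quantile(T_sim, c(0.025, 0.975))
print(bounds)

# Monte Carlo two-sided p-value for a hypothetical observed count
x_obs <- 5
T_obs <- sqrt(n) * (x_obs / n - p0) / sqrt(p0 * (1 - p0))
# small tolerance so that ties are not lost to floating-point rounding
p_mc <- mean(abs(T_sim) >= abs(T_obs) - 1e-9)
print(p_mc)
```

Increasing n_sim reduces the simulation noise in both the bounds and the p-value, at the cost of a larger matrix in memory.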
