Want to say one thing and the exact opposite with strong confidence?

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers].

No need to go into politics. Just take a statistics course. And I am not talking about the misinterpretation of statistics, but about the mathematical foundations of statistical tests.
Consider the following parametric test, with a one-dimensional parameter: http://freakonometrics.blog.free.fr/public/perso2/test-lies-01.gif versus http://freakonometrics.blog.free.fr/public/perso2/test-lies-02.gif, for some (fixed) http://freakonometrics.blog.free.fr/public/perso2/test-lies-03.gif. A standard way of doing such a test is to consider a rejection region http://freakonometrics.blog.free.fr/public/perso2/test-lies-05.gif. The test works as follows: consider a sample http://freakonometrics.blog.free.fr/public/perso2/test-lies-06.gif,
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-07.gif, then we accept http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-09.gif, then we reject http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
For instance, consider the case of a Bernoulli sample, with probability http://freakonometrics.blog.free.fr/public/perso2/test-lies-62.gif. The standard idea is to define
http://freakonometrics.blog.free.fr/public/perso2/test-lies-13.gif
The rejection region is then based on the statistic http://freakonometrics.blog.free.fr/public/perso2/test-lies-210.gif,
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-25.gif, then we accept http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-22.gif, then we reject http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
where the threshold http://freakonometrics.blog.free.fr/public/perso2/test-lies-26.gif is chosen so that the probability of a type I error is http://freakonometrics.blog.free.fr/public/perso2/test-lies-28.gif (say 5%), using the Gaussian approximation for z. Here
http://freakonometrics.blog.free.fr/public/perso2/test-lies-30.gif
Thus, the acceptance region is the green area below, while the rejection region is the red one, for http://freakonometrics.blog.free.fr/public/perso2/test-lies-210.gif.
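As a minimal sketch of such a Gaussian-approximation test: the formulas above are only available as images, so the direction of the hypotheses (H0: theta >= theta0 against H1: theta < theta0), the standardized statistic and the helper name `one_sided_z_test` are all assumptions made for illustration.

```r
# Hypothetical helper: one-sided z-test for a Bernoulli proportion.
# Assumes H0: theta >= theta0 versus H1: theta < theta0.
one_sided_z_test <- function(x, theta0, alpha = 0.05) {
  n <- length(x)
  # standardized sample frequency (Gaussian approximation under theta0)
  z <- sqrt(n) * (mean(x) - theta0) / sqrt(theta0 * (1 - theta0))
  threshold <- qnorm(alpha)              # lower-tail Gaussian quantile
  list(z = z, threshold = threshold, reject = z < threshold)
}

x <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)     # 3 successes out of 10
res <- one_sided_z_test(x, theta0 = 0.5)
res$z        # about -1.26
res$reject   # FALSE: -1.26 is above the threshold qnorm(0.05), about -1.64
```

Swapping the direction of the hypotheses only changes the threshold to `qnorm(1 - alpha)` and the comparison to `z > threshold`.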

Consider now the exact opposite test (with the same http://freakonometrics.blog.free.fr/public/perso2/test-lies-03.gif), http://freakonometrics.blog.free.fr/public/perso2/test-lies-51.gif versus http://freakonometrics.blog.free.fr/public/perso2/test-lies-52.gif. Here, we use the same statistic, and the test is
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-22.gif, then we accept http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
  • if http://freakonometrics.blog.free.fr/public/perso2/test-lies-25.gif, then we reject http://freakonometrics.blog.free.fr/public/perso2/test-H0.gif
where now
http://freakonometrics.blog.free.fr/public/perso2/test-lies-50.gif
Thus, the acceptance region is now the green area below, while the rejection region is the red one.

So if we summarize what we just said,
  • in the region on the left below, both tests agree that http://freakonometrics.blog.free.fr/public/perso2/test-lies-55.gif
  • in the region on the right below, both tests agree that http://freakonometrics.blog.free.fr/public/perso2/test-lies-57.gif
  • and in the region in blue, in the middle, the two tests disagree (one claims that http://freakonometrics.blog.free.fr/public/perso2/test-lies-55.gif, and the other one that http://freakonometrics.blog.free.fr/public/perso2/test-lies-57.gif)
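The three regions can be sketched as a small classifier. This is only an illustration under assumptions: the helper name `classify_frequency` is hypothetical, and both one-sided tests are taken to use the Gaussian approximation at the same level alpha around the same theta0.

```r
# Hypothetical helper: run the two opposite one-sided Gaussian tests
# around theta0 and report which of the three regions the sample falls in.
classify_frequency <- function(freq, n, theta0, alpha = 0.05) {
  half_width <- qnorm(1 - alpha) * sqrt(theta0 * (1 - theta0) / n)
  if (freq < theta0 - half_width) {
    "both tests agree: theta below theta0"
  } else if (freq > theta0 + half_width) {
    "both tests agree: theta above theta0"
  } else {
    "tests disagree"   # each test accepts its own null hypothesis
  }
}

classify_frequency(0.3, n = 10,  theta0 = 0.5)   # "tests disagree"
classify_frequency(0.3, n = 100, theta0 = 0.5)   # "both tests agree: theta below theta0"
classify_frequency(0.7, n = 100, theta0 = 0.5)   # "both tests agree: theta above theta0"
```

The same sample frequency moves from the disagreement band to an agreement region once n is large enough, because the band shrinks at rate 1/sqrt(n).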
Here is the evolution of the region as a function of http://freakonometrics.blog.free.fr/public/perso2/test-lies-56.gif (the size of the sample) when the sample frequency is 20%. With a small sample size, we can hardly say anything.
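# Gaussian-approximation bounds around p, as a function of the sample size n
n=seq(1,100)
p=0.2
# upper and lower bounds: p plus the 95% and 5% Gaussian quantiles times the standard error
x1=p+qnorm(.95)*sqrt(p*(1-p)/n)
x2=p+qnorm(.05)*sqrt(p*(1-p)/n)
plot(n,x1,type="l",ylim=c(0,1))
# shade the band between the two bounds, then redraw the bounds in red
polygon(c(n,rev(n)),c(x1,rev(x2)),col="light blue",border=NA)
lines(n,x1,lwd=2,col="red")
lines(n,x2,lwd=2,col="red")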

One might object that those bounds are based on a Gaussian approximation, which is not valid when http://freakonometrics.blog.free.fr/public/perso2/test-lies-56.gif is too small. So we can compute exact bounds,
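# exact bounds: 95% and 5% binomial quantiles, rescaled to frequencies
y1=qbinom(.95,size=n,prob=p)/n
y2=qbinom(.05,size=n,prob=p)/n
# overlay the exact band on the previous plot
polygon(c(n,rev(n)),c(y1,rev(y2)),col="blue",border=NA)
lines(n,y1,lwd=2,col="red")
lines(n,y2,lwd=2,col="red")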
and we get
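The two sets of bounds can also be compared numerically rather than on a plot. A minimal sketch, reusing p = 0.2 from the code above; the sample sizes and the `bounds` data frame are chosen here for illustration only.

```r
# Compare Gaussian and exact binomial bounds for a few sample sizes.
p <- 0.2
n <- c(5, 10, 50, 100)                       # illustrative sample sizes
bounds <- data.frame(
  n = n,
  gauss_lower = p + qnorm(.05) * sqrt(p * (1 - p) / n),
  gauss_upper = p + qnorm(.95) * sqrt(p * (1 - p) / n),
  exact_lower = qbinom(.05, size = n, prob = p) / n,
  exact_upper = qbinom(.95, size = n, prob = p) / n
)
print(round(bounds, 3))
```

Unlike the Gaussian bounds, the exact bounds are multiples of 1/n and always stay within [0, 1].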

This is what we can observe if we use R statistical procedures, either the asymptotic one,
> prop.test(2,10,.5,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  2 out of 10, null probability 0.5
X-squared = 2.5, df = 1, p-value = 0.05692
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.5100219
sample estimates:
p
0.2
 
> prop.test(2,10,.5,alternative="greater")
 
1-sample proportions test with continuity correction
 
data:  2 out of 10, null probability 0.5
X-squared = 2.5, df = 1, p-value = 0.943
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
0.04368507 1.00000000
sample estimates:
p
0.2
or the exact one,
> binom.test(2,10,.5,alternative="less")
 
Exact binomial test
 
data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is less than 0.5
95 percent confidence interval:
0.0000000 0.5069013
sample estimates:
probability of success
0.2
 
> binom.test(2,10,.5,alternative="greater")
 
Exact binomial test
 
data:  2 and 10
number of successes = 2, number of trials = 10, p-value = 0.9893
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.03677144 1.00000000
sample estimates:
probability of success
0.2
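Here, when the sample frequency is 20% and http://freakonometrics.blog.free.fr/public/perso2/test-lies-56.gif is equal to 10, neither one-sided test rejects its null hypothesis at the 5% level (all four p-values are above 0.05), so we accept at the same time that theta is higher than 50% and that it is lower than 50%.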
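The exact p-values reported by binom.test can be reproduced by hand from the binomial distribution function. A short sketch (the variable names are introduced here for illustration):

```r
# Exact one-sided p-values for 2 successes out of 10 under theta0 = 0.5.
p_less    <- pbinom(2, size = 10, prob = 0.5)        # P(X <= 2)
p_greater <- 1 - pbinom(1, size = 10, prob = 0.5)    # P(X >= 2)
round(c(less = p_less, greater = p_greater), 4)      # 0.0547 and 0.9893

# Neither p-value is below 0.05, so neither null hypothesis is rejected.
c(less = p_less, greater = p_greater) > 0.05
```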
And this is not only a theoretical problem: it has strong practical implications. This morning, a good friend mentioned a post published online some months ago about discrimination and the lack of women in academic positions in mathematics in France. As the author of the post claims, “At Paris VI, the best French university according to its president, out of 11 maître de conférences positions, 5 women were ranked first. So there are excellent women? In Toulouse, out of 4 positions, 2 women ranked first. Perfect parity. But alongside that, Bordeaux, 4 positions, 0 women ranked first. Littoral, 3 positions, 0 women; Nice, 5 positions, 0 women; Rennes, 7 positions, 0 women…”.
Consider the last one: in Rennes, out of 7 people hired last year, no woman. So in some sense, it looks obvious that there is some kind of discrimination! Zero out of seven! Well, if we take into account that around 30% of PhD theses in mathematics were defended by women in those years, we can also test whether there is no “positive discrimination”, i.e. test http://freakonometrics.blog.free.fr/public/perso2/test-lies-60.gif, where theta is the probability of hiring a woman (just to be a little bit provocative).
> prop.test(0,7,.3,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  0 out of 7, null probability 0.3
X-squared = 1.7415, df = 1, p-value = 0.09347
alternative hypothesis: true p is less than 0.3
95 percent confidence interval:
0.0000000 0.3719021
sample estimates:
p
0
 
Warning message:
In prop.test(0, 7, 0.3, alternative = "less") :
Chi-squared approximation may be incorrect
> binom.test(0,7,.3,alternative="less")
 
Exact binomial test
 
data:  0 and 7
number of successes = 0, number of trials = 7, p-value = 0.08235
alternative hypothesis: true probability of success is less than 0.3
95 percent confidence interval:
0.0000000 0.3481637
sample estimates:
probability of success
0
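The exact p-value above is simply the probability of observing zero successes in seven trials, 0.7^7. As a small sketch (the search over n is an illustration added here, not part of the original analysis), one can also ask how many hires without a single woman this one-sided test would need before rejecting at the 5% level:

```r
theta0 <- 0.3
p_zero <- (1 - theta0)^7          # P(X = 0) with n = 7, about 0.0824
n <- 1:20
p_values <- (1 - theta0)^n        # p-value when 0 successes are observed in n trials
min(n[p_values < 0.05])           # 9: smallest n at which the test rejects
```

With 8 hires and no woman the p-value is still about 0.058, so the exact test would only reject from 9 hires onwards.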
With no woman hired that year, we can still claim that there was some kind of “positive discrimination”, since the test does not reject that hypothesis at the 5% level. And note that we accept the assumption of “positive discrimination” with even more confidence if we look at all universities together,
> prop.test(5+2,11+4+4+3+5+7,.3,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  5 + 2 out of 11 + 4 + 4 + 3 + 5 + 7, null probability 0.3
X-squared = 1.021, df = 1, p-value = 0.1561
alternative hypothesis: true p is less than 0.3
95 percent confidence interval:
0.0000000 0.3556254
sample estimates:
p
0.2058824
 
> binom.test(5+2,11+4+4+3+5+7,.3,alternative="less")
 
Exact binomial test
 
data:  5 + 2 and 11 + 4 + 4 + 3 + 5 + 7
number of successes = 7, number of trials = 34, p-value = 0.1558
alternative hypothesis: true probability of success is less than 0.3
95 percent confidence interval:
0.0000000 0.3521612
sample estimates:
probability of success
0.2058824
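The same exact test can be run university by university, using the counts quoted from the post. A hedged sketch (the vector names are introduced here, and testing each university separately is an illustration rather than something done in the original analysis):

```r
# Counts quoted in the post: women ranked first and positions, per university.
hired     <- c(ParisVI = 5,  Toulouse = 2, Bordeaux = 0, Littoral = 0, Nice = 0, Rennes = 0)
positions <- c(ParisVI = 11, Toulouse = 4, Bordeaux = 4, Littoral = 3, Nice = 5, Rennes = 7)

# One-sided exact binomial test (alternative "less") for each university.
p_values <- mapply(
  function(x, n) binom.test(x, n, p = 0.3, alternative = "less")$p.value,
  hired, positions
)
round(p_values, 4)
any(p_values < 0.05)   # FALSE: no single university rejects at the 5% level
```

The smallest p-value is the one for Rennes (0.7^7, about 0.082), so even the most extreme case fails to reject.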
So clearly, with a small sample, almost anything can be claimed!
