A Significantly Improved Significance Test. Not!

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers.]


It is my great pleasure to share with you a breakthrough in statistical computing. There are many statistical tests: the t-test, the chi-squared test, the ANOVA, etc. Here I present a new test, one that answers the question researchers are most anxious to figure out: a test of significance, the significance test. While a test like the two-sample t-test tests the null hypothesis that the means of two populations are equal, the significance test does not tiptoe around the canoe. It jumps right in, paddle in hand, and directly tests whether a result is significant or not.

The significance test has been implemented in R as signif.test and is ready to be sourced and run. While other statistical procedures bombard you with useless information such as parameter estimates and confidence intervals, signif.test only reports what truly matters, the one value: the p-value.

I heart p values

For your convenience, signif.test can be called exactly like t.test and will return the same p-value, in order to facilitate p-value comparison with already published studies. Let me show you how signif.test works through a couple of examples using a dataset from the RANDOM.ORG database:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic"># Sourcing the signif.test function</span>
source(<span style="color: #BA2121">"http://www.sumsar.net/files/posts/2014-02-12-a-significantly-improved-test/significance_test.R"</span>)

<span style="color: #408080; font-style: italic"># A one sample signif.test</span>
signif.test(c(<span style="color: #666666">7.6</span>, <span style="color: #666666">5.9</span>, <span style="color: #666666">5.2</span>, <span style="color: #666666">4.2</span>, <span style="color: #666666">-1</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## significant (p = 0.0395)</span>
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic"># A two sample signif.test</span>
signif.test(c(<span style="color: #666666">-0.7</span>, <span style="color: #666666">-4.4</span>, <span style="color: #666666">-7.8</span>, <span style="color: #666666">3.8</span>), c(<span style="color: #666666">17.9</span>, <span style="color: #666666">22.9</span>, <span style="color: #666666">16.3</span>, <span style="color: #666666">19.1</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## extremely significant (p < 0.001)</span>
</pre></div>
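Since signif.test is described as returning the same p-value as t.test, the two results above can be cross-checked with base R alone. The snippet below is only a sketch of that check; the variable names one_sample and two_sample are made up for illustration and are not part of signif.test.

```r
# Cross-check with base R's t.test (stats package, attached by default).
# The variable names here are illustrative only.
one_sample <- t.test(c(7.6, 5.9, 5.2, 4.2, -1))
two_sample <- t.test(c(-0.7, -4.4, -7.8, 3.8), c(17.9, 22.9, 16.3, 19.1))

# t.test returns an object of class "htest"; the p-value sits in $p.value.
round(one_sample$p.value, 4)  # the post reports p = 0.0395 for this data
two_sample$p.value < 0.001    # the post reports p < 0.001 for this data
```

Unlike signif.test, printing one_sample or two_sample would also show the estimates and confidence intervals.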

Besides the p-value, signif.test also reports a verbal description of the effect size of the p-value, for example:

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic"># An unsuccessful experiment</span>
signif.test(c(<span style="color: #666666">12.4</span>, <span style="color: #666666">7.9</span>, <span style="color: #666666">9.7</span>), c(<span style="color: #666666">13.9</span>, <span style="color: #666666">7.7</span>, <span style="color: #666666">9.9</span>), paired <span style="color: #666666">=</span> <span style="color: #008000; font-weight: bold">TRUE</span>)
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## not significant (n.s.)</span>
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic"># A successful experiment</span>
signif.test(c(<span style="color: #666666">58.6</span>, <span style="color: #666666">62.7</span>, <span style="color: #666666">68.5</span>, <span style="color: #666666">58.8</span>, <span style="color: #666666">75.4</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## significant beyond doubt (p < 0.0001)</span>
</pre></div>
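How a p-value might be turned into such a verbal description can be sketched in a few lines. This is not the actual signif.test source: describe_p is a hypothetical helper, and its cut-offs are assumptions loosely read off the outputs shown above. The special wording for almost significant results is left out here.

```r
# Hypothetical helper, NOT the real signif.test code: map a single p-value
# to a verbal label. The cut-offs are assumptions based on the outputs above.
describe_p <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (p < 0.0001) {
    "significant beyond doubt (p < 0.0001)"
  } else if (p < 0.001) {
    "extremely significant (p < 0.001)"
  } else if (p < 0.05) {
    sprintf("significant (p = %.4f)", p)
  } else {
    "not significant (n.s.)"
  }
}

# Label the p-value of the first one-sample example.
describe_p(t.test(c(7.6, 5.9, 5.2, 4.2, -1))$p.value)
```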

An interesting situation, one that oh so many researchers have been battling with, is when a result is almost significant. Here signif.test uses the database compiled by Matthew Hankins to give, every time it is called, a new example of how such a result could be presented in writing.

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">signif.test(c(<span style="color: #666666">3.9</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">-1.2</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">2.1</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## practically significant (p = 0.0831)</span>
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">signif.test(c(<span style="color: #666666">3.9</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">-1.2</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">2.1</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## on the very fringes of significance (p = 0.0831)</span>
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%">signif.test(c(<span style="color: #666666">3.9</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">-1.2</span>, <span style="color: #666666">8.9</span>, <span style="color: #666666">2.1</span>))
</pre></div>

<div class="highlight" style="background: #f8f8f8"><pre style="line-height: 125%"><span style="color: #408080; font-style: italic">##   Test of Significance</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">## H0: p will be more than 0.05</span>
<span style="color: #408080; font-style: italic">## H1: p will be less than 0.05</span>
<span style="color: #408080; font-style: italic">## </span>
<span style="color: #408080; font-style: italic">##   Result</span>
<span style="color: #408080; font-style: italic">## fell barely short of significance (p = 0.0831)</span>
</pre></div>
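The changing wording can be imitated by sampling from a list of phrases. The sketch below is an assumption about the mechanism, not the real implementation: near_miss_phrases and describe_near_miss are hypothetical names, and the list holds only the three phrasings shown above rather than the full database.

```r
# Hypothetical sketch, NOT the real signif.test code: pick a random phrasing
# for an almost significant p-value. Only the three phrases shown above are
# included; the real function is said to draw on a much longer list.
near_miss_phrases <- c(
  "practically significant",
  "on the very fringes of significance",
  "fell barely short of significance"
)

describe_near_miss <- function(p, phrases = near_miss_phrases) {
  # sample() picks one phrase uniformly at random on every call.
  sprintf("%s (p = %.4f)", sample(phrases, 1), p)
}

describe_near_miss(0.0831)
```

Repeated calls with the same p-value will generally return different wordings, which mirrors the three runs above.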

Download signif.test from here or source it directly as above to see many more useful formulations. A current limitation of signif.test is that it compares at most two groups. If your data contains more groups, you can compare them two at a time; surely some combination is going to give a significant result!
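Comparing groups two at a time could look roughly like the sketch below. It uses base R's t.test in place of signif.test so that nothing needs to be sourced, and the built-in PlantGrowth data set (three groups) purely as a stand-in; both choices are assumptions made for illustration.

```r
# Sketch: run every two-at-a-time comparison on a data set with three groups.
# PlantGrowth ships with R (columns: weight, group); it is only a stand-in here.
groups <- split(PlantGrowth$weight, PlantGrowth$group)
pairs <- combn(names(groups), 2, simplify = FALSE)

# One t.test per pair of groups, keeping only the p-value.
p_values <- sapply(pairs, function(pair) {
  t.test(groups[[pair[1]]], groups[[pair[2]]])$p.value
})
names(p_values) <- sapply(pairs, paste, collapse = " vs ")
p_values
```

With k groups there are choose(k, 2) such comparisons, so the chance that at least one of them dips below 0.05 by luck alone grows quickly with k.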

Or perhaps I’m just kidding…

p-values Suck

Sorry for being blunt, but it is true. p-values do not answer the question that (dare I say most) people think or hope they do: “Is there a difference?” What is worse, the question most people think p-values answer (but they don’t) is not the right question to ask 95% of the time! In very few situations is the interesting question whether there is a difference; the interesting question is almost always: How large is the difference? What do p-values tell us about magnitudes? Zip!
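A small constructed example may make the point about magnitudes concrete. The data below are made up for illustration (they are not from the post), and mean_diff is a hypothetical helper: one comparison has a tiny difference and a huge sample, the other a large difference and a handful of observations.

```r
# Made-up data for illustration only.
# Tiny difference, large sample: 10,000 observations per group, shifted by 0.05.
a <- rep(c(-1, 1), 5000)
tiny_effect <- t.test(a + 0.05, a)

# Large difference, small sample: 4 observations per group, shifted by 2.5.
x <- c(-1, 1, -1, 1)
big_effect <- t.test(x + 2.5, x)

# Hypothetical helper: estimated difference in means from a two-sample t.test.
mean_diff <- function(tt) unname(tt$estimate[1] - tt$estimate[2])

# Put the estimated differences next to their p-values.
data.frame(
  difference = c(mean_diff(tiny_effect), mean_diff(big_effect)),
  p_value = c(tiny_effect$p.value, big_effect$p.value),
  row.names = c("tiny effect, n = 10000", "big effect, n = 4")
)
```

Here the smaller p-value belongs to the difference that is fifty times smaller, so the p-value alone says nothing about how large the difference is.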

p-values are visual noise taking up precious journal space that could be filled with useful stuff such as actual estimates, effect sizes, scatter plots, confidence or credible intervals, R code, AICs and DICs, and box plots. Yes, sometimes even white space would be an improvement. I’m not going to rant more about p-values here; they have already been accurately characterized by others:

  • Dance of the p-values, a fun and quick video about some problems with p-values.

If you just replace “disproved” with “debunked” this xkcd comic would be a pretty accurate description of how I feel about p-values:

xkcd 892
