# Absence of evidence is not evidence of absence: Testing for equivalence

[This article was first published on **The 20% Statistician**, and kindly contributed to R-bloggers.]


*See the follow-up post where I introduce my R package and spreadsheet TOSTER to perform TOST equivalence tests, and link to a practical primer on this topic.*

When you find *p* > 0.05, you did not observe surprising data, assuming there is no true effect. In the literature, *p* > 0.05 is often interpreted as ‘no effect’, but due to a lack of power the data might not be surprising even if there *was* an effect. In this blog I’ll explain how to test for equivalence, or the lack of a meaningful effect, using equivalence hypothesis testing. I’ve created easy-to-use R functions that allow you to perform equivalence hypothesis tests. **Warning**: If you read beyond this paragraph, you will never again be able to write “as predicted, the interaction revealed there was an effect for participants in the experimental condition (*p* < 0.05) but there was no effect in the control condition (*F* < 1).” If you prefer the veil of ignorance, here’s a nice site with cute baby animals to spend the next 9 minutes on instead.

It is possible to provide support for the lack of an effect using *p*-values (why no one ever told me this in the 11 years I worked in science is beyond me). It’s as easy as doing a *t*-test, or more precisely, as doing two *t*-tests. The procedure is known as the ‘two one-sided *t*-tests’ (TOST) procedure (Schuirmann, 1987). Its simplicity makes it wonderfully easy to use.

In the TOST procedure, you first decide upon your smallest effect size of interest (SESOI): anything smaller than your SESOI is considered smaller than small. For example, a Cohen’s d of 0.5 is a medium effect, so you might set d = 0.5 as your SESOI, and the equivalence range then goes from d = -0.5 to d = 0.5. The null hypothesis is the presence of an effect as large as, or larger than, the SESOI, which you aim to reject (*p* < 0.05). The alternative hypothesis is that the effect is smaller than the SESOI, anywhere in the *equivalence range* – any effect you think is too small to matter, or too small to feasibly examine. You then perform two *t*-tests: one testing if the effect is smaller than the upper bound of the equivalence range, and one testing whether the effect is larger than the lower bound of the equivalence range. Rejecting both allows you to reject the null hypothesis that there is a meaningful effect. Yes, it’s that simple.
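As a rough sketch of the idea (this is my own illustration, not the functions from this post; `tost_d` is a hypothetical helper, and it uses the simple approximation SE(d) = sqrt(1/n1 + 1/n2)), the two one-sided tests can be written in a few lines of R:

```r
# Sketch of a TOST equivalence test from a standardized effect size.
# NOTE: tost_d is a hypothetical name for illustration only.
tost_d <- function(d, n1, n2, low_eqbound_d, high_eqbound_d, alpha = 0.05) {
  se <- sqrt(1 / n1 + 1 / n2)            # approximate standard error of d
  df <- n1 + n2 - 2
  t_lower <- (d - low_eqbound_d) / se    # test against the lower bound
  t_upper <- (d - high_eqbound_d) / se   # test against the upper bound
  p_lower <- pt(t_lower, df, lower.tail = FALSE)  # H0: d <= lower bound
  p_upper <- pt(t_upper, df, lower.tail = TRUE)   # H0: d >= upper bound
  list(p_lower = p_lower, p_upper = p_upper,
       equivalent = max(p_lower, p_upper) < alpha)
}

# d = 0.13 with 90 participants per condition, equivalence range -0.5 to 0.5
res <- tost_d(0.13, 90, 90, -0.5, 0.5)
res
```

Both one-sided tests have to be significant; only the larger of the two *p*-values matters for the conclusion.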

Let’s try it. Assume the observed effect size is d = 0.13 (from an independent *t*-test with two conditions of 90 participants each). This effect is not statistically different from 0 at an alpha of 5%, and the *p*-value of the normal NHST is 0.384 (the title of the figure provides the exact numbers for the 95% CI around the effect size). But is this effect statistically smaller than my smallest effect size of interest?

**Rejecting the presence of a meaningful effect**

There are two ways to look at the result. The first approach is to perform two *t*-tests, testing the observed effect size against the ‘null’ of your SESOI (0.5 and -0.5). These *t*-tests show that d = 0.13 is significantly larger than d = -0.5, and significantly smaller than d = 0.5. The higher of these two *p*-values is *p* = 0.007. We can conclude that there is support for the lack of a meaningful effect (where meaningful is defined by your choice of the SESOI). The second approach (which is easier to visualize) is to calculate a 90% CI around the effect (indicated by the vertical line in the figure), and check whether this 90% CI falls completely within the equivalence range. You can see a line from the upper and lower limit of the 90% CI around d = 0.13 on the left to the equivalence range on the right, and the 90% CI is completely contained within the equivalence range. This means we can reject the null of an effect that is larger than d = 0.5 or smaller than d = -0.5, and conclude this effect is smaller than what we find meaningful (and you’ll be right 95% of the time, in the long run).
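The confidence-interval approach can be sketched directly (again an illustration using the approximate standard error of d, not the code behind the figure):

```r
# Check whether the 90% CI around d falls inside the equivalence range.
d  <- 0.13
n1 <- 90
n2 <- 90
se <- sqrt(1 / n1 + 1 / n2)               # approximate standard error of d
df <- n1 + n2 - 2
ci <- d + c(-1, 1) * qt(0.95, df) * se    # 90% CI: 5% in each tail
ci

# Equivalence is claimed when the whole 90% CI lies within [-0.5, 0.5]
inside <- ci[1] > -0.5 & ci[2] < 0.5
inside
```

Note that it is a 90% CI, not a 95% CI: two one-sided tests at alpha = 0.05 put 5% in each tail, which corresponds to a 90% confidence interval.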

Remember the sentence from the first paragraph: “as predicted, the interaction revealed there was an effect for participants in the experimental condition (*p* < 0.05) but there was no effect in the control condition (*F* < 1).” Well, I just ruined that for you. Absence of evidence is not evidence of absence!

You can no longer simply interpret *p* > 0.05 as evidence of a lack of an effect. You can switch to Bayesian statistics if you want to support the null, but the default priors are only useful in research areas where very large effects are examined (e.g., some areas of cognitive psychology), and are not appropriate for most other areas in psychology, so you will have to quantify your prior beliefs yourself. You can teach yourself how, but there might be researchers who prefer to provide support for the lack of an effect within a Frequentist framework. Given that most people think about the effect size they expect when designing their study, defining the SESOI at this moment is straightforward. After choosing the SESOI, you can even design your study to have sufficient power to reject the presence of a meaningful effect. Controlling your error rates is thus straightforward in equivalence hypothesis tests, while it is not that easy in Bayesian statistics (although it can be done through simulations).

Existing software that can perform TOST (for *t*-tests, but not correlations) requires that you use the raw data. In my experience (Lakens, 2013), researchers find it easier to use their own preferred software to handle their data, and then calculate additional statistics not provided by that software by typing summary statistics (means, standard deviations, and sample sizes per condition) into a spreadsheet. So my functions don’t require access to the raw data (which is good for reviewers as well). Finally, the functions make a nice picture such as the one above, so you can see what you are doing.
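To illustrate the summary-statistics workflow (the means and standard deviations below are hypothetical numbers, chosen so the pooled effect comes out small), Cohen’s d can be computed from means, standard deviations, and sample sizes alone:

```r
# Cohen's d from summary statistics alone (no raw data needed).
m1 <- 5.25; sd1 <- 1.2; n1 <- 90   # hypothetical condition 1
m2 <- 5.10; sd2 <- 1.1; n2 <- 90   # hypothetical condition 2

# Pooled standard deviation across the two conditions
sd_pooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
d <- (m1 - m2) / sd_pooled
round(d, 2)
```

With these inputs the standardized effect is d = 0.13, which is all the equivalence test needs.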

**R Functions**

I created R functions for independent *t*-tests, paired samples *t*-tests, and correlations, where you can set the equivalence thresholds using Cohen’s d, Cohen’s dz, and r. I adapted the equation for power analysis to be based on d, and I created the equation for power analyses for a paired-sample *t*-test from scratch, because I couldn’t find it in the literature.

**If it is not obvious: none of this is peer-reviewed (yet), and you should use it at your own risk.** I checked the independent and paired *t*-test formulas against the results from Minitab software and reproduced examples in the literature, and checked the power analyses against simulations, and all yielded the expected results, so that’s comforting. On the other hand, I had never heard of equivalence testing until 9 days ago (thanks ‘Bum Deggy’), so that’s less comforting, I guess. Send me an email if you want to use these formulas for anything serious like a publication. If you find a mistake or misbehaving functions, let me know.
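Checking power formulas against simulation is straightforward: simulate data under a true effect of zero, run TOST on each sample, and count how often equivalence is (correctly) declared. A minimal sketch of such a check (my own illustration, using `t.test` against shifted nulls on simulated raw data):

```r
# Monte Carlo check: how often does TOST declare equivalence when the
# true effect is 0, with n = 90 per group and bounds of +/- 0.5 SD?
set.seed(1)
n <- 90; bound <- 0.5; alpha <- 0.05

one_run <- function() {
  x <- rnorm(n)                     # true difference = 0, SD = 1
  y <- rnorm(n)
  # One-sided tests against each equivalence bound (in raw units, SD = 1)
  p_up  <- t.test(x, y, mu =  bound, alternative = "less")$p.value
  p_low <- t.test(x, y, mu = -bound, alternative = "greater")$p.value
  max(p_up, p_low) < alpha          # both one-sided tests significant?
}

power_est <- mean(replicate(2000, one_run()))
power_est
```

With a true effect of zero, this design rejects the bounds in roughly 90% of runs, which is the power of the equivalence test for this sample size and SESOI.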

**Choosing your SESOI**

*In applied research, practical limitations of the SESOI can often be determined on the basis of a cost–benefit analysis. For example, if an intervention costs more money than it saves, the effect size is too small to be of practical significance. In theoretical research, the SESOI might be determined by a theoretical model that is detailed enough to make falsifiable predictions about the hypothesized size of effects. Such theoretical models are rare, and therefore, researchers often state that they are interested in any effect size that is reliably different from zero. Even so, because you can only reliably examine an effect that your study is adequately powered to observe, researchers are always limited by the practical limitation of the number of participants that are willing to participate in their experiment or the number of observations they have the resources to collect.*

Assume you design a study where, in an independent *t*-test with an alpha of 0.05, you have 80% power to detect an effect with a Cohen’s d of 0.57. To have 80% power to reject an effect of d = 0.57 or larger in TOST, you would need 66 participants in each condition.

**Conclusion**

Equivalence tests allow you to provide support for the lack of a meaningful effect within a Frequentist framework, using the tools you already know: two one-sided *t*-tests against your smallest effect size of interest. Absence of evidence is not evidence of absence, but with TOST you can collect evidence for the absence of a meaningful effect.

**References**

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for *t*-tests and ANOVAs. *Frontiers in Psychology*, *4*, 863. http://doi.org/10.3389/fpsyg.2013.00863

Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. *European Journal of Social Psychology*, *44*(7), 701–710. http://doi.org/10.1002/ejsp.2023

Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. *Psychological Bulletin*, *113*(3), 553–565.

Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. *Journal of Pharmacokinetics and Biopharmaceutics*, *15*(6), 657–680.

**The R-code that was part of this blog post is temporarily unavailable as I move it to a formal R package.**
