# The perfect t-test

**Daniel Lakens** (kindly contributed to R-bloggers)

*I’ve created an easy-to-use R script that imports your data, then performs and writes up a state-of-the-art dependent or independent t-test. The goal of this script is to examine whether more researcher-centered statistical tools (i.e., a one-click analysis script that checks normality assumptions, calculates effect sizes and their confidence intervals, creates good figures, calculates Bayesian and robust statistics, and writes the results section) increase the use of novel statistical procedures. Download the script here: https://github.com/Lakens/Perfect-t-test.*

*For comments, suggestions, or errors, e-mail me at [email protected]. The script will likely be updated; check back for updates or follow me @Lakens to be notified of updates.*

Correctly comparing two groups is remarkably challenging. When performing a *t*-test, researchers rarely manage to follow all recommendations that statisticians have made over the years. Where statisticians update their recommendations, statistical textbooks often do not. Even though reporting effect sizes and their confidence intervals has been recommended for decades (e.g., Cohen, 1990), statistical software (e.g., SPSS 22) often does not provide these statistics. Progress is slow, and Sharpe (2013) points to a lack of awareness, a lack of time, a lack of easily usable software, and a lack of education as some of the main reasons for the resistance to adopting statistical innovations.

# Comparing two groups

Keselman, Othman, Wilcox, and Fradette (2004) proposed a more robust two-sample *t*-test that provides better Type 1 error control in situations of variance heterogeneity and nonnormality, but their recommendations have not been widely implemented. Researchers might in general be unsure whether it is necessary to change the statistical tests they use to analyze and report comparisons between groups. As Wilcox, Granger, and Clark (2013, p. 29) remark: “All indications are that generally, the safest way of knowing whether a more modern method makes a practical difference is to actually try it.” Making sure conclusions based on multiple statistical approaches converge is an excellent way to gain confidence in your statistical inferences. This R script calculates traditional Frequentist statistics, Bayesian statistics, and robust statistics, using both a hypothesis testing and an estimation approach, to invite researchers to examine their data from different perspectives.
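To make "actually try it" concrete, here is a minimal base R sketch of one well-known robust alternative, Yuen's test on 20% trimmed means with Winsorized variances. It is an illustration under assumptions, not the procedure the script itself runs (the script relies on the *WRS* package): the helper names `winsorize` and `yuen_test` and the simulated data are made up for this example.

```r
# Yuen's test for two independent groups, using trimmed means and
# Winsorized variances (illustrative helpers, not taken from the script).
winsorize <- function(x, trim = 0.2) {
  n  <- length(x)
  g  <- floor(trim * n)                 # observations pulled in per tail
  xs <- sort(x)
  pmin(pmax(x, xs[g + 1]), xs[n - g])   # clamp both tails
}

yuen_test <- function(x, y, trim = 0.2) {
  # Effective sample sizes after trimming both tails
  h1 <- length(x) - 2 * floor(trim * length(x))
  h2 <- length(y) - 2 * floor(trim * length(y))
  # Squared standard errors based on the Winsorized variances
  d1 <- (length(x) - 1) * var(winsorize(x, trim)) / (h1 * (h1 - 1))
  d2 <- (length(y) - 1) * var(winsorize(y, trim)) / (h2 * (h2 - 1))
  t_stat <- (mean(x, trim = trim) - mean(y, trim = trim)) / sqrt(d1 + d2)
  df <- (d1 + d2)^2 / (d1^2 / (h1 - 1) + d2^2 / (h2 - 1))
  list(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))
}

set.seed(1)
x <- rnorm(30, mean = 0)
y <- c(rnorm(28, mean = 0.5), 8, 9)   # second group contains two outliers

classic <- t.test(x, y)               # Welch's t-test (R's default)
robust  <- yuen_test(x, y)
print(c(welch_t = unname(classic$statistic), yuen_t = robust$t))
```

With `trim = 0` the helper reduces to Welch's *t*-test, which is a convenient sanity check. Comparing the two statistics on your own data shows whether the outliers change the conclusion.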

Since Frequentist and Bayesian statistics are based on assumptions of equal variances and normally distributed data, the R script provides boxplots and histograms with kernel density plots overlaid with a normal distribution curve to check for outliers and normality. Kernel density plots are a non-parametric technique to visualize the distribution of a continuous variable. They are similar to a histogram, but less dependent on the specific choice of bins used when creating a histogram. The graphs plot both the normal distribution and the kernel density function, making it easier to visually check whether the data are normally distributed or not. Q-Q plots are provided as an additional check for normality.
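The visual checks described above can be approximated with base R graphics. This is only a sketch: the script itself uses *ggplot2* and *HLMdiag*, and the variable `scores` holds simulated data invented for the example.

```r
set.seed(42)
scores <- rnorm(100, mean = 50, sd = 10)   # simulated data for illustration

kde <- density(scores)                     # Gaussian kernel density estimate

# Draw to a null device so the sketch also runs non-interactively;
# remove pdf(NULL) and dev.off() to see the plots on screen.
pdf(NULL)
hist(scores, freq = FALSE, main = "Histogram with density overlays")
lines(kde, lwd = 2)                                              # kernel density
curve(dnorm(x, mean(scores), sd(scores)), add = TRUE, lty = 2)   # normal curve
qqnorm(scores)                                                   # Q-Q plot
qqline(scores)
dev.off()
```

If the solid kernel density line and the dashed normal curve diverge clearly, or the Q-Q points bend away from the reference line, the normality assumption deserves a closer look.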

Yap and Sim (2011) show that no single test for normality will perform optimally for all possible distributions. They conclude (p. 2153): “If the distribution is symmetric with low kurtosis values (i.e. symmetric short-tailed distribution), then the D’Agostino-Pearson and Shapiro-Wilkes tests have good power. For symmetric distribution with high sample kurtosis (symmetric long-tailed), the researcher can use the JB, Shapiro-Wilkes, or Anderson-Darling test.” All four normality tests are provided in the R script. Levene’s test for the equality of variances is provided, although for independent t-tests, Welch’s *t*-test (which does not require equal variances) is provided by default, following recommendations by Ruxton (2006). A short explanation accompanies all plots and assumption checks to help researchers to interpret the results.
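As a rough illustration of these checks, the base R sketch below runs a Shapiro-Wilk test per group and compares Welch's *t*-test with Student's *t*-test. The data are simulated, and the column names `group` and `score` are assumptions made for the example. The other normality tests and Levene's test need the add-on packages the script uses and are not shown.

```r
set.seed(7)
dat <- data.frame(
  group = rep(c("control", "treatment"), times = c(25, 40)),
  score = c(rnorm(25, mean = 100, sd = 10), rnorm(40, mean = 106, sd = 20))
)

# Shapiro-Wilk normality test per group (one of the tests named above)
normality <- tapply(dat$score, dat$group, function(v) shapiro.test(v)$p.value)

# Welch's t-test is R's default (var.equal = FALSE)
welch   <- t.test(score ~ group, data = dat)
# Student's t-test assumes equal variances
student <- t.test(score ~ group, data = dat, var.equal = TRUE)

print(normality)
print(c(welch_df = unname(welch$parameter),
        student_df = unname(student$parameter)))
```

Welch's test adjusts the degrees of freedom downward when variances and group sizes differ, which is why it is a reasonable default when equal variances cannot be taken for granted.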

# Running the Markdown script

R Markdown scripts provide a way to create fully reproducible reports from data files. The script combines the commands to perform all statistical analyses with the written sections of the final output. Calculated statistics and graphs are inserted into the written report at specified locations. After installing the required packages, preparing the data, and specifying some variables in the Markdown document, the report can be generated (and thus, the analysis procedure can be performed) with a single mouse-click (scroll down for an example of the output).

The script uses the *PoweR* package (Micheaux & Tran, 2014) to perform the normality tests, *HLMdiag* to create the Q-Q plots (Loy & Hofmann, 2014), *ggplot2* for all plots (Wickham, 2009), *car* (Fox & Weisberg, 2011) to perform Levene’s test, *MBESS* (Kelley, 2007) to calculate effect sizes and their confidence intervals, *WRS* for the robust statistics (Wilcox & Schönbrodt, 2015), *bootES* to calculate a robust effect size for the independent *t*-test (Kirby & Gerlanc, 2013), *BayesFactor* for the Bayes factor (Morey & Rouder, 2015), and *BEST* (Kruschke & Meredith, 2014) to calculate the Bayesian highest density interval.

For the independent *t*-test the data file needs to contain at least two columns (one specifying the independent variable, and one specifying the dependent variable), and for the dependent *t*-test the data file needs to contain three columns: one subject identifier column, and two columns for the two dependent variables. The script for dependent *t*-tests allows you to select a subgroup for the analysis, as long as the data file contains an additional grouping variable (see the demo data). The data files can contain irrelevant data, which will be ignored by the script. Next, researchers need to specify the names (or headers) of the independent and dependent variables, as well as grouping variables. Finally, there are some default settings researchers can change, such as the sidedness of the test, the alpha level, the percentage for the confidence intervals, and the scalar on the prior for the Bayes Factor.
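The two data layouts can be sketched as small data frames. All column names and values below are invented for illustration, and the settings variables (`alpha`, `conf_level`, `alternative`) are hypothetical stand-ins for the options the script exposes, not its actual variable names.

```r
# Independent t-test: one row per participant (long format).
independent_data <- data.frame(
  condition = rep(c("A", "B"), each = 5),       # independent variable
  rating    = c(4, 5, 6, 5, 7, 6, 8, 7, 9, 8)   # dependent variable
)

# Dependent t-test: one row per subject, one column per measurement.
dependent_data <- data.frame(
  subject = 1:5,
  pre     = c(10, 12, 9, 14, 11),
  post    = c(12, 15, 9, 17, 12)
)

# Settings a user might adjust (names are illustrative).
alpha       <- 0.05
conf_level  <- 0.95
alternative <- "two.sided"    # or "less" / "greater"

ind <- t.test(rating ~ condition, data = independent_data,
              alternative = alternative, conf.level = conf_level)
dep <- t.test(dependent_data$post, dependent_data$pre, paired = TRUE,
              alternative = alternative, conf.level = conf_level)

print(ind$p.value < alpha)   # is the independent comparison significant?
print(dep$conf.int)          # confidence interval for the mean difference
```

Extra columns in either data frame would simply be left unused by these calls, which mirrors how the script ignores irrelevant data.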

The statistical results the script generates have been compared against the results provided by SPSS, JASP, ESCI, online Bayes Factor calculators, and BEST online. Minor variations in the HDI calculation between BEST online and this script are possible depending on the burn-in samples and number of samples, and for huge *t*-values there are minor variations between JASP and the latest version of the BayesFactor package used in this script. This program is distributed in the hope that it will be useful, but without any warranty. If you find an error, please contact me at [email protected].

# Promoting Statistical Innovations

Statistical software is built around *individual* statistical tests, while researchers perform a *set of procedures*. Although it is not possible to create standardized procedures for all statistical analyses, most, if not all, of the steps researchers have to go through when they want to report correlations, regression analyses, ANOVAs, and meta-analyses are sufficiently structured. These tests make up a large portion of analyses reported in journal articles. Demonstrating this, David Kenny has created R scripts that will perform and report mediation and moderator analyses. Felix Schönbrodt has created a Shiny app that performs several meta-analytic techniques. Making statistical innovations more accessible has a high potential to substantially improve the quality of the statistical tests researchers perform and report. Statisticians who take the application of generated knowledge seriously should try to experiment with the best way to get researchers to use state-of-the-art techniques. R Markdown scripts are an excellent method to combine statistical analyses and a written report in free software. Shiny apps might make these analyses even more accessible, because they no longer require users to install R and R packages.

The script is hosted on GitHub, where it can be *forked* (copied to a new repository), so researchers are free to remove, add, or change sections of the script to create their own ideal test. After some time, a number of such scripts may be created, allowing researchers to choose an analysis procedure that most closely matches their desires. Alternatively, researchers can post feature requests or report errors, which can be incorporated in future versions of this script.

# References

*Behavior Research Methods*, *44*, 158-175.

*Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis*. New York: Routledge.

Fox & Weisberg (2011). *An R Companion to Applied Regression, Second edition*. Sage, Thousand Oaks CA.

*Theory of probability (3rd ed.)*. Oxford: Oxford University Press, Clarendon Press.

Kelley (2007). *Journal of Statistical Software*, *20*, 1-24.

Kirby & Gerlanc (2013). *Behavior Research Methods*, *45*, 905-927.

Kruschke & Meredith (2014). *BEST: Bayesian Estimation Supersedes the t-test*. R package version 0.2.2, URL: http://CRAN.R-project.org/package=BEST.

*Psychological Bulletin*, *111*, 361-365.

Morey & Rouder (2015). *BayesFactor: Computation of Bayes Factors for Common Designs*. R package version 0.9.11-1, URL: http://CRAN.R-project.org/package=BayesFactor.

Sharpe (2013). *Psychological Methods*, *18*, 572-582.

Wickham (2009). *ggplot2: elegant graphics for data analysis*. Springer New York. ISBN 978-0-387-98140-6, URL: http://had.co.nz/ggplot2/book.

Wilcox, Granger, & Clark (2013). *Universal Journal of Psychology*, *1*, 21-31.

Wilcox & Schönbrodt (2015). *The WRS package for robust statistics in R (version 0.27.5)*. URL: https://github.com/nicebread/WRS.

Yap & Sim (2011). *Journal of Statistical Computation and Simulation*, *81*, 2141-2155.
