# A Function for Calculating Tidy Summaries of Multiple t-tests

April 5, 2020
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The t-test is one of the most often used in psychology and other social sciences. In APA format, researchers are told to report the means and standard deviations of both conditions; the t-statistic, its degrees of freedom, and its p-value; and an effect size with confidence interval (generally Cohen’s d and 95%).

Researchers frequently conduct randomized experiments with not just one dependent variable, but many. And they may want to make sure that other variables, such as age, do not differ by condition.

The following function will return all the necessary information from t-tests. I don’t have it in a package yet (and don’t know where I’d put it—yet. Drop me a line if you think it fits in an existing package; I would be happy to include it in an existing package). So you’ll have to copy, paste, and run the following into your script to use it.

``````t_table <- function(data, dvs, iv, var_equal = TRUE, p_adj = "none") {

if (!inherits(data, "data.frame")) {
stop("data must be a data.frame")
}

if (!all(c(dvs, iv) %in% names(data))) {
stop("at least one column given in dvs and iv are not in the data")
}

if (!all(sapply(data[, dvs], is.numeric))) {
stop("all dvs must be numeric")
}

if (length(unique(na.omit(data[[iv]]))) != 2) {
stop("independent variable must only have two unique values")
}

out <- lapply(dvs, function(x) {

tres <- t.test(data[[x]] ~ data[[iv]], var.equal = var_equal)

mns <- tapply(data[[x]], data[[iv]], mean, na.rm = TRUE)
names(mns) <- paste0(names(mns), "_m")

sds <- tapply(data[[x]], data[[iv]], sd, na.rm = TRUE)
names(sds) <- paste0(names(sds), "_sd")

es <- MBESS::ci.smd(ncp = tres\$statistic,
n.1 = table(data[[iv]])[[1]],
n.2 = table(data[[iv]])[[2]])

c(
c(mns[1], sds[1], mns[2], sds[2]),
tres\$statistic,
tres\$parameter,
p = tres\$p.value,
d = unname(es\$smd),
d_lb = es\$Lower,
d_ub = es\$Upper
)
})

out <- as.data.frame(do.call(rbind, out))
out <- cbind(variable = dvs, out)
names(out) <- gsub("[^0-9A-Za-z_]", "", names(out))

return(out)
}``````

The first argument specifies a `data.frame` where the data reside, a string vector of the names of the dependent variables, a string indicating the independent variable, a logical value on whether or not to assume variances are equal across conditions (defaults to `TRUE` for a classic t-test), and a string indicating what p-value adjustments to do. See `?p.adjust.methods` for more information on which methods are available to use. This defaults to no adjustment. (The function with full documentation in `{roxygen2}` format can be found at my GitHub.) Note that this function depends on the `{MBESS}` package, so make sure to have that installed first (but you don’t need to call `library` on it).

What does it look like in action? Let’s imagine a `ctl` and `exp` condition, with the dependent variables of `y1`, `y2`, etc., through `y10`, and a sample size of 128. I simulate that data below, where `y1` and `y2` have significant effects with a Cohen’s d = 0.5 and 0.8, respectively.

``````set.seed(1839)
cond <- rep(c("ctl", "exp"), each = 64)
y1 <- rnorm(128, ifelse(cond == "ctl", 0, .5))
y2 <- rnorm(128, ifelse(cond == "ctl", 0, .8))
dat <- as.data.frame(lapply(1:8, function(zzz) rnorm(128)))
dat <- cbind(cond, y1, y2, dat)
names(dat)[4:11] <- paste0("y", 3:10)
dat[1:5, 1:5]``````
``````##   cond         y1         y2          y3         y4
## 1  ctl  1.0127014 -1.6556888  2.61696223  0.3817117
## 2  ctl -0.6845605  0.8893057  0.05602809 -1.6996460
## 3  ctl  0.3492607 -0.4439924  0.33997464  0.8473431
## 4  ctl -1.6245010  1.2612491 -0.99240679 -0.2083059
## 5  ctl -0.5162476  0.2012927 -0.96291759 -0.2948407``````

We can then feed the necessary information to the `t_table` function. Note that, instead of typing out all the `y1` through `y10` columns, I use the `paste0` function to generate them in less code. I don’t do any rounding for you inside of the function, but for the sake of presentation, I round to 2 decimal points here.

``````result <- t_table(dat, paste0("y", 1:10), "cond")
result[-1] <- lapply(result[-1], round, 2)
result``````
``````##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.04 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 0.45  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.15  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.06  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 0.94 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 0.77  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 0.58  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.00  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 0.56  0.10 -0.24  0.45``````

Note that we got a few false positives here. Which leads to using p-value adjustments, if you wish. Let’s now say I use the Holm adjustment.

``````result2 <- t_table(dat, paste0("y", 1:10), "cond", p_adj = "holm")
result2[-1] <- lapply(result2[-1], round, 2)
result2``````
``````##    variable ctl_m ctl_sd exp_m exp_sd     t  df    p     d  d_lb  d_ub
## 1        y1 -0.07   0.96  0.26   0.90 -2.04 126 0.35 -0.36 -0.71 -0.01
## 2        y2  0.05   1.02  0.82   0.91 -4.48 126 0.00 -0.79 -1.15 -0.43
## 3        y3  0.07   1.11 -0.07   0.99  0.76 126 1.00  0.13 -0.21  0.48
## 4        y4  0.00   1.04 -0.27   1.05  1.44 126 0.91  0.26 -0.09  0.60
## 5        y5  0.06   0.91 -0.28   1.08  1.93 126 0.39  0.34 -0.01  0.69
## 6        y6  0.05   1.03  0.06   1.09 -0.08 126 1.00 -0.01 -0.36  0.33
## 7        y7 -0.01   0.96 -0.06   1.08  0.30 126 1.00  0.05 -0.29  0.40
## 8        y8 -0.08   0.99 -0.18   0.98  0.56 126 1.00  0.10 -0.25  0.45
## 9        y9  0.37   0.93 -0.18   1.05  3.13 126 0.02  0.55  0.20  0.91
## 10      y10 -0.11   0.85 -0.21   0.94  0.59 126 1.00  0.10 -0.24  0.45``````

But note that the width of the confidence intervals are not adjusted here.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.