Are you lazy? No worries, tadaatoolbox got your back.

[This article was first published on rstats – Tadaa, Data!, and kindly contributed to R-bloggers.]

A while ago, we started developing the tadaatoolbox R package.
The goal is simple: There are certain things we tend to always do one after another, like performing effect size calculations after a t-Test.
The convenience tadaatoolbox aims to provide is exactly this: Do the usual stuff and leave me alone.

As an example, take one of the first functions I wrote for the package, tadaa_t.test:

tadaa_t.test(data = ngo, response = stunzahl, group = geschl)
| Männlich | Weiblich | t | p | df | conf_low | conf_high | method | alternative | d | power |
|---|---|---|---|---|---|---|---|---|---|---|
| 33.616 | 33.664 | -0.108 | 0.91 | 248 | -0.92 | 0.824 | Two Sample t-test | two.sided | 0.014 | 0.051 |

What happened here?
Let’s take a look step by step:

  1. We took the values you provided: A dataset (the infamous ngo data), the response or dependent variable and the group or independent variable
  2. We performed a regular ol’ t-Test via the common R function t.test
  3. We calculated the effect size using an internal function that’s also available in the package, see ?effect_size_t
  4. We calculated the power of the test via the pwr package
  5. We tidied it up a bit using the pixiedust package (no, seriously) to make everything a little nicer
  6. And finally, we returned a neat table to the console.

Notable bonus features:

  • Remember how we didn’t bother to check for heteroskedasticity / homogeneity of variance? That’s because the function does that under the hood and uses the appropriate setting for var.equal. MIND = BLOWN
  • The print method is customizable, and if you use the function in an RMarkdown document, you can specify print = "markdown" to return a markdown table so knitr can render it to a neat table, just like in this blogpost
  • The power calculation notices which type of t-Test is called and calculates power for the specific test
  • The effect size is also aware of the test type, and calculated via the bonus feature function effect_size_t

Pretty neat, hm? Yeah.
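
If you’re curious what those steps look like when done by hand, here’s a rough sketch in base R. It is not the package’s actual code: the data are simulated stand-ins for ngo, the variance check uses var.test (an assumption, the package may well use a different test), Cohen’s d is computed with a plain pooled standard deviation, and power comes from stats::power.t.test instead of the pwr package so the example needs no extra installs.

```r
# Simulated stand-in for the ngo data (hypothetical values, not the real dataset)
set.seed(1)
dat <- data.frame(
  hours = c(rnorm(125, mean = 33.6, sd = 3.5), rnorm(125, mean = 33.7, sd = 3.5)),
  sex   = factor(rep(c("m", "f"), each = 125))
)

# Check homogeneity of variance first (F test; an assumption for this sketch)
var_equal <- var.test(hours ~ sex, data = dat)$p.value > 0.05

# The regular ol' t-test, with var.equal set according to the check above
tt <- t.test(hours ~ sex, data = dat, var.equal = var_equal)

# Cohen's d based on the pooled standard deviation
x  <- dat$hours[dat$sex == "m"]
y  <- dat$hours[dat$sex == "f"]
n1 <- length(x)
n2 <- length(y)
sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
d <- abs(mean(x) - mean(y)) / sd_pooled

# Power for a two-sample test at the observed effect size
# (delta is in standard deviation units because sd = 1)
pw <- power.t.test(n = n1, delta = d, sd = 1, sig.level = 0.05,
                   type = "two.sample")$power

# Collect everything into a single one-row table
result <- data.frame(
  t         = unname(tt$statistic),
  p         = tt$p.value,
  df        = unname(tt$parameter),
  conf_low  = tt$conf.int[1],
  conf_high = tt$conf.int[2],
  method    = trimws(tt$method),
  d         = d,
  power     = pw
)
print(result, digits = 3)
```

That’s roughly twenty lines for something you’d otherwise retype in every analysis, which is the whole argument for wrapping it in one call.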

Next up in the convenience department we have our old friend, the ANOVA.
We’re not going as deep here as we did with the t-Test: there are no post-hoc tests, and we also don’t bother testing for the prerequisites, but we do at least give you effect sizes.

tadaa_aov(stunzahl ~ geschl, data = ngo)
| term | df | sumsq | meansq | F | p.value | part.eta.sq | cohens.f |
|---|---|---|---|---|---|---|---|
| geschl | 1 | 0.144 | 0.144 | 0.012 | 0.91 | 0 | 0.007 |
| Residuals | 248 | 3037.456 | 12.248 | NA | NA | NA | NA |

Or for two predictors:

tadaa_aov(stunzahl ~ geschl * jahrgang, data = ngo)
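| term | df | sumsq | meansq | F | p.value | part.eta.sq | cohens.f |
|---|---|---|---|---|---|---|---|
| geschl | 1 | 0.144 | 0.144 | 0.015 | 0.9 | 0 | 0.008 |
| jahrgang | 2 | 536.28 | 268.14 | 27.203 | < 0.001 | 0.182 | 0.472 |
| geschl:jahrgang | 2 | 96.056 | 48.028 | 4.872 | < 0.01 | 0.038 | 0.2 |
| Residuals | 244 | 2405.12 | 9.857 | NA | NA | NA | NA |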

Notice how we give you both the partial eta^2 and Cohen’s f. The latter is used for power calculations in software like G*Power as well as the pwr package in R, while the former is generally used as an interpretable effect size, at least according to my stats class.
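
For the curious, both numbers can be recovered from the sums of squares in the two-predictor table above. The snippet below uses the standard textbook definitions (partial eta^2 = SS_effect / (SS_effect + SS_residual), and f = sqrt(eta^2 / (1 - eta^2))); it is a sketch of the arithmetic, not necessarily how the package computes it internally.

```r
# Sums of squares copied from the two-predictor ANOVA table above
ss_effect <- c(geschl = 0.144, jahrgang = 536.28, `geschl:jahrgang` = 96.056)
ss_resid  <- 2405.12

# Partial eta squared: share of (effect + residual) variation due to the effect
part_eta_sq <- ss_effect / (ss_effect + ss_resid)

# Cohen's f, derived from partial eta squared
cohens_f <- sqrt(part_eta_sq / (1 - part_eta_sq))

# Rounded to three digits, these match the part.eta.sq and cohens.f columns:
# geschl 0 / 0.008, jahrgang 0.182 / 0.472, geschl:jahrgang 0.038 / 0.2
round(cbind(part_eta_sq, cohens_f), 3)
```

Note that f simplifies to sqrt(SS_effect / SS_residual), which is why it keeps growing with the effect while partial eta^2 is capped at 1.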

And lastly, we give you a simple template to create interaction plots with tadaa_int.
Building your own interaction plots with ggplot2 is kind of annoying, since you have to group/summarize your data beforehand and then write two relatively complex ggplots. tadaa_int does the work for you, and if you choose grid = FALSE, it returns a list of two ggplot2 objects which you can save and modify as you wish with custom scale_* or theme components. If you choose grid = TRUE, the plots are arranged horizontally and printed as one image, which should be sufficient for most use cases, especially in interactive use for exploratory purposes.

tadaa_int(data = ngo, response = stunzahl, group1 = jahrgang, group2 = geschl, grid = TRUE)

I’m considering exposing more arguments to the user, e.g. the arrangement (horizontal vs. vertical), or the shape of the geom_point used for the response means, but if you’re into that much customization, you’re probably more than comfortable with building the plot yourself anyway.
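
For reference, building such a pair of plots by hand looks roughly like the sketch below. It uses the built-in ToothGrowth data instead of ngo, summarizes with base aggregate, and only draws the plots if ggplot2 is installed. The layout and aesthetics are guesses at what an interaction plot template might do, not a copy of tadaa_int.

```r
# Step 1: summarize beforehand, one mean per combination of the two factors
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
means$dose <- factor(means$dose)

# Step 2: two plots, each putting a different factor on the x axis
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p1 <- ggplot(means, aes(x = dose, y = len, colour = supp, group = supp)) +
    geom_line() +
    geom_point(size = 3) +
    labs(title = "Interaction: dose by supp", y = "Mean len")

  p2 <- ggplot(means, aes(x = supp, y = len, colour = dose, group = dose)) +
    geom_line() +
    geom_point(size = 3) +
    labs(title = "Interaction: supp by dose", y = "Mean len")

  # A list of two ggplot objects, ready for further scale_* or theme tweaks
  plots <- list(p1, p2)
}
```

Nothing here is hard, it’s just tedious enough that a template saves real time.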

An additional plotting bonus is Tobi’s tadaa_heatmap, a simple template for heatmaps:

tadaa_heatmap(ngo, stunzahl, leistung, jahrgang)

Lazy wrappers

In the “minor conveniences” department, we have a bunch of wrappers for common statistics. The statistics themselves are usually calculated by base R or the packages vcd or ryouready, but they’re tweaked so they’re comfortable for use with dplyr and other tidy data functions in that they only ever return a single (usually numeric) value, which makes it easy to use them in summarize or mutate.

The functions are listed below:

  • modus: A simple function to extract the mode of a frequency table. Note that this will return a character string denoting multiple values, if applicable!
  • nom_chisqu: Simple wrapper for chisq.test that produces a single value.
  • nom_phi: Simple wrapper for vcd::assocstats to extract phi.
  • nom_v: Simple wrapper for vcd::assocstats to extract Cramer’s V.
  • nom_c: Simple wrapper for vcd::assocstats to extract the contingency coefficient c.
  • nom_lambda: Simple wrapper for ryouready::nom.lambda to extract appropriate lambda.
  • ord_gamma: Simple wrapper for ryouready::ord.gamma.
  • ord_somers_d: Simple wrapper for ryouready::ord.somers.d.
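
To illustrate the “single value” idea, here is a minimal sketch of what such wrappers could look like in base R. The names chisq_value and cramers_v are made up for this example, and details such as correct = FALSE are assumptions; the package’s own functions delegate to vcd and ryouready and may differ.

```r
# Hypothetical wrapper: return only the chi-squared statistic as a bare number
chisq_value <- function(x, y) {
  test <- suppressWarnings(chisq.test(table(x, y), correct = FALSE))
  unname(test$statistic)
}

# Hypothetical wrapper: Cramer's V computed from the chi-squared statistic
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- chisq_value(x, y)
  sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1)))
}

# Each call yields exactly one numeric value
chisq_value(mtcars$cyl, mtcars$am)
cramers_v(mtcars$cyl, mtcars$am)
```

Because each function returns a plain length-one numeric instead of a full test object, it can be dropped straight into summarize or mutate.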

A side effect of having written all these wrappers is that we can now also provide easy functions to calculate all the stats relevant for a specific scale (nominal & ordinal):

tadaa_nom(ngo$abschalt, ngo$geschl)

| Chi^2 | Cramer’s V | c | Lambda (x dep.) | Lambda (y dep.) | Lambda (sym.) |
|---|---|---|---|---|---|
| 5.35 | 0.15 | 0.15 | 0.03 | 0.15 | 0.09 |

tadaa_ord(ngo$abschalt, ngo$geschl)

| Gamma | Somers’ D (x dep.) | Somers’ D (y dep.) | Somers’ D (sym.) |
|---|---|---|---|
| -0.29 | -0.15 | -0.15 | -0.15 |

Like previous tadaa_*-functions, these take a print argument so you can easily include them in RMarkdown documents by setting print = "markdown".
Please note that I’m aware it’s suboptimal to just calculate all the stats and then pick and choose whichever fits your needs best. Keep in mind, though, that the intention of this package is to make teaching easier and to provide convenient tools to communicate stats, so yes, if you’re currently working on a real science thing, this is all just fun and games.

It’s the little things

And lastly, there are a couple of little functions I wrote primarily because I found myself writing the same few lines multiple times and thought “there should be an easier way to do this”… which is, coincidentally, pretty much the story behind everything in this package. Well.

  • generate_recodes: To produce recode assignments for car::recode for evenly sequenced clusters.
  • interval_labels: To produce labels for clusters created by cut.
  • tadaa_likertize: Reduce a range of values to n classes (methodologically wonky).
  • delete_na: Customizable way to drop NA observations from a dataset.
  • labels_to_factor: If you mix and match sjPlot, haven and ggplot2, you might need to translate labels to factors, which is precisely what this function does. Drop in a data.frame with labels, receive a data.frame with factors.
  • drop_labels: If you subset a labelled dataset, you might end up with labels that no longer have any values associated with them. This function will drop the now unused labels.
  • pval_string: Shamelessly adapted from pixiedust::pvalString, this will format a p-value as a character string in common p < 0.001 notation and so on. The difference from the pixiedust version is that this function will also print p < 0.05.
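
As an illustration of the p-value formatting idea, here is a small base R sketch. The name format_p and the exact thresholds and rounding are assumptions made for this example; the real pval_string may behave differently in the details.

```r
# Hypothetical helper: format p-values as strings with common cutoffs
format_p <- function(p, digits = 2) {
  out <- as.character(round(p, digits))
  # Apply cutoffs from largest to smallest so the strictest label wins
  out[p < 0.05]  <- "< 0.05"
  out[p < 0.01]  <- "< 0.01"
  out[p < 0.001] <- "< 0.001"
  out
}

format_p(c(0.0004, 0.004, 0.03, 0.91))
# returns "< 0.001" "< 0.01" "< 0.05" "0.91"
```

The function is vectorized, so it works on a whole column of p-values at once.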

Also, since I really like the rmdformats::readthedown RMarkdown template, I made a few tweaks to a ggplot2 theme to match the template, you can use it by adding + theme_readthedown() to your ggplots.
It’s a little brighter and lets you choose which axis (x, y, both) to emphasize visually.

tadaa_int(ngo, stunzahl, jahrgang, geschl, grid = FALSE)[[1]] +
  theme_readthedown(axis_emph = "y")

For everything I missed, there’s our vignette.

Conclusion

This is it. The upcoming version (0.10) is going to be ready for CRAN soon, while 0.9 is already available.
Try it and submit issues and feature requests as much as you want.
The next neat feature is probably going to be a tadaa_normtest function that gives you an easy way to perform tests for normality over subgroups.

¯\_(ツ)_/¯

Update 2016-08-19 12:51

As of last night, v0.10.0 is live on CRAN, and it brought the promised tadaa_normtest with options for our favorite tests for normality: Anderson-Darling, Shapiro-Wilk, Pearson’s χ² and even that Kolmogorov-Smirnov one you shouldn’t really use.
See the full release notes on GitHub.
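
To give an idea of what “tests for normality over subgroups” means in practice, here is a minimal base R sketch using Shapiro-Wilk on the built-in ToothGrowth data. The helper name norm_by_group and its output columns are invented for this example and say nothing about the actual tadaa_normtest interface.

```r
# Hypothetical helper: run a Shapiro-Wilk test within each subgroup
norm_by_group <- function(x, group) {
  pieces <- split(x, group)
  data.frame(
    group   = names(pieces),
    W       = vapply(pieces, function(v) unname(shapiro.test(v)$statistic), numeric(1)),
    p.value = vapply(pieces, function(v) shapiro.test(v)$p.value, numeric(1)),
    row.names = NULL
  )
}

# One row per subgroup, with the test statistic and p-value
res <- norm_by_group(ToothGrowth$len, ToothGrowth$supp)
print(res, digits = 3)
```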
