A Lazy Function

[This article was first published on CillianMacAodh, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It has been quite a while since I posted, but I haven’t been idle, I completed my PhD since the last post, and I’m due to graduate next Thursday. I am also delighted to have recently been added to R-bloggers.com so I’m keen to get back into it.

A Lazy Function

I have already written 2 posts about writing functions, and I will try to diversify my content. That said, I won’t refrain from sharing something that has been helpful to me. The function(s) I describe in this post is an artefact left over from before I started using R Markdown. It is a product of its time but may still be of use to people who haven’t switched to R Markdown yet. It is lazy (and quite imperfect) solution to a tedious task.

The Problem

At the time I wrote this function I was using R for my statistics and Libreoffice for writing. I would run a test in R and then write it up in Libreoffice. Each value that needed reporting had to be transferred from my R output to Libreoffice – and for each test there are a number of values that need reporting. Writing up these tests is pretty formulaic. There’s a set structure to the sentence, for example writing up a t-test with a significant result nearly always looks something like this: An independent samples t-test revealed a significant difference in X between the Y sample, (M = [ ], SD = [ ]), and the Z sample, (M = [ ], SD = [ ]), t([df]) = [ ], p = [ ]. And the write up of a non-significant result looks something like this: An independent samples t-test revealed no significant difference in X between the Y sample, (M = [ ], SD = [ ]), and the Z sample, (M = [ ], SD = [ ]), t([df]) = [ ], p = [ ]. Seven values (the square [ ] brackets) need to be reported for this single test. Whether you copy and paste or type each value, the reporting of such tests can be very tedious, and leave you prone to errors in reporting.

The Solution

In order to make reporting values easier (and more accurate) I wrote the t_paragraph() function (and the related t_paired_paragraph() function). This provided an output that I could copy and paste into a Word (Libreoffice) document. This function is part of the desnum1 package (McHugh, 2017).

The t_parapgraph() Function

The t_parapgraph() function runs a t-test and generates an output that can be copied and pasted into a word document. The code for the function is as follows:
# Create the function t_paragraph with arguments x, y, and measure
# x is the dependent variable
# y is the independent (grouping) variable
# measure is the name of dependent variable inputted as string

t_paragraph <- function (x, y, measure){
  
  # Run a t-test and store it as an object t
  
  t <- t.test(x ~ y)
  
  
  # If your grouping variable has labelled levels, the next line will store them for reporting at a later stage
  
  labels <- levels(y)
  
  # Create an object for each value to be reported
  
  tsl <- as.vector(t$statistic)
  ts <- round(tsl, digits = 3)
  tpl <- as.vector(t$p.value)
  tp <- round(tpl, digits = 3)
  d_fl <- as.vector(t$parameter)
  d_f <- round(d_fl, digits = 2)
  ml <- as.vector(tapply(x, y, mean))
  m <- round(ml, digits = 2)
  sdl <- as.vector(tapply(x, y, sd))
  sd <- round(sdl, digits = 2)
  
  # Use print(paste0()) to combine the objects above and create two potential outputs
  # The output that is generated will depend on the result of the test
  
  
  # wording if significant difference is observed
  
  if (tp < 0.05) 
    print(paste0("An independent samples t-test revealed a significant difference in ", 
                 measure, " between the ", labels[1], " sample, (M = ", 
                 m[1], ", SD = ", sd[1], "), and the ", labels[2], 
                 " sample, (M =", m[2], ", SD =", sd[2], "), t(", 
                 d_f, ") = ", ts, ", p = ", tp, "."), quote = FALSE, 
          digits = 2)
  
  # wording if no significant difference is observed      
  
  if (tp > 0.05) 
    print(paste0("An independent samples t-test revealed no difference in ", 
                 measure, " between the ", labels[1], " sample, (M = ", 
                 m[1], ", SD = ", sd[1], "), and the ", labels[2], 
                 " sample, (M = ", m[2], ", SD =", sd[2], "), t(", 
                 d_f, ") = ", ts, ", p = ", tp, "."), quote = FALSE, 
          digits = 2)
}
When using t_paragraph()x is your DV, y is your grouping variable while measure is a string value that the name of the dependent variable. To illustrate the function I’ll use the mtcars dataset.

Applications of the t_parapgraph() Function

The mtcars dataset is comes with R. For information on it simply type help(mtcars). The variables of interest here are am(transmission; 0 = automatic, 1 = manual), mpg (miles per gallon), qsec (1/4 mile time). The two questions I’m going to look at are:
  1. Is there a difference in miles per gallon depending on transmission?
  2. Is there a difference in 1/4 mile time depending on transmission?
Before running the test it is a good idea to look at the data2. Because we’re going to look at differences between groups we want to run descriptives for each group separately. To do this I’m going to combine the the descriptives() function which I previously covered here (also part of the desnum package) and the tapply() function. The tapply() function allows you to run a function on subsets of a dataset using a grouping variable (or index). The arguments are as follows tapply(vector, index, function)vector is the variable you want to pass through function; and index is the grouping variable. The examples below will make this clearer. We want to run descriptives on mtcars$mpg and on mtcars$qsec and for each we want to group by transmission (mtcars$am). This can be done using tapply() and descriptives() together as follows:
tapply(mtcars$mpg, mtcars$am, descriptives)
## $`0`
##       mean       sd  min  max len
## 1 17.14737 3.833966 10.4 24.4  19
## 
## $`1`
##       mean       sd min  max len
## 1 24.39231 6.166504  15 33.9  13
Recall that 0 = automatic, and 1 = manual. Replace mpg with qsec and run again:
tapply(mtcars$qsec, mtcars$am, descriptives)
## $`0`
##       mean       sd   min  max len
## 1 18.18316 1.751308 15.41 22.9  19
## 
## $`1`
##    mean       sd  min  max len
## 1 17.36 1.792359 14.5 19.9  13

Running t_paragraph()

Now that we know the values for automatic vs manual cars we can run our t-tests using t_paragraph(). Our first question:
Is there a difference in miles per gallon depeding on transmission?
t_paragraph(mtcars$mpg, mtcars$am, "miles per gallon")
## [1] An independent samples t-test revealed a significant difference in miles per gallon between the  sample, (M = 17.15, SD = 3.83), and the  sample, (M =24.39, SD =6.17), t(18.33) = -3.767, p = 0.001.
There is a difference, and the output above can be copied and pasted into a word document with minimal changes required. Our second question was:
Is there a difference in 1/4 mile time depending on transmission?
t_paragraph(mtcars$qsec, mtcars$am, "quarter-mile time")
## [1] An independent samples t-test revealed no difference in quarter-mile time between the  sample, (M = 18.18, SD = 1.75), and the  sample, (M = 17.36, SD =1.79), t(25.53) = 1.288, p = 0.209.
This time there was no significant difference, and again the output can be copied and pasted into word with minimal changes.

Limitations

The function described was written a long time ago, and could be updated. However I no longer copy and paste into word (having switched to R markdown instead). The reporting of the p value is not always to APA standards. If p is < .001 this is what should be reported. The code for t_paragraph() could be updated to include the p_report function (described here) which would address this. Another limitation is that the formatting of the text isn’t perfect, the letters (N,M,SD,t,p) should all be italicised, but having to manually fix this formatting is still easier than manually transferring individual values.

Conclusion

Despite the limitations the functions t_paragraph() and t_paired_paragraph()3 have made my life easier. I still use them occasionally. I hope they can be of use to anyone who is using R but has not switched to R Markdown yet.

References

McHugh, C. (2017). Desnum: Creates some useful functions.

  1. To install desnum just run devtools::install_github("cillianmiltown/R_desnum")
  2. In this case this is particularly useful because there are no value labels for mtcars$am, so it won’t be clear from the output which values refer to the automatic group and which refer to the manual group. Running descriptives will help with this.
  3. If you want to see the code for t_paired_paragraph() just load desnum and run t_paired_paragraph (without parenthesis)

To leave a comment for the author, please follow the link and comment on their blog: CillianMacAodh.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)