Correctly Reporting P-Values in Summary Tables Reported with xtable

[This article was first published on R – Bearded Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Often when writing a manuscript in using knitr and xtable I am flustered by my p-values. In simple summary tables, R conveniently rounds my p-values to be 0: a mathematically inappropriate task. A colleague recently commented on the poor reporting of my table (shown below using print.xtable with the type=”html” argument), inspiring a much needed change.
Estimate Std.err Wald Pr(>|W|)
(Intercept) 0.001704 0.000005 100409.770956 0.000000
sizemedium 0.000046 0.000005 90.534705 0.000000
sizesmall 0.000003 0.000005 0.294331 0.587458
time -0.000004 0.000001 11.614917 0.000654
  The fix is actually fairly straight forward, and can be summarized in a simple function: “fixp”, with the code shown below:
fixp <- function(x, dig=3){
  x <- as.data.frame(x)
  
  if(substr(names(x)[ncol(x)],1,2) != "Pr")
    warning("The name of the last column didn't start with Pr. This may indicate that p-values weren't in the last row, and thus, that this function is inappropriate.")
  x[,ncol(x)] <- round(x[,ncol(x)], dig)
  for(i in 1:nrow(x)){
    if(x[i,ncol(x)] == 0)
      x[i,ncol(x)] <- paste0("< .", paste0(rep(0,dig-1), collapse=""), "1")
  }
  
  x
}
All that's going on: the function is pulling in the summary table (usually through a $coef), trying to turn it into a dataframe (some already are, though some tables are numeric (e.g. lm)), throwing a warning if the last heading doesn't begin with "Pr" (as it may not be the column that contains p-values), and editing any values that were rounded to 0 (at the user specified rounding point) to be < the smallest number that could be rounded to (e.g. <.01). Then we output the edited table, all ready for reporting! To mimic what was above, we set our digits to be equal to 6 (so go out 6 decimal places for the p-value), and re-run:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 0.001704 0.000005 100409.770956 < .000001
sizemedium 0.000046 0.000005 90.534705 < .000001
sizesmall 0.000003 0.000005 0.294331 0.587458
time -0.000004 0.000001 11.614917 0.000654
  Much better! Also, to report a two digit p-value (for some writing styles), we simply set dig = 2:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 0.001704 0.000005 100409.770956 < .01
sizemedium 0.000046 0.000005 90.534705 < .01
sizesmall 0.000003 0.000005 0.294331 0.59
time -0.000004 0.000001 11.614917 < .01
  By design, the p-values can be manipulated independent of the estimates. This allows reporting of the estimated coefficients in meaningful units (in the above example, very small units), while reporting the p-values on a scale that many writing styles request. Want to try this yourself? Here's an example that you can try with just a built in dataset in R:
#this gives a summary table with a small p-value. Trying to report this with xtable would results in an R rounding issue!
(mod <- coef(summary(lm(uptake ~ conc + Treatment + Type + Plant, data=CO2))))

#this fixes the p-value to 2 digits, correctly reporting p-values that would have been rounded to 0
fixp(mod,dig=2)
Here's the final output via print.xtable (dig=2 for fixp and xtable):
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.42 4.67 8.00 < .01
conc 0.02 0.00 7.96 < .01
Treatmentchilled -12.50 5.10 -2.45 0.02
TypeMississippi -23.33 6.01 -3.88 < .01
Plant.L 21.58 11.14 1.94 0.06
Plant.Q -4.62 2.27 -2.03 0.05
Plant.C 1.46 5.10 0.29 0.78
Plant^4 2.34 2.27 1.03 0.31
Plant^5 -0.48 5.77 -0.08 0.93
Plant^6 -0.04 2.27 -0.02 0.99
Plant^7 -1.91 3.64 -0.53 0.6
Plant^8 -3.28 2.27 -1.44 0.15
Plant^10 0.55 2.27 0.24 0.81
  Limitations (ish):
  1. Again, this assumes that the last column is the one to be transformed. This is by design, though may be inconvenient in some situations. If needed, the change is easily made through the definition of the function.
  2. When the last column is manipulated, it becomes a character column in the dataframe. Alternatively, when it is rounded but no entry rounds to 0, it is numeric.
  3. This assumes a dataframe-style format of your table. Thus, this method will NOT be effective at correcting reported p-values for an individual test: say a t-test, where only the statistic is reported (and not a table). Personally this is not a concern, as I deal with these situations in other ways, but for some users seeking an overall "p-value fixing" method this may not be the answer.
As with other functions I write posts on, this function is available in my package (creatively named "myStuff")  via Github. If you'd like to play with the most current version of the function, I'd encourage you to check it out here. Alternatively, to have access to other fun functions, install the package directly from GitHub with the code below (requires devtools):
devtools:install_github("flor3652/myStuff")

To leave a comment for the author, please follow the link and comment on their blog: R – Bearded Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)