When we teach “R for statistics” to groups of scientists (who tend to be quite well informed in statistics, and just need a bit of help with R), we take the time to re-work some tests of model quality with the appropriate significance tests. We organize the lesson in terms of a larger and more detailed version of the following list:
- To test the quality of a numeric model to a numeric outcome: F-test (as in linear regression).
- To test the quality of a numeric model to a categorical outcome: χ2 or “Chi-squared” test (as in logistic regression).
- To test the association of a categorical predictor to a categorical outcome: many tests including Fisher’s exact test and Barnard’s test.
- To test the quality of a categorical predictor to a numeric outcome: t-Test, ANOVA, and Tukey’s “honest significant difference” test.
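The pairings above map directly onto standard R functions. The sketch below is illustrative only (the tiny data frame is made up for this note, not taken from any real study):

```r
# Made-up data covering the four predictor/outcome pairings above.
d <- data.frame(xn = 1:8,                        # numeric predictor
                xc = rep(c("a", "b"), 4),        # categorical predictor
                yn = c(1, 3, 2, 5, 4, 6, 7, 8),  # numeric outcome
                yc = c(TRUE, FALSE, FALSE, TRUE,
                       TRUE, FALSE, TRUE, TRUE)) # categorical outcome

summary(lm(yn ~ xn, data = d))                      # F-test (numeric -> numeric)
summary(glm(yc ~ xn, data = d, family = binomial))  # deviance / chi-squared (numeric -> categorical)
fisher.test(table(d$xc, d$yc))                      # Fisher's exact test (categorical -> categorical)
t.test(yn ~ xc, data = d)                           # t-test (categorical -> numeric)
```

Each call reports its own significance directly in its printed summary, which is the "standard and well reported" property we are after.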
The above tests are all in terms of checking model results, so we don’t allow re-scaling of the predictor as part of the test (as we would have in a Pearson correlation test, or an area under the curve test). There are, of course, many alternatives such as Wald’s test- but we try to start with a set of tests that are standard, well known, and well reported by
R. An odd exception has always been the χ2 test, which we will write a bit about in this note.
The χ2 test is a very useful statistical test. In particular, under fairly mild assumptions, it is a usable probability model for the quality of fit of a logistic regression. It is based on a summary statistic called “deviance” (an odd name, but the quantity is strongly related to likelihood and to entropy). And, after a simple transform, it yields a quantity called “pseudo-R2” (see The Simpler Derivation of Logistic Regression) that reads like “fraction of variation explained.” It is a great final test for well-tuned models designed to estimate probabilities (just as area under the curve is a good early test, as it abstracts out scale and choice of decision threshold).
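The deviance/likelihood relation is easy to check directly in R. The sketch below uses the built-in mtcars data (our own example, purely for illustration):

```r
# For a binomial glm on a 0/1 outcome, the saturated model has log-likelihood 0,
# so the residual deviance is exactly -2 times the model's log-likelihood.
m <- glm(vs ~ wt, data = mtcars, family = binomial)
all.equal(m$deviance, as.numeric(-2 * logLik(m)))  # TRUE

# Pseudo-R^2 then reads as the "fraction of deviance explained":
pseudoR2 <- 1 - m$deviance / m$null.deviance
```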
Yet the χ2 test is under-emphasized and under-implemented in R. Consider the following trivial logistic regression model.
d <- data.frame(x=c(1,2,3,4,5,6,7,7),
                y=c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE)) # outcome reconstructed to match the summary below
model <- glm(y~x,data=d,family=binomial)
summary(model)
## Call:
## glm(formula = y ~ x, family = binomial, data = d)
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -1.37180  -1.09714  -0.00811   1.08024   1.42939
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.7455     1.6672  -0.447    0.655
## x             0.1702     0.3429   0.496    0.620
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 11.090  on 7  degrees of freedom
## Residual deviance: 10.837  on 6  degrees of freedom
## AIC: 14.837
##
## Number of Fisher Scoring iterations: 4
R reported the change in deviance (the measure of model quality; note the AIC, or Akaike information criterion, is also present), but no significance for the overall model fit is reported (n.b. this is different from the coefficient significances). The significance is probably not there because glm() is a fully general generalized linear model fitter (it can fit much more than just logistic regressions), and the error model changes as you change the model type (controlled by the family argument).
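That said, the missing overall-fit significance is easy to compute by hand from the fit's deviances. The sketch below re-fits the example (the outcome vector y is our reconstruction, chosen to match the reported null deviance, since it was truncated above):

```r
# The drop in deviance from the null model is asymptotically chi-squared
# distributed, with degrees of freedom equal to the number of parameters
# the model adds over the intercept-only null model.
d <- data.frame(x = c(1,2,3,4,5,6,7,7),
                y = c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE))  # y: reconstructed, illustrative
model <- glm(y ~ x, data = d, family = binomial)

delta   <- model$null.deviance - model$deviance   # change in deviance
degFree <- model$df.null - model$df.residual      # extra parameters (here: 1)
pchisq(delta, degFree, lower.tail = FALSE)        # overall model significance
```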
But logistic regression really deserves to be front and center. This is why in Section 7.2 of Practical Data Science with R (Nina Zumel, John Mount, Manning 2014) we take the time to define and show how to calculate both the pseudo-R2 and the significance of the reported deviance.
As we observed in “Proofing statistics in papers,” having standard tests and standard reporting of tests is a great advantage. In this spirit we add an “APA-like” report of the χ2 significance in our sigr package. The use is quick and decisive:
Chi-Square Test summary: pseudo-R2=0.023 (χ2(1,N=8)=0.25, p=0.615).
The sigr package isn’t up on CRAN yet, but can be installed using devtools::install_github('WinVector/sigr'). It includes documentation and an example vignette.
sigr can format directly to HTML, LaTeX, Markdown, and Word. It is designed to be used with a knitr workflow, and when used with knitr it will automatically select the correct target formatting. Right now it generates reports only for linear and logistic regressions; we will likely fill out a few more “most often used, so should have a nice neat format” tests going forward.