Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Next topic on logistic regression: the exact and the conditional logistic regressions.
Exact logistic regression
When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. The dataset has to be collapsed into a data frame with unique combinations of predictors.
Another possibility is to use robust standard errors, and get comparable p-values to those obtained with exact logistic regression.
### exact logistic regression
x <- xtabs(~ casecont + interaction(dneo, dclox), data = nocardia)
x
interaction(dneo, dclox)
casecont 0.0 1.0 0.1 1.1
0 20 15 9 10
1 2 44 3 5
> nocardia.coll <- data.frame(dneo = rep(1:0, 2), dclox = rep(1:0, each = 2),
+ casecont = x[1, ], ntrials = colSums(x))
nocardia.coll
dneo dclox casecont ntrials
0.0 1 1 20 22
1.0 0 1 15 59
0.1 1 0 9 12
1.1 0 0 10 15
library(elrm)
Le chargement a nécessité le package : coda
Le chargement a nécessité le package : lattice
mod5 <- elrm(formula = casecont/ntrials ~ dneo,
interest = ~dneo,
iter = 100000, dataset = nocardia.coll, burnIn = 2000)
### robust SE
library(robust)
Le chargement a nécessité le package : fit.models
Le chargement a nécessité le package : MASS
Le chargement a nécessité le package : robustbase
Le chargement a nécessité le package : rrcov
Le chargement a nécessité le package : pcaPP
Le chargement a nécessité le package : mvtnorm
Scalable Robust Estimators with High Breakdown Point (version 1.3-02)
mod6 <- glmrob(casecont ~ dcpct + dneo + dclox + dneo*dclox,
+ family = binomial, data = nocardia, method= "Mqle",
+ control= glmrobMqle.control(tcc=1.2))
> summary(mod6)
Call: glmrob(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox, family = binomial, data = nocardia, method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -4.440253 1.239138 -3.583 0.000339 ***
dcpct 0.025947 0.008504 3.051 0.002279 **
dneo 3.604941 1.034714 3.484 0.000494 ***
dclox 0.713411 1.193426 0.598 0.549984
dneo:dclox -2.935345 1.367212 -2.147 0.031797 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
89 weights are ~= 1. The remaining 19 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1484 0.4979 0.6813 0.6558 0.8764 0.9525
Number of observations: 108
Fitted by method ‘Mqle’ (in 5 iterations)
(Dispersion parameter for binomial family taken to be 1)
No deviance values available
Algorithmic parameters:
acc tcc
0.0001 1.2000
maxit
50
test.acc
"coef"
Conditional logistic regression
Matched case-control studies analyzed with unconditional logistic regression model produce estimates of the odds ratios that are the square of their true value. But we can use conditional logistic regression to analyze matched case-control studies and get correct estimates. Instead of estimating a parameter for each matched set, a conditional model conditions the fixed effects out of the estimation. It can be run in R with clogit from the survival package:
temp <- tempfile()
> download.file(
+ "http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", temp)
essai de l'URL 'http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip'
Content type 'application/zip' length 1107873 bytes (1.1 Mb)
URL ouverte
==================================================
downloaded 1.1 Mb
load(unz(temp, "ver2_data_R/sal_outbrk.rdata"))
unlink(temp)
### Salmonella outbreak dataset
library(survival)
mod7 <- clogit(casecontrol ~ slt_a + strata(match_grp), data = sal_outbrk)
summary(mod7)
Call:
coxph(formula = Surv(rep(1, 112L), casecontrol) ~ slt_a + strata(match_grp),
data = sal_outbrk, method = "exact")
n= 112, number of events= 39
coef exp(coef) se(coef) z Pr(>|z|)
slt_a 1.4852 4.4159 0.5181 2.867 0.00415 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
slt_a 4.416 0.2265 1.6 12.19
Rsquare= 0.085 (max possible= 0.518 )
Likelihood ratio test= 10 on 1 df, p=0.001568
Wald test = 8.22 on 1 df, p=0.004148
Score (logrank) test = 9.48 on 1 df, p=0.002075
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.