Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions

[This article was first published on denis haine » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Next topic on logistic regression: the exact and the conditional logistic regressions.

Exact logistic regression
When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. The dataset has to be collapsed into a data frame with unique combinations of predictors.
Another possibility is to use robust standard errors, and get comparable p-values to those obtained with exact logistic regression.

### exact logistic regression
x <- xtabs(~ casecont + interaction(dneo, dclox), data = nocardia)
x
        interaction(dneo, dclox)
casecont 0.0 1.0 0.1 1.1
       0  20  15   9  10
       1   2  44   3   5
> nocardia.coll <- data.frame(dneo = rep(1:0, 2), dclox = rep(1:0, each = 2),
+                    casecont = x[1, ], ntrials = colSums(x))
nocardia.coll
    dneo dclox casecont ntrials
0.0    1     1       20      22
1.0    0     1       15      59
0.1    1     0        9      12
1.1    0     0       10      15

library(elrm)
Le chargement a nécessité le package : coda
Le chargement a nécessité le package : lattice
mod5 <- elrm(formula = casecont/ntrials ~ dneo,
             interest = ~dneo,
             iter = 100000, dataset = nocardia.coll, burnIn = 2000)

### robust SE
library(robust)
Le chargement a nécessité le package : fit.models
Le chargement a nécessité le package : MASS
Le chargement a nécessité le package : robustbase
Le chargement a nécessité le package : rrcov
Le chargement a nécessité le package : pcaPP
Le chargement a nécessité le package : mvtnorm
Scalable Robust Estimators with High Breakdown Point (version 1.3-02)

mod6 <- glmrob(casecont ~ dcpct + dneo + dclox + dneo*dclox,
+                family = binomial, data = nocardia, method= "Mqle",
+                control= glmrobMqle.control(tcc=1.2))
> summary(mod6)

Call:  glmrob(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox,      family = binomial, data = nocardia, method = "Mqle", control = glmrobMqle.control(tcc = 1.2)) 


Coefficients:
             Estimate Std. Error z-value Pr(>|z|)    
(Intercept) -4.440253   1.239138  -3.583 0.000339 ***
dcpct        0.025947   0.008504   3.051 0.002279 ** 
dneo         3.604941   1.034714   3.484 0.000494 ***
dclox        0.713411   1.193426   0.598 0.549984    
dneo:dclox  -2.935345   1.367212  -2.147 0.031797 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Robustness weights w.r * w.x: 
 89 weights are ~= 1. The remaining 19 ones are summarized as
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1484  0.4979  0.6813  0.6558  0.8764  0.9525 

Number of observations: 108 
Fitted by method ‘Mqle’  (in 5 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available 
Algorithmic parameters: 
   acc    tcc 
0.0001 1.2000 
maxit 
   50 
test.acc 
  "coef" 

Conditional logistic regression

Matched case-control studies analyzed with unconditional logistic regression model produce estimates of the odds ratios that are the square of their true value. But we can use conditional logistic regression to analyze matched case-control studies and get correct estimates. Instead of estimating a parameter for each matched set, a conditional model conditions the fixed effects out of the estimation. It can be run in R with clogit from the survival package:

temp <- tempfile()
> download.file(
+   "http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", temp)
essai de l'URL 'http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip'
Content type 'application/zip' length 1107873 bytes (1.1 Mb)
URL ouverte
==================================================
downloaded 1.1 Mb

load(unz(temp, "ver2_data_R/sal_outbrk.rdata"))
unlink(temp)
### Salmonella outbreak dataset

library(survival)
mod7 <- clogit(casecontrol ~ slt_a + strata(match_grp), data = sal_outbrk)
summary(mod7)

Call:
coxph(formula = Surv(rep(1, 112L), casecontrol) ~ slt_a + strata(match_grp), 
    data = sal_outbrk, method = "exact")

  n= 112, number of events= 39 

        coef exp(coef) se(coef)     z Pr(>|z|)   
slt_a 1.4852    4.4159   0.5181 2.867  0.00415 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

      exp(coef) exp(-coef) lower .95 upper .95
slt_a     4.416     0.2265       1.6     12.19

Rsquare= 0.085   (max possible= 0.518 )
Likelihood ratio test= 10  on 1 df,   p=0.001568
Wald test            = 8.22  on 1 df,   p=0.004148
Score (logrank) test = 9.48  on 1 df,   p=0.002075

To leave a comment for the author, please follow the link and comment on their blog: denis haine » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)