# Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions

March 22, 2013
By

(This article was first published on denis haine » R, and kindly contributed to R-bloggers)

Next topic on logistic regression: the exact and the conditional logistic regressions.

Exact logistic regression
When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. The dataset has to be collapsed into a data frame with unique combinations of predictors.
Another possibility is to use robust standard errors, and get comparable p-values to those obtained with exact logistic regression.

```### exact logistic regression
x <- xtabs(~ casecont + interaction(dneo, dclox), data = nocardia)
x
interaction(dneo, dclox)
casecont 0.0 1.0 0.1 1.1
0  20  15   9  10
1   2  44   3   5
> nocardia.coll <- data.frame(dneo = rep(1:0, 2), dclox = rep(1:0, each = 2),
+                    casecont = x[1, ], ntrials = colSums(x))
nocardia.coll
dneo dclox casecont ntrials
0.0    1     1       20      22
1.0    0     1       15      59
0.1    1     0        9      12
1.1    0     0       10      15

library(elrm)
Le chargement a nécessité le package : coda
Le chargement a nécessité le package : lattice
mod5 <- elrm(formula = casecont/ntrials ~ dneo,
interest = ~dneo,
iter = 100000, dataset = nocardia.coll, burnIn = 2000)

### robust SE
library(robust)
Le chargement a nécessité le package : fit.models
Le chargement a nécessité le package : MASS
Le chargement a nécessité le package : robustbase
Le chargement a nécessité le package : rrcov
Le chargement a nécessité le package : pcaPP
Le chargement a nécessité le package : mvtnorm
Scalable Robust Estimators with High Breakdown Point (version 1.3-02)

mod6 <- glmrob(casecont ~ dcpct + dneo + dclox + dneo*dclox,
+                family = binomial, data = nocardia, method= "Mqle",
+                control= glmrobMqle.control(tcc=1.2))
> summary(mod6)

Call:  glmrob(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox,      family = binomial, data = nocardia, method = "Mqle", control = glmrobMqle.control(tcc = 1.2))

Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) -4.440253   1.239138  -3.583 0.000339 ***
dcpct        0.025947   0.008504   3.051 0.002279 **
dneo         3.604941   1.034714   3.484 0.000494 ***
dclox        0.713411   1.193426   0.598 0.549984
dneo:dclox  -2.935345   1.367212  -2.147 0.031797 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
89 weights are ~= 1. The remaining 19 ones are summarized as
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.1484  0.4979  0.6813  0.6558  0.8764  0.9525

Number of observations: 108
Fitted by method ‘Mqle’  (in 5 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available
Algorithmic parameters:
acc    tcc
0.0001 1.2000
maxit
50
test.acc
"coef"
```

Conditional logistic regression

Matched case-control studies analyzed with unconditional logistic regression model produce estimates of the odds ratios that are the square of their true value. But we can use conditional logistic regression to analyze matched case-control studies and get correct estimates. Instead of estimating a parameter for each matched set, a conditional model conditions the fixed effects out of the estimation. It can be run in R with clogit from the survival package:

```temp <- tempfile()
+   "http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", temp)
essai de l'URL 'http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip'
Content type 'application/zip' length 1107873 bytes (1.1 Mb)
URL ouverte
==================================================

### Salmonella outbreak dataset

library(survival)
mod7 <- clogit(casecontrol ~ slt_a + strata(match_grp), data = sal_outbrk)
summary(mod7)

Call:
coxph(formula = Surv(rep(1, 112L), casecontrol) ~ slt_a + strata(match_grp),
data = sal_outbrk, method = "exact")

n= 112, number of events= 39

coef exp(coef) se(coef)     z Pr(>|z|)
slt_a 1.4852    4.4159   0.5181 2.867  0.00415 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
slt_a     4.416     0.2265       1.6     12.19

Rsquare= 0.085   (max possible= 0.518 )
Likelihood ratio test= 10  on 1 df,   p=0.001568
Wald test            = 8.22  on 1 df,   p=0.004148
Score (logrank) test = 9.48  on 1 df,   p=0.002075
```

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...