**denis haine » R**, and kindly contributed to R-bloggers)

Next topic on logistic regression: the exact and the conditional logistic regressions.

**Exact logistic regression**

When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. The dataset has to be collapsed into a data frame with unique combinations of predictors.

Another possibility is to use robust standard errors, and get comparable p-values to those obtained with exact logistic regression.

### exact logistic regression x <- xtabs(~ casecont + interaction(dneo, dclox), data = nocardia) x interaction(dneo, dclox) casecont 0.0 1.0 0.1 1.1 0 20 15 9 10 1 2 44 3 5 > nocardia.coll <- data.frame(dneo = rep(1:0, 2), dclox = rep(1:0, each = 2), + casecont = x[1, ], ntrials = colSums(x)) nocardia.coll dneo dclox casecont ntrials 0.0 1 1 20 22 1.0 0 1 15 59 0.1 1 0 9 12 1.1 0 0 10 15 library(elrm) Le chargement a nécessité le package : coda Le chargement a nécessité le package : lattice mod5 <- elrm(formula = casecont/ntrials ~ dneo, interest = ~dneo, iter = 100000, dataset = nocardia.coll, burnIn = 2000) ### robust SE library(robust) Le chargement a nécessité le package : fit.models Le chargement a nécessité le package : MASS Le chargement a nécessité le package : robustbase Le chargement a nécessité le package : rrcov Le chargement a nécessité le package : pcaPP Le chargement a nécessité le package : mvtnorm Scalable Robust Estimators with High Breakdown Point (version 1.3-02) mod6 <- glmrob(casecont ~ dcpct + dneo + dclox + dneo*dclox, + family = binomial, data = nocardia, method= "Mqle", + control= glmrobMqle.control(tcc=1.2)) > summary(mod6) Call: glmrob(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox, family = binomial, data = nocardia, method = "Mqle", control = glmrobMqle.control(tcc = 1.2)) Coefficients: Estimate Std. Error z-value Pr(>|z|) (Intercept) -4.440253 1.239138 -3.583 0.000339 *** dcpct 0.025947 0.008504 3.051 0.002279 ** dneo 3.604941 1.034714 3.484 0.000494 *** dclox 0.713411 1.193426 0.598 0.549984 dneo:dclox -2.935345 1.367212 -2.147 0.031797 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Robustness weights w.r * w.x: 89 weights are ~= 1. The remaining 19 ones are summarized as Min. 1st Qu. Median Mean 3rd Qu. Max. 0.1484 0.4979 0.6813 0.6558 0.8764 0.9525 Number of observations: 108 Fitted by method ‘Mqle’ (in 5 iterations) (Dispersion parameter for binomial family taken to be 1) No deviance values available Algorithmic parameters: acc tcc 0.0001 1.2000 maxit 50 test.acc "coef"

**Conditional logistic regression**

Matched case-control studies analyzed with unconditional logistic regression model produce estimates of the odds ratios that are the square of their true value. But we can use conditional logistic regression to analyze matched case-control studies and get correct estimates. Instead of estimating a parameter for each matched set, a conditional model conditions the fixed effects out of the estimation. It can be run in R with clogit from the survival package:

temp <- tempfile() > download.file( + "http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", temp) essai de l'URL 'http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip' Content type 'application/zip' length 1107873 bytes (1.1 Mb) URL ouverte ================================================== downloaded 1.1 Mb load(unz(temp, "ver2_data_R/sal_outbrk.rdata")) unlink(temp) ### Salmonella outbreak dataset library(survival) mod7 <- clogit(casecontrol ~ slt_a + strata(match_grp), data = sal_outbrk) summary(mod7) Call: coxph(formula = Surv(rep(1, 112L), casecontrol) ~ slt_a + strata(match_grp), data = sal_outbrk, method = "exact") n= 112, number of events= 39 coef exp(coef) se(coef) z Pr(>|z|) slt_a 1.4852 4.4159 0.5181 2.867 0.00415 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 slt_a 4.416 0.2265 1.6 12.19 Rsquare= 0.085 (max possible= 0.518 ) Likelihood ratio test= 10 on 1 df, p=0.001568 Wald test = 8.22 on 1 df, p=0.004148 Score (logrank) test = 9.48 on 1 df, p=0.002075

**leave a comment**for the author, please follow the link and comment on their blog:

**denis haine » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...