Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In logistic regression, when the outcome has low (or high) prevalence, or when there are several interacted categorical predictors, it can happen that for some combination of the predictors, all the observations have the same event status. A similar event occurs when continuous covariates predict the outcome too perfectly.

This phenomenon, known as “separation” (including complete and quasi-complete separation) will cause problems fitting the model. Sometimes the only symptom of separation will be extremely large standard errors, while at other times the software may report an error or a warning.

One approach to handling this sort of problem is exact logistic regression, which we discuss in section 4.1.2. But exact logistic regression is complex and may require prohibitive computational resources. Another option is to use a Bayesian approach. Here we show how to use a penalized likelihood method originally proposed by Firth (1993 Biometrika 80:27-38) and described fully in this setting by Georg Heinze (2002 Statistics in Medicine 21:2409-2419 and 2006 25:4216-4226). A nice summary of the method is shown on a web page that Heinze maintains. In later entries we’ll consider the Bayesian and exact approaches.

SAS
In SAS, the corrected estimates can be found using the firth option to the model statement in proc logistic. We’ll set up the problem in the simple setting of a 2×2 table with an empty cell. Here, we simply output three observations with three combinations of predictor and outcome, along with a weight variable which contains the case counts in each cell of the table

`data testfirth;   pred=1; outcome=1; weight=20; output;   pred=0; outcome=1; weight=20; output;   pred=0; outcome=0; weight=200; output;run;`

In the proc logistic code, we use the weight statement, available in many procedures, to suggest how many times each observation is to be replicated before the analysis. This approach can save a lot of space.

`proc logistic data = testfirth;  class outcome pred (param=ref ref='0');  model outcome(event='1') = pred / cl firth;  weight weight;run;`

Without the firth option, the parameter estimate is 19.7 with a standard error of 1349. In contrast, here is the result of the above code.

`            Analysis of Maximum Likelihood Estimates                              Standard         WaldParameter     DF   Estimate      Error   Chi-Square   Pr > ChiSqIntercept      1    -2.2804     0.2324      96.2774       <.0001pred      1    1     5.9939     1.4850      16.2926       <.0001`

Note here that these no suggestion in this part of the output that the Firth method was employed. That appears only at the very top of the voluminous output.

R
In R, we can use Heinze’s logistf package, which includes the logistf() function. We’ll make the same table as in SAS by constructing two vectors of length 240 using the c() and rep() functions.

`pred = c(rep(1,20),rep(0,220))outcome = c(rep(1,40),rep(0,200))lr1 = glm(outcome ~ pred, binomial)>summary(lr1)Call:glm(formula = outcome ~ pred, family = binomial)Deviance Residuals:     Min       1Q   Median       3Q      Max  -0.4366  -0.4366  -0.4366  -0.4366   2.1899  Coefficients:             Estimate Std. Error z value Pr(>|z|)    (Intercept)   -2.3026     0.2345  -9.818   <2e-16 ***pred          20.8687  1458.5064   0.014    0.989    ---Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1)    Null deviance: 216.27  on 239  degrees of freedomResidual deviance: 134.04  on 238  degrees of freedomAIC: 138.04Number of Fisher Scoring iterations: 17`

Note that the estimate differs slightly from what SAS reports. Here’s a more plausible answer.

`>library(logistf)>lr2 = logistf(outcome ~ pred)>summary(lr2)logistf(formula = outcome ~ pred)Model fitted by Penalized MLConfidence intervals and p-values by Profile Likelihood                  coef  se(coef) lower 0.95 upper 0.95 Chisq p(Intercept) -2.280389 0.2324057  -2.765427  -1.851695   Inf 0pred         5.993961 1.4850029   3.947048  10.852893   Inf 0Likelihood ratio test=78.95473 on 1 df, p=0, n=240Wald test = 16.29196 on 1 df, p = 5.429388e-05Covariance-Matrix:            [,1]        [,2][1,]  0.05401242 -0.05401242[2,] -0.05401242  2.20523358`

Here, the estimates are nearly identical to SAS, but the standard errors differ. 