# Veterinary Epidemiologic Research: Count and Rate Data – Poisson Regression and Risk Ratios

May 10, 2013
By

(This article was first published on denis haine » R, and kindly contributed to R-bloggers)

As noted on paragraph 18.4.1 of the book Veterinary Epidemiologic Research, logistic regression is widely used for binary data, with the estimates reported as odds ratios (OR). If it’s appropriate for case-control studies, risk ratios (RR) are preferred for cohort studies as RR provides estimates of probabilities directly. Moreover, it is often forgotten the assumption of rare event rate, and the OR will overestimate the true effect.

One approach to get RR is to fit a generalised linear model (GLM) with a binomial distribution and a log link. But these models may sometimes fail to converge (Zou, 2004). Another option is to use a Poisson regression with no exposure or offset specified (McNutt, 2003). It gives estimates with very little bias but confidence intervals that are too wide. However, using robust standard errors gives correct confidence intervals (Greenland, 2004, Zou, 2004).

We use data on culling of dairy cows to demonstrate this.

temp <- tempfile()
"http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip",
temp)

table(culling$culled) 0 1 255 466 ### recoding number of lactation culling$lact <- with(culling, ifelse(lact_c3 > 1, 2, lact_c3))


The logistic regression:

log.reg <- glm(culled ~ lact, family = binomial("logit"), data = culling)
summary(log.reg)

Call:
glm(formula = culled ~ lact, family = binomial("logit"), data = culling)

Deviance Residuals:
Min      1Q  Median      3Q     Max
-1.546  -1.199   0.849   0.849   1.156

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.734      0.299   -2.45    0.014 *
lact           0.784      0.171    4.59  4.4e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 936.86  on 720  degrees of freedom
Residual deviance: 915.84  on 719  degrees of freedom
AIC: 919.8

Number of Fisher Scoring iterations: 4

cbind(exp(coef(log.reg)), exp(confint(log.reg)))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.48 0.267  0.864
lact        2.19 1.568  3.065


We could compute by hand the OR and RR:

with(culling, table(culled, lact))
lact
culled   1   2
0  97 158
1 102 364
## OR is 2.19:
(364 / 158) / (102 / 97)
[1] 2.19
## or the ratio between the cross-products
(364 * 97) / (158 * 102)
[1] 2.19
## or with epicalc
library(epicalc)
with(culling, cc(culled, lact, graph = FALSE))

lact
culled    1   2 Total
0      97 158   255
1     102 364   466
Total 199 522   721

OR =  2.19
Exact 95% CI =  1.54, 3.1
Chi-squared = 21.51, 1 d.f., P value = 0
Fisher's exact test (2-sided) P value = 0

# RR is 1.36:
(364 / 522) / (102 / 199)
[1] 1.36


The GLM with binomial distribution and log link:

log.reg2 <- glm(culled ~ lact, family = binomial(link = "log"), data = culling)
summary(log.reg2)

Call:
glm(formula = culled ~ lact, family = binomial(link = "log"),
data = culling)

Deviance Residuals:
Min      1Q  Median      3Q     Max
-1.546  -1.199   0.849   0.849   1.156

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.9762     0.1412   -6.91  4.8e-12 ***
lact          0.3078     0.0749    4.11  4.0e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 936.86  on 720  degrees of freedom
Residual deviance: 915.84  on 719  degrees of freedom
AIC: 919.8

Number of Fisher Scoring iterations: 5

cbind(exp(coef(log.reg2)), exp(confint(log.reg2)))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.377  0.28  0.488
lact        1.360  1.18  1.588


The modified Poisson with robust estimator as described by Zou, 2004 (GEE with individual observations treated as a repeated measure):

## create id for each observation
culling$id <- 1:length(culling$herd)
library(geepack)
zou.mod <- geeglm(culled ~ lact, family = poisson(link = "log"), id = id, corstr = "exchangeable", data = culling)
summary(zou.mod)

Call:
geeglm(formula = culled ~ lact, family = poisson(link = "log"),
data = culling, id = id, corstr = "exchangeable")

Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept)  -0.9762  0.1412 47.8  4.8e-12 ***
lact          0.3078  0.0749 16.9  4.0e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Estimated Scale Parameters:
Estimate Std.err
(Intercept)    0.354   0.021

Correlation: Structure = exchangeable  Link = identity

Estimated Correlation Parameters:
Estimate Std.err
alpha        0       0
Number of clusters:   721   Maximum cluster size: 1
## exponentiated coefficients
exp(coef(zou.mod))
(Intercept)        lact
0.377       1.360
## getting confidence intervals
library(doBy)
zou.mod.coefci <- esticon(zou.mod, diag(2))
zou.mod.expci <- exp(cbind(zou.mod.coefci$Estimate, zou.mod.coefci$Lower, zou.mod.coefci$Upper)) rownames(zou.mod.expci) <- names(coef(zou.mod)) colnames(zou.mod.expci) <- c("RR", "Lower RR", "Upper RR") zou.mod.expci RR Lower RR Upper RR (Intercept) 0.377 0.286 0.497 lact 1.360 1.175 1.576  Or the same using glm() and then getting robust standard errors: pois.reg <- glm(culled ~ lact, family = poisson(link = "log"), data = culling) library(sandwich) # to get robust estimators library(lmtest) # to test coefficients coeftest(pois.reg, vcov = sandwich) z test of coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.9762 0.1412 -6.91 4.8e-12 *** lact 0.3078 0.0749 4.11 4.0e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 ## RR exp(coef(pois.reg)) (Intercept) lact 0.377 1.360 ## CI 0.3078 + qnorm(0.05 / 2) * 0.0749 # upper 95% CI [1] 0.161 exp(0.3078 + qnorm(0.05 / 2) * 0.0749) [1] 1.17 0.3078 + qnorm(1 - 0.05 / 2) * 0.0749 # lower 95% CI [1] 0.455 exp(0.3078 + qnorm(1 - 0.05 / 2) * 0.0749) [1] 1.58  So no excuse to use odds ratios when risk ratios are more appropriate! Addition: Plotting the Risk Ratios zou.mod2 <- geeglm(culled ~ lact + johnes, family = poisson(link = "log"), id = id, corstr = "exchangeable", data = culling) zou.mod2.coefci <- esticon(zou.mod2, diag(3)) zou.mod2.expci <- exp(cbind(zou.mod2.coefci$Estimate, zou.mod2.coefci$Lower, zou.mod2.coefci$Upper))
rownames(zou.mod2.expci) <- names(coef(zou.mod2))
colnames(zou.mod2.expci) <- c("RR", "LowerCI", "UpperCI")
zou.df <- as.data.frame(zou.mod2.expci)
zou.df\$var <- row.names(zou.df)
library(ggplot2)
ggplot(zou.df[2:3, ], aes(y = RR, x = reorder(var, RR))) +
geom_point() +
geom_pointrange(aes(ymin = LowerCI, ymax = UpperCI)) +
scale_x_discrete(labels = c("Lactation", "Johnes")) +
scale_y_log10(breaks = seq(1, 2, by = 0.1)) +
xlab("") +
geom_hline(yintercept = 1, linetype = "dotdash", colour = "blue") +
coord_flip()


Thanks to Tal Galili for suggesting this addition.