Negative Binomial Regression for Complex Samples (Surveys) #rstats

[This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides svyglm(), to fit generalised linear models to data from a complex survey design. svyglm() covers all families that are also provided by R’s glm() – however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic svymle() to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to svymle(), to fit negative binomial models for complex samples. So I wrote a small „wrapper“ and implemented a function svyglm.nb() in my sjstats-package.

# ------------------------------------------
# This example reproduces the results from
# Lumley 2010, figure E.7 (Appendix E, p256)
# ------------------------------------------

# create survey design
des <- svydesign(
  id = ~ SDMVPSU,
  strat = ~ SDMVSTRA,
  weights = ~ WTINT2YR,
  nest = TRUE,
  data = nhanes_sample

# fit negative binomial regression
fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des)

# print coefficients and standard errors
round(cbind(coef(fit), survey::SE(fit)), 2)

#>                                          [,1] [,2]
#> theta.(Intercept)                        0.81 0.05
#> eta.(Intercept)                          2.29 0.16
#> eta.factor(RIAGENDR)2                   -0.80 0.18
#> eta.log(age)                             1.07 0.23
#> eta.factor(RIDRETH1)2                    0.08 0.15
#> eta.factor(RIDRETH1)3                    0.09 0.18
#> eta.factor(RIDRETH1)4                    0.82 0.30
#> eta.factor(RIDRETH1)5                    0.06 0.38
#> eta.factor(RIAGENDR)2:log(age)          -1.22 0.27
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)3  0.60 0.19
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)4  0.06 0.37
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)5  0.38 0.44

The functions returns an object of class svymle, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like predict() are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.

Tagged: R, rstats, Statistik

To leave a comment for the author, please follow the link and comment on their blog: R – Strenge Jacke!. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)