Negative Binomial Regression for Complex Samples (Surveys) #rstats

March 9, 2017

(This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers)

The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides svyglm(), to fit generalised linear models to data from a complex survey design. svyglm() covers all families that are also provided by R’s glm() – however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic svymle() to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to svymle(), to fit negative binomial models for complex samples. So I wrote a small „wrapper“ and implemented a function svyglm.nb() in my sjstats-package.

# ------------------------------------------
# This example reproduces the results from
# Lumley 2010, figure E.7 (Appendix E, p256)
# ------------------------------------------

# create survey design
des <- svydesign(
  id = ~ SDMVPSU,
  strat = ~ SDMVSTRA,
  weights = ~ WTINT2YR,
  nest = TRUE,
  data = nhanes_sample

# fit negative binomial regression
fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des)

# print coefficients and standard errors
round(cbind(coef(fit), survey::SE(fit)), 2)

#>                                          [,1] [,2]
#> theta.(Intercept)                        0.81 0.05
#> eta.(Intercept)                          2.29 0.16
#> eta.factor(RIAGENDR)2                   -0.80 0.18
#> eta.log(age)                             1.07 0.23
#> eta.factor(RIDRETH1)2                    0.08 0.15
#> eta.factor(RIDRETH1)3                    0.09 0.18
#> eta.factor(RIDRETH1)4                    0.82 0.30
#> eta.factor(RIDRETH1)5                    0.06 0.38
#> eta.factor(RIAGENDR)2:log(age)          -1.22 0.27
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)3  0.60 0.19
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)4  0.06 0.37
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)5  0.38 0.44

The functions returns an object of class svymle, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like predict() are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.

Tagged: R, rstats, Statistik

To leave a comment for the author, please follow the link and comment on their blog: R – Strenge Jacke!. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)