# Negative Binomial Regression for Complex Samples (Surveys) #rstats

March 9, 2017
By

(This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers)

The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides `svyglm()`, to fit generalised linear models to data from a complex survey design. `svyglm()` covers all families that are also provided by R’s `glm()` – however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic `svymle()` to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to `svymle()`, to fit negative binomial models for complex samples. So I wrote a small „wrapper“ and implemented a function `svyglm.nb()` in my sjstats-package.

```# ------------------------------------------
# This example reproduces the results from
# Lumley 2010, figure E.7 (Appendix E, p256)
# ------------------------------------------
library(sjstats)
library(survey)
data(nhanes_sample)

# create survey design
des <- svydesign(
id = ~ SDMVPSU,
strat = ~ SDMVSTRA,
weights = ~ WTINT2YR,
nest = TRUE,
data = nhanes_sample
)

# fit negative binomial regression
fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des)

# print coefficients and standard errors
round(cbind(coef(fit), survey::SE(fit)), 2)

#>                                          [,1] [,2]
#> theta.(Intercept)                        0.81 0.05
#> eta.(Intercept)                          2.29 0.16
#> eta.factor(RIAGENDR)2                   -0.80 0.18
#> eta.log(age)                             1.07 0.23
#> eta.factor(RIDRETH1)2                    0.08 0.15
#> eta.factor(RIDRETH1)3                    0.09 0.18
#> eta.factor(RIDRETH1)4                    0.82 0.30
#> eta.factor(RIDRETH1)5                    0.06 0.38
#> eta.factor(RIAGENDR)2:log(age)          -1.22 0.27
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)3  0.60 0.19
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)4  0.06 0.37
#> eta.factor(RIAGENDR)2:factor(RIDRETH1)5  0.38 0.44```

The functions returns an object of class `svymle`, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like `predict()` are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.

Tagged: R, rstats, Statistik

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...