**R – Strenge Jacke!**, and kindly contributed to R-bloggers)

The survey-package from Thomas Lumley is a great toolkit when analyzing complex samples. It provides `svyglm()`

, to fit generalised linear models to data from a complex survey design. `svyglm()`

covers all families that are also provided by R’s `glm()`

– however, the survey-package has no function to fit negative binomial models, which might be useful for overdispersed count models. Yet, the package provides a generic `svymle()`

to fit user-specified likelihood estimations. In his book, Appendix E, Thomas Lumley describes how to write your own likelihood-function, passed to `svymle()`

, to fit *negative binomial models* for complex samples. So I wrote a small „wrapper“ and implemented a function `svyglm.nb()`

in my sjstats-package.

# ------------------------------------------ # This example reproduces the results from # Lumley 2010, figure E.7 (Appendix E, p256) # ------------------------------------------ library(sjstats) library(survey) data(nhanes_sample) # create survey design des <- svydesign( id = ~ SDMVPSU, strat = ~ SDMVSTRA, weights = ~ WTINT2YR, nest = TRUE, data = nhanes_sample ) # fit negative binomial regression fit <- svyglm.nb(total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)), des) # print coefficients and standard errors round(cbind(coef(fit), survey::SE(fit)), 2) #> [,1] [,2] #> theta.(Intercept) 0.81 0.05 #> eta.(Intercept) 2.29 0.16 #> eta.factor(RIAGENDR)2 -0.80 0.18 #> eta.log(age) 1.07 0.23 #> eta.factor(RIDRETH1)2 0.08 0.15 #> eta.factor(RIDRETH1)3 0.09 0.18 #> eta.factor(RIDRETH1)4 0.82 0.30 #> eta.factor(RIDRETH1)5 0.06 0.38 #> eta.factor(RIAGENDR)2:log(age) -1.22 0.27 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)2 -0.18 0.26 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)3 0.60 0.19 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)4 0.06 0.37 #> eta.factor(RIAGENDR)2:factor(RIDRETH1)5 0.38 0.44

The functions returns an object of class `svymle`

, so all methods provided by the survey-package for this class work – it’s just that there are only a few, and common methods like `predict()`

are currently not implemented. Maybe, hopefully, future updates of the survey-package will include such features.

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Strenge Jacke!**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...