GEE QIC update

November 15, 2012
By

(This article was first published on Daniel J. Hocking » Blog, and kindly contributed to R-bloggers)

Here is improved code for calculating QIC from geeglm in geepack in R (original post). Let me know how it works. I haven’t tested it much, but is seems that QIC may select overparameterized models. In the code below, I had to replace <- with = because wordpress didn’t except <- within code or pre tags. It should still work just fine.

Here is a quick example of how to run this function. First, highlight and run the code below in R. That will save the function in your workspace. Then run your gee model using geeglm in geepack (package available from CRAN). Next, run QIC(your_gee_model) and you get the QIC. You can then repeat this with alternative a priori models. Below the function is an example using data available as part of geepack.

    
############################################################
# QIC for GEE models
# Daniel J. Hocking
############################################################
QIC = function(model.R) {
      library(MASS)
      model.indep = update(model.R, corstr = "independence")
      # Quasilikelihood
      mu.R = model.R$fitted.values
      y = model.R$y
      type = family(model.R)$family
      quasi.R = switch(type,
                    poisson = sum((y*log(mu.R)) - mu.R),
                    gaussian = sum(((y - mu.R)^2)/-2),
                    binomial = sum(y*log(mu.R/(1 - mu.R)) + log(1 - mu.R)),
                    Gamma = sum(-y/(mu.R - log(mu.R))),
                    stop("Error: distribution not recognized"))
  # Trace Term (penalty for model complexity)
  omegaI = ginv(model.indep$geese$vbeta.naiv) # Omega-hat(I) via Moore-Penrose generalized inverse of a matrix in MASS package
  #AIinverse = solve(model.indep$geese$vbeta.naiv) # solve via indenity
  Vr = model.R$geese$vbeta
  trace.R = sum(diag(omegaI %*% Vr))
  px = length(mu.R) # number non-redunant columns in design matrix
  # QIC
  QIC = 2*(trace.R - quasi.R) [EDIT: original post was missing '*']
  #QICu = (-2)*quasi.R + 2*px    # Approximation assuming model structured correctly
  output = c(QIC, quasi.R, trace.R, px)
  names(output) = c('QIC', 'Quasi Lik', 'Trace', 'px')
  return(output)
}

Here’s an example you can run in R.

library(geepack)

data(dietox)
dietox$Cu = as.factor(dietox$Cu)
mf = formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3)))
gee1 = geeglm(mf, data = dietox, id = Pig, family = poisson, corstr = "ar1")
gee1
summary(gee1)

mf2 = formula(Weight ~ Cu * Time + I(Time^2) + I(Time^3))
gee2 = geeglm(mf2, data = dietox, id = Pig, family = poisson, corstr = "ar1")
summary(gee2)
anova(gee2)
anova(gee1, gee2)

mf3 = formula(Weight ~ Cu + Time + I(Time^2))
gee3 = geeglm(mf3, data = dietox, id = Pig, family = poisson, corstr = "ar1")
gee3.I = update(gee3, corstr = "independence")
gee3.Ex = update(gee3, corstr = "exchangeable")

sapply(list(gee1, gee2, gee3, gee3.I, gee3.Ex), QIC)

In the output of this model it suggests that model gee1 is the best model. I have some concerns that QIC will almost inevitably choose the most complex model. More testing with simulated data will be necessary.

               [,1]      [,2]      [,3]      [,4]      [,5]
QIC       -333199.7 -333188.0 -333187.5 -333181.8 -333153.6
Quasi Lik  166623.0  166622.7  166620.4  166622.2  166615.4
Trace          23.2      28.7      26.6      31.3      38.6
px            861.0     861.0     861.0     861.0     861.0

You will get warnings when running this model because it uses a Poisson distribution for continuous data. I will work on finding a better example in the future before I make this available as an R package.

Filed under: GEE, Modeling, R Tagged: GEE, generalized estimating equations, QIC, R, regression

To leave a comment for the author, please follow the link and comment on their blog: Daniel J. Hocking » Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



http://www.eoda.de







ODSC

ODSC

CRC R books series





Six Sigma Online Training





Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)