GEE QIC update

Posted on November 15, 2012 by Daniel Hocking in Uncategorized | 0 Comments

[This article was first published on Daniel J. Hocking » Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here is improved code for calculating QIC from geeglm in geepack in R (original post). Let me know how it works. I haven’t tested it much, but is seems that QIC may select overparameterized models. In the code below, I had to replace <- with = because wordpress didn’t except <- within code or pre tags. It should still work just fine.

Here is a quick example of how to run this function. First, highlight and run the code below in R. That will save the function in your workspace. Then run your gee model using geeglm in geepack (package available from CRAN). Next, run QIC(your_gee_model) and you get the QIC. You can then repeat this with alternative a priori models. Below the function is an example using data available as part of geepack.

    
############################################################
# QIC for GEE models
# Daniel J. Hocking
############################################################
QIC = function(model.R) {
      library(MASS)
      model.indep = update(model.R, corstr = "independence")
      # Quasilikelihood
      mu.R = model.R$fitted.values
      y = model.R$y
      type = family(model.R)$family
      quasi.R = switch(type,
                    poisson = sum((y*log(mu.R)) - mu.R),
                    gaussian = sum(((y - mu.R)^2)/-2),
                    binomial = sum(y*log(mu.R/(1 - mu.R)) + log(1 - mu.R)),
                    Gamma = sum(-y/(mu.R - log(mu.R))),
                    stop("Error: distribution not recognized"))
  # Trace Term (penalty for model complexity)
  omegaI = ginv(model.indep$geese$vbeta.naiv) # Omega-hat(I) via Moore-Penrose generalized inverse of a matrix in MASS package
  #AIinverse = solve(model.indep$geese$vbeta.naiv) # solve via indenity
  Vr = model.R$geese$vbeta
  trace.R = sum(diag(omegaI %*% Vr))
  px = length(mu.R) # number non-redunant columns in design matrix
  # QIC
  QIC = 2*(trace.R - quasi.R) [EDIT: original post was missing '*']
  #QICu = (-2)*quasi.R + 2*px    # Approximation assuming model structured correctly
  output = c(QIC, quasi.R, trace.R, px)
  names(output) = c('QIC', 'Quasi Lik', 'Trace', 'px')
  return(output)
}

Here’s an example you can run in R.

library(geepack)

data(dietox)
dietox$Cu = as.factor(dietox$Cu)
mf = formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3)))
gee1 = geeglm(mf, data = dietox, id = Pig, family = poisson, corstr = "ar1")
gee1
summary(gee1)

mf2 = formula(Weight ~ Cu * Time + I(Time^2) + I(Time^3))
gee2 = geeglm(mf2, data = dietox, id = Pig, family = poisson, corstr = "ar1")
summary(gee2)
anova(gee2)
anova(gee1, gee2)

mf3 = formula(Weight ~ Cu + Time + I(Time^2))
gee3 = geeglm(mf3, data = dietox, id = Pig, family = poisson, corstr = "ar1")
gee3.I = update(gee3, corstr = "independence")
gee3.Ex = update(gee3, corstr = "exchangeable")

sapply(list(gee1, gee2, gee3, gee3.I, gee3.Ex), QIC)

In the output of this model it suggests that model gee1 is the best model. I have some concerns that QIC will almost inevitably choose the most complex model. More testing with simulated data will be necessary.

               [,1]      [,2]      [,3]      [,4]      [,5]
QIC       -333199.7 -333188.0 -333187.5 -333181.8 -333153.6
Quasi Lik  166623.0  166622.7  166620.4  166622.2  166615.4
Trace          23.2      28.7      26.6      31.3      38.6
px            861.0     861.0     861.0     861.0     861.0

You will get warnings when running this model because it uses a Poisson distribution for continuous data. I will work on finding a better example in the future before I make this available as an R package.

Filed under: GEE, Modeling, R Tagged: GEE, generalized estimating equations, QIC, R, regression