Mixed Models: Influence in Heterogeneous Variance Model

Posted on September 29, 2013 by Wingfeet in R bloggers | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post I extend my knowledge of mixed models by redoing section 59.7 (page 5048) of the SAS/STAT user guide. I don’t think this particular example can be run in lme4, so that leaves nlme and MCMCglmm. MCMCglmm has less ability for influence measures, so this is only touched upon.

data

To quote the SAS/STAT User guide: ‘a one-way classification model with heterogeneous variances is fit. The data, (…), represent amounts of different types of fat absorbed by batches of doughnuts during cooking, measured in grams’. Data are read below. The final statement is just ordering the data similar to SAS, which is convenient for comparison on my side.

r1 <- read.table(textConnection('

1 164 1 172 1 168 1 177 1 156 1 195

2 178 2 191 2 197 2 182 2 185 2 177

3 175 3 193 3 178 3 171 3 163 3 176

4 155 4 166 4 149 4 164 4 170 4 168

‘) ,col.names=paste(‘a’,1:12,sep=”))

r2 <- data.frame(FatType=factor(as.matrix(r1[,seq(1,11,by=2)])),

Absorbed=as.numeric(as.matrix(r1[,seq(2,12,by=2)])))

r2 <- r2[c language="(seq(1,24,by=4),seq(2,24,by=4),seq(3,24,by=4),seq(4,24,by=4)),"][/c]

nlme

For a change, the gls() function is used from the nlme package. A few preparations are needed. A variable x is used as dummy variable in the varIdent sub-statement. Variable i is used in influence diagnostic plotting. SAS contrasts speak for themselves.

r2$x <- 1

r2$i <-1:24

options(contrasts=c( ‘contr.SAS’,’contr.SAS’))

library(nlme)

lme1 <- gls(Absorbed ~ FatType,

weights=varIdent(form=~x | FatType),data=r2 )

summary(lme1)

Generalized least squares fit by REML

Model: Absorbed ~ FatType

Data: r2

AIC BIC logLik

170.3109 178.2767 -77.15543

Variance function:

Structure: Different standard deviations per stratum

Formula: ~x | FatType

Parameter estimates:

1 2 3 4

1.0000000 0.5825169 0.7404827 0.6162593

Coefficients:

Value Std.Error t-value p-value

(Intercept) 162 3.356586 48.26332 0.0000

FatType1 10 6.397916 1.56301 0.1337

FatType2 23 4.618803 4.97965 0.0001

FatType3 14 5.247222 2.66808 0.0148

Correlation:

(Intr) FtTyp1 FtTyp2

FatType1 -0.525

FatType2 -0.727 0.381

FatType3 -0.640 0.336 0.465

Standardized residuals:

Min Q1 Med Q3 Max

-1.581138e+00 -6.625646e-01 -6.533961e-15 5.473172e-01 1.723923e+00

Residual standard error: 13.34166

Degrees of freedom: 24 total; 20 residual

As ever, nlme shows results differently from SAS. The missing bit is the variances. These can be displayed by this statement.

(lme1$sigma*

coef(lme1$modelStruct$varStruct,

uncons = FALSE, allCoef = TRUE))^2

1 2 3 4

177.99996 60.39999 97.59999 67.60003

diagnostics

Having set up the model, leave-one-out statistics is just a sapply() away. For brevity only a few are listed.

sa <- sapply(1:nrow(r2) ,function(x) {

rx <- r2

rx$Absorbed[x] <- NA

lmex <- gls(Absorbed ~ FatType,

weights=varIdent(form=~x | FatType),data=rx,na.action=na.omit )

cc <- coef(lmex$modelStruct$varStruct,uncons = FALSE, allCoef = TRUE)

cc <- cc[order(names(cc))]

c(coef(lmex),

(lmex$sigma*cc)^2)

})

tsa <- as.data.frame( t(sa))

head(tsa)

(Intercept) FatType1 FatType2 FatType3 1 2 3 4

1 162 11.6 23 14 203.29918 60.40017 97.59971 67.60023

2 162 10.0 23 14 222.50049 60.39999 97.59995 67.59992

3 162 10.8 23 14 217.70003 60.40000 97.59999 67.59999

4 162 9.0 23 14 214.99969 60.40013 97.59964 67.60018

5 162 13.2 23 14 145.70021 60.39986 97.60000 67.60007

6 162 5.4 23 14 63.79997 60.40006 97.59999 67.59996

Plots are more interesting.

tsa$i <- 1:nrow(tsa)

library(lattice)

par(mfrow=c(2,2))

plot(`(Intercept)`~i,data=tsa,xlab=”)

plot(FatType1~i,data=tsa,xlab=”)

plot(FatType2~i,data=tsa,xlab=”)

plot(FatType3~i,data=tsa,xlab=”)

It is a bit funny in influences, since fat-types are displayed equivalent to intercept. In terms of influence on the FatTypes means a plot not shown in SAS/STAT:

names(tsa)[1] <- 'Intercept'

xyplot(Intercept + I(Intercept+FatType1)

+ I(Intercept+FatType2)+ I(Intercept+FatType3) ~ i ,

data=tsa,

outer=TRUE,

strip=function(…, factor.levels)

strip.default( …,factor.levels=paste(‘FatType’,c(4,1:3))))

The next plot shows influence on the estimated variances

names(tsa)[5:8] <- paste('FT',1:4,sep='')

xyplot(FT1 + FT2 + FT3 + FT4 ~ i ,data=tsa,outer=TRUE)

residuals

The ground work for residuals is also present. Note that the studentized residuals come out slightly different; there seems to be a factor (5/6)^2 difference.

resdf <- data.frame(Absorbed=r2$Absorbed,

predicted=predict(lme1),

residual=residuals(lme1),

std=attr(residuals(lme1),’std’)

)

resdf$lofit <- 0

resdf$lofit[r2$FatType==’1′] <-

rowSums(tsa[,c(‘Intercept’,’FatType1′)])[r2$FatType==’1′]

resdf$lofit[r2$FatType==’2′] <-

rowSums(tsa[,c(‘Intercept’,’FatType2′)])[r2$FatType==’2′]

resdf$lofit[r2$FatType==’3′] <-

rowSums(tsa[,c(‘Intercept’,’FatType3′)])[r2$FatType==’3′]

resdf$lofit[r2$FatType==’4′] <- tsa[,c('Intercept')][r2$FatType=='4']

resdf$PRESSres <- resdf$Absorbed-resdf$lofit

resdf$StudRes <- resdf$residual/resdf$std

head(resdf)

Absorbed predicted residual std lofit PRESSres StudRes

1 164 172 -8.000000e+00 13.34166 173.6 -9.600000e+00 -5.996254e-01

5 172 172 -2.842171e-14 13.34166 172.0 5.684342e-14 -2.130297e-15

9 168 172 -4.000000e+00 13.34166 172.8 -4.800000e+00 -2.998127e-01

13 177 172 5.000000e+00 13.34166 171.0 6.000000e+00 3.747659e-01

17 156 172 -1.600000e+01 13.34166 175.2 -1.920000e+01 -1.199251e+00

21 195 172 2.300000e+01 13.34166 167.4 2.760000e+01 1.723923e+00

nlme’s own tools

nlme is not without tools for diagnostics as shown below. I labelled a number of points mentions in SAS/STAT Guide. The cut-off for labeling is determined by id. The value of 0.15 coincides with a two sided 85% interval, is chosen so as to label the four points mentioned in SAS/STAT Guide. It is certainly not a limit I would choose and I have some doubt about four influential points on a total of 24 points.

plot(lme1,form= resid(., type = “p”) ~ i | FatType,id=.15)

Second diagnostic plot contains normal plots. The marked points seem reasonable in line with the others.

qqnorm(lme1, ~ resid(., type = “p”) | FatType,id=.15)

MCMCglmm

While the nlme sections is really long, MCMCglmm is really short. To run the model there were some tricks. An additional variable avar is nested under FatTypes. MCMCglmm gives a units variance, which does not make sense in the context of this model. I did not know how to remove it, but a suitable prior was used to suppress it.

library(MCMCglmm)

r2$avar <- factor(rep(1:6,4))

prior1 <- list(R=list(V=diag(1)*.1,nu=10),

G=list(G1=list(V=diag(4),nu=.01) )

)

mc1 <- MCMCglmm(Absorbed ~ FatType,

random=~ idh(FatType):avar,

data=r2 ,pr=TRUE,verbose=FALSE,

prior=prior1, nitt=50000)

summary(mc1)

Iterations = 3001:49991

Thinning interval = 10

Sample size = 4700

DIC: 36.68339

G-structure: ~idh(FatType):avar

post.mean l-95% CI u-95% CI eff.samp

1.avar 295.5 42.52 773.6 4700

2.avar 100.1 14.96 259.5 4712

3.avar 164.9 21.59 443.2 4700

4.avar 110.4 18.25 291.0 4700

R-structure: ~units

post.mean l-95% CI u-95% CI eff.samp

units 0.1256 0.03751 0.2588 4359

Location effects: Absorbed ~ FatType

post.mean l-95% CI u-95% CI eff.samp pMCMC

(Intercept) 161.9363 153.0445 169.8837 4700 < 2e-04 ***

FatType1 10.0627 -6.9373 25.8132 4969 0.18681

FatType2 22.9649 12.0911 34.9963 4700 0.00128 **

FatType3 13.9831 0.8961 27.4354 4700 0.04043 *

—

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

As before, the variances are larger than expected. There is also a function which looks dfferently at this. These values are lower than nlme & PROC MIXED.

posterior.mode(mc1$VCV)

1.avar 2.avar 3.avar 4.avar units

133.71520455 42.92154371 73.72997958 51.18475597 0.09115953

Influence diagnostics

It is easy enough to extract parameter estimates from the simulations, so a sapply() to get leave-one-out estimates similar to nlme is obvious. However, residuals() is not yet implemented, so that is far away. It was my choice not to do this.

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Mixed Models: Influence in Heterogeneous Variance Model

data