JAGS model Fe concentration in rainwater including values below detection level

December 1, 2013
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

In my previous post I ignored the fact that some data are below the detection level. I would not know how to handle those in a mixed model from lme4 or nlme. However, JAGS can handle these values. Next to that I kept the usual independent variables, such as random effects for location, machine etc.

Data

Data has been acquired and processed similar to last week. For script at the bottom of the page.

detection level

In the spreadsheet a '4' is used to indicate a value below detection limit. (* een " 4 "-teken in kolom C geeft aan dat de analyseresultaten beneden de detectielimiet liggen). There is something strange there. The detection limit seems to be 0.4, but there is data under 0.4 which is not flagged. In addition, some of the data is below 0, which seems improbable. In the end I flagged all data under 0.4 as NA. 
Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
library(lattice)
densityplot( ~ Amount |  machine + Status,
    groups=Location,data=Fe[!is.na(Fe$Status) 
            & Fe$Amount<.8,],
    panel=function(...) {
      panel.densityplot(...)
      panel.abline(v=c(0,0.4))
    },adjust=.5
)

Analysis

It is a reasonable standard JAGS model. The only tricky thing is initialization. When the parameters are too far away an error is thrown. Hence 'intercept' is initialized as -1. In text the model and inits are given, at the end the full code.
FeModel <- function() {
  for (i in 1:nobs) {
    BLZ[i] ~ dinterval( LAmount[i] , -0.4034029 )
    ExpLAmount[i] <- intercept + 
        rate*time[i] + 
        rateL[Location[i]]*time[i]+
        FMach[Machine[i]] + 
        FLoc[Location[i]]
    LAmount[i] ~ dnorm(ExpLAmount[i],tau[Machine[i]])
  }
  for (i in 1:2) {
    tau[i] <- pow(sd[i],-2)
    sd[i] ~ dunif(0,1000)
  }
  intercept ~ dnorm(0,.0001)
  FMach[1] ~dnorm(0,tauMachShift)
  FMach[2] <- -FMach[1]
  tauMachShift <- pow(sdMachShift,-2)
  sdMachShift ~ dunif(0,10)
  for (iL in 1:nloc) {
    FLoc[iL] ~ dnorm(0,tauLoc)
    rateL[iL] ~dnorm(0,tauRate)
    yearlyL[iL] <- pow(pow(10,rateL[iL]+rate),365)
  }
  tauLoc <- pow(sdLoc,-2)
  sdLoc ~ dunif(0,100)
  tauRate <- pow(sdRate,-2)
  sdRate ~ dunif(0,100)
  rate ~ dnorm(0,.0001)
  dailydec <- pow(10,rate)
  yearlydec <- pow(dailydec,365)
}

inits <- function() {
  list(intercept = -1,#rnorm(1,.45,.04),
      rate=rnorm(0,.01),
      sd=runif(2,0,50),
      sdMachShift=runif(1,0,1),
      sdLoc=runif(1,0,1))
}

Results

The model shows the decrease of 5% per year which was expected. 
Machine 1 (Eigenbrodt NSA 181 KHT) is slightly less accurate (was also observed last week) and has slightly lower values than 2 (Van Essen/ECN vanger). Variation between locations is less than variation between machines.
Inference for Bugs model at "C:/.../AppData/Local/Temp/Rtmpma5UEx/modelc40761211ba.txt", fit using jags,
 4 chains, each with 5000 iterations (first 2500 discarded), n.thin = 2
 n.sims = 5000 iterations saved
             mu.vect sd.vect     2.5%      25%      50%      75%    97.5%
FLoc[1]        0.023   0.070   -0.110   -0.022    0.021    0.065    0.169
FLoc[2]        0.139   0.081   -0.038    0.091    0.147    0.195    0.280
FLoc[3]       -0.041   0.069   -0.180   -0.083   -0.041    0.002    0.098
FLoc[4]       -0.103   0.080   -0.289   -0.147   -0.094   -0.048    0.032
FLoc[5]       -0.072   0.072   -0.209   -0.117   -0.075   -0.027    0.078
FLoc[6]       -0.036   0.070   -0.173   -0.079   -0.036    0.008    0.102
FLoc[7]       -0.016   0.069   -0.154   -0.058   -0.016    0.027    0.117
FLoc[8]        0.278   0.079    0.127    0.229    0.274    0.323    0.453
FLoc[9]       -0.061   0.089   -0.243   -0.118   -0.060    0.000    0.109
FLoc[10]       0.052   0.069   -0.091    0.009    0.052    0.095    0.186
FLoc[11]       0.040   0.079   -0.131   -0.008    0.048    0.096    0.172
FLoc[12]      -0.102   0.077   -0.245   -0.153   -0.106   -0.054    0.064
FLoc[13]      -0.054   0.082   -0.210   -0.107   -0.056   -0.005    0.117
FLoc[14]       0.027   0.067   -0.112   -0.013    0.028    0.070    0.155
FLoc[15]      -0.031   0.070   -0.164   -0.076   -0.033    0.011    0.116
FLoc[16]       0.088   0.070   -0.050    0.045    0.087    0.131    0.227
FLoc[17]      -0.154   0.076   -0.293   -0.203   -0.157   -0.106    0.006
FMach[1]      -0.053   0.017   -0.087   -0.065   -0.053   -0.041   -0.018
FMach[2]       0.053   0.017    0.018    0.041    0.053    0.065    0.087
intercept      0.401   0.081    0.240    0.345    0.402    0.455    0.559
sd[1]          0.436   0.018    0.402    0.423    0.436    0.448    0.471
sd[2]          0.386   0.007    0.372    0.381    0.386    0.391    0.401
sdLoc          0.135   0.036    0.077    0.110    0.130    0.154    0.216
sdMachShift    1.854   2.456    0.033    0.170    0.675    2.587    8.820
yearlyL[1]     0.944   0.007    0.931    0.940    0.944    0.948    0.957
yearlyL[2]     0.950   0.007    0.938    0.945    0.949    0.954    0.965
yearlyL[3]     0.945   0.006    0.932    0.941    0.945    0.949    0.958
yearlyL[4]     0.949   0.007    0.936    0.944    0.948    0.953    0.963
yearlyL[5]     0.943   0.007    0.929    0.939    0.944    0.948    0.956
yearlyL[6]     0.945   0.006    0.932    0.941    0.945    0.949    0.957
yearlyL[7]     0.946   0.006    0.933    0.941    0.946    0.950    0.958
yearlyL[8]     0.945   0.006    0.932    0.941    0.945    0.949    0.957
yearlyL[9]     0.944   0.007    0.929    0.940    0.945    0.949    0.958
yearlyL[10]    0.946   0.006    0.933    0.942    0.946    0.950    0.958
yearlyL[11]    0.950   0.006    0.938    0.946    0.950    0.954    0.964
yearlyL[12]    0.942   0.007    0.928    0.938    0.943    0.947    0.955
yearlyL[13]    0.944   0.007    0.930    0.939    0.944    0.948    0.956
yearlyL[14]    0.947   0.006    0.935    0.943    0.947    0.951    0.959
yearlyL[15]    0.944   0.006    0.931    0.940    0.944    0.948    0.956
yearlyL[16]    0.946   0.006    0.934    0.942    0.946    0.950    0.958
yearlyL[17]    0.942   0.007    0.926    0.938    0.943    0.947    0.956
yearlydec      0.945   0.005    0.936    0.942    0.945    0.949    0.955
deviance    1729.603  52.705 1627.020 1693.551 1728.751 1765.143 1831.422
             Rhat n.eff
FLoc[1]     1.002  2200
FLoc[2]     1.001  4000
FLoc[3]     1.001  5000
FLoc[4]     1.001  5000
FLoc[5]     1.001  5000
FLoc[6]     1.002  1500
FLoc[7]     1.001  5000
FLoc[8]     1.001  4400
FLoc[9]     1.002  1600
FLoc[10]    1.001  5000
FLoc[11]    1.001  2800
FLoc[12]    1.002  2200
FLoc[13]    1.001  5000
FLoc[14]    1.001  5000
FLoc[15]    1.001  2800
FLoc[16]    1.001  4800
FLoc[17]    1.004   740
FMach[1]    1.004   760
FMach[2]    1.004   760
intercept   1.001  3100
sd[1]       1.007   380
sd[2]       1.002  1500
sdLoc       1.003  1600
sdMachShift 1.001  5000
yearlyL[1]  1.001  3000
yearlyL[2]  1.001  4100
yearlyL[3]  1.002  1600
yearlyL[4]  1.001  2800
yearlyL[5]  1.002  1400
yearlyL[6]  1.002  2300
yearlyL[7]  1.001  5000
yearlyL[8]  1.001  5000
yearlyL[9]  1.001  4000
yearlyL[10] 1.001  3500
yearlyL[11] 1.001  5000
yearlyL[12] 1.002  1400
yearlyL[13] 1.001  5000
yearlyL[14] 1.002  1400
yearlyL[15] 1.001  3100
yearlyL[16] 1.002  1500
yearlyL[17] 1.004   850
yearlydec   1.002  1800
deviance    1.003   900

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 1385.1 and DIC = 3114.7
DIC is an estimate of expected predictive error (lower deviance is better).

Code

library(R2jags)

# http://www.lml.rivm.nl/data/gevalideerd/data/lmre_1992-2005.xls
# http://www.r-bloggers.com/read-excel-files-from-r/
library(XLConnect)

readsheet <- function(wb,cursheet) {
  df = readWorksheet(wb, sheet = cursheet , header = TRUE)
  topline <- 8
  test <- which(df[6,]=='C')+1
  numin <- df[topline:nrow(df),test]
  names(numin) <- make.names( gsub(' ','',df[6,test]))
  for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(',','.',numin[,i]))
  status = df[topline:nrow(df),test-1]
  names(status) <- paste('C',make.names( gsub(' ','',df[6,test])),sep='')
  
  numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
  numin$EndDate <-  as.Date(substr(df[topline:nrow(df),2],1,10))
  numin <- cbind(numin,status)
  tall <- reshape(numin,
      varying=list(make.names( gsub(' ','',df[6,test])),paste('C',make.names( gsub(' ','',df[6,test])),sep='')),
      v.names=c('Amount','Status'),
      timevar='Compound',
      idvar=c('StartDate','EndDate'),
      times=paste(df[6,test],'(',df[5,test],')',sep=''),
      direction='long')
  tall$Compound <- factor(gsub(' )',')',gsub(' +',' ',tall$Compound)))
  row.names(tall)<- NULL
  tall$Location <- cursheet
  tall
}

readfile <- function(fname) {
  wb <- loadWorkbook(fname)
  print(wb)
  sheets <- getSheets(wb)[-1]
  tt <- lapply(sheets,readsheet,wb=wb)
  tt2 <- do.call(rbind,tt) 
  tt2$Location <- factor(tt2$Location)
  tt2$fname <- fname
  tt2
}

fnames <- dir(pattern='*.xls') #.\\. ?
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)

rf2$machine <- factor(ifelse(rf2$fname=="lmre_1992-2005.xls",
        'Van Essen/ECN vanger',
        ' Eigenbrodt NSA 181 KHT'))

Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
library(lattice)
densityplot( ~ Amount |  machine + Status,
    groups=Location,data=Fe[!is.na(Fe$Status) 
            & Fe$Amount<.8,],
    panel=function(...) {
      panel.densityplot(...)
      panel.abline(v=c(0,0.4))
    },adjust=.5
)

Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<0.4] <- NA
Fe2 <- Fe[!is.na(Fe$Status),]

datain <- list(
    LAmount = Fe2$LAmount,
    BLZ = !is.na(Fe2$LAmount),
    Machine=(Fe2$machine=='Van Essen/ECN vanger')+1,
    Location=(1:nlevels(Fe2$Location))[Fe2$Location],
    time=as.numeric(Fe2$StartDate),
    nobs =nrow(Fe2),
    nloc=nlevels(Fe2$Location)
)

FeModel <- function() {
  for (i in 1:nobs) {
    BLZ[i] ~ dinterval( LAmount[i] , -0.4034029 )
    ExpLAmount[i] <- intercept + 
        rate*time[i] + 
        rateL[Location[i]]*time[i]+
        FMach[Machine[i]] + 
        FLoc[Location[i]]
    LAmount[i] ~ dnorm(ExpLAmount[i],tau[Machine[i]])
  }
  for (i in 1:2) {
    tau[i] <- pow(sd[i],-2)
    sd[i] ~ dunif(0,1000)
  }
  intercept ~ dnorm(0,.0001)
  FMach[1] ~dnorm(0,tauMachShift)
  FMach[2] <- -FMach[1]
  tauMachShift <- pow(sdMachShift,-2)
  sdMachShift ~ dunif(0,10)
  for (iL in 1:nloc) {
    FLoc[iL] ~ dnorm(0,tauLoc)
    rateL[iL] ~dnorm(0,tauRate)
    yearlyL[iL] <- pow(pow(10,rateL[iL]+rate),365)
  }
  tauLoc <- pow(sdLoc,-2)
  sdLoc ~ dunif(0,100)
  tauRate <- pow(sdRate,-2)
  sdRate ~ dunif(0,100)
  rate ~ dnorm(0,.0001)
  dailydec <- pow(10,rate)
  yearlydec <- pow(dailydec,365)
}

parameters <- c('intercept','yearlydec','sd',
    'sdMachShift','FMach',
    'sdLoc','FLoc',
    'yearlyL'    )
inits <- function() {
  list(intercept = -1,#rnorm(1,.45,.04),
      rate=rnorm(0,.01),
      sd=runif(2,0,50),
      sdMachShift=runif(1,0,1),
      sdLoc=runif(1,0,1))
}

# estimate        
jj <- jags(FeModel, data=datain,
    parameters=parameters, init=inits,n.iter=5000,
    progress.bar="gui",n.chains=4)
jj

To leave a comment for the author, please follow the link and comment on his blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.