For descriptive statistics, values below LLOQ set to …
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
That is what I read the other day. For calculation of descriptive statistics, values below the LLOQ (lower limit of quantification) were set to…. Then I wondered, wasn’t there a trick in JAGS to incorporate the presence of missing data while estimating parameters of a distribution. How would that compare with standard methods such as imputing LLOQ at 0, half LLOQ, LLOQ or just ignoring the missing data? A simulation seems to be called for.
Bayes estimation
A bit of searching gave me a way function to estimate the mean and standard deviation, here wrapped in a function so resulting data has a nice shape.
library(R2jags)
library(plyr)
library(ggplot2)
Bsensnorm <- function(yy,limit) {
isCensored <- is.na(yy)
model1 <- function() {
for ( i in 1:N ) {
isCensored[i] ~ dinterval( yy[i] , limit )
yy[i] ~ dnorm( mu , tau )
}
tau <- 1/pow(sigma,2)
sigma ~ dunif(0,100)
mu ~ dnorm(0,1E-6)
}
datain <- list(yy=yy,isCensored=1-as.numeric(isCensored),N=length(yy),limit=limit)
params <- c('mu','sigma')
inits <- function() {
list(mu = rnorm(0,1),
sigma = runif(0,1))
}
jagsfit <- jags(datain, model=model1, inits=inits,
parameters=params,progress.bar=”gui”,
n.chains=1,n.iter=7000,n.burnin=2000,n.thin=2)
data.frame( system=’Bayes Uninf’,
mu_out=jagsfit$BUGSoutput$mean$mu,
sigma_out=jagsfit$BUGSoutput$mean$sigma)
}
Somewhat informative prior
BsensInf <- function(yy,limit) {
isCensored <- is.na(yy)
model2 <- function() {
for ( i in 1:N ) {
isCensored[i] ~ dinterval( yy[i] , limit )
yy[i] ~ dnorm( mu , tau )
}
tau <- 1/pow(sigma,2)
sigma ~ dnorm(1,1) %_% T(0.001,)
mu ~ dnorm(0,1E-6)
}
datain <- list(yy=yy,isCensored=1-as.numeric(isCensored),N=length(yy),limit=limit)
params <- c('mu','sigma')
inits <- function() {
list(mu = abs(rnorm(0,1))+.01,
sigma = runif(.1,1))
}
jagsfit <- jags(datain, model=model2, inits=inits, parameters=params,progress.bar="gui",
n.chains=1,n.iter=7000,n.burnin=2000,n.thin=2)
data.frame( system=’Bayes Inf’,
mu_out=jagsfit$BUGSoutput$mean$mu,
sigma_out=jagsfit$BUGSoutput$mean$sigma)
}
Simple imputation
L0sensnorm <- function(yy,limit) {
yy[is.na(yy)] <- 0
data.frame(system=’as 0′,mu_out=mean(yy),sigma_out=sd(yy))
}
Lhsensnorm <- function(yy,limit) {
yy[is.na(yy)] <- limit/2
data.frame(system=’as half LOQ’,mu_out=mean(yy),sigma_out=sd(yy))
}
Llsensnorm <- function(yy,limit) {
yy[is.na(yy)] <- limit
data.frame(system=’as LOQ’,mu_out=mean(yy),sigma_out=sd(yy))
}
LNAsensnorm <- function(yy,limit) {
yy <- yy[!is.na(yy)]
data.frame(system=’as missing’,mu_out=mean(yy),sigma_out=sd(yy))
}
Simulations
limitest <- function(yy,limit) {
yy[yy
Bsensnorm(yy,limit),
BsensInf(yy,limit),
L0sensnorm(yy,limit),
Lhsensnorm(yy,limit),
Llsensnorm(yy,limit),
LNAsensnorm(yy,limit)))
}
simtest <- function(n,muin,sdin,limit) {
numnotna<- 0
while(numnotna
yy <- rnorm(n,muin,sdin)
numnotna <- sum(yy>limit)
}
ll <- limitest(yy,limit)
ll$n_na <- sum(yy
}
datain <- expand.grid(n=c(5,6,8,10,12,15,20,50,80),muin=seq(1,3.5,.05),sdin=1,limit=1)
dataout <- mdply(datain,simtest)
Results for means
After all simulations, a difference is calculated between the simulation mean and the estimated means. This made interpreting the plots much more easy.
dataout$dif <- dataout$mu_out-dataout$muin
Uninformed prior did not perform with very little data
ggplot(dataout[dataout$n %in% c(5,6,8) & dataout$system %in% c(‘Bayes Uninf’,’Bayes Inf’),],aes(x=muin,y=dif)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘difference observed and simulation mu’)
At low number of observations variation of the mean swamps bias.
ggplot(dataout[dataout$n %in% c(5,6,8) & dataout$system!=’Bayes Uninf’,],aes(x=muin,y=dif)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘difference observed and simulation mu’)
At intermediate number of observations method choice gets more important
ggplot(dataout[dataout$n %in% c(10,12,15) & dataout$system!=’Bayes xUninf’,],aes(x=muin,y=dif)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘difference observed and simulation mu’)
At large number of observations bias gets far too large
At this point the missing option is not plotted any more, it is just not competitive. Setting at LLOQ gets a big positive bias, while 0 gets a negative bias, especially for 80 observations. There is no point in setting up an informed prior. Half LLOQ seems least biased out of the simple methods.
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘difference observed and simulation mu’)
Standard deviation is negative biased for simple imputation
ggplot(dataout[dataout$n%in% c(5,6,8) & dataout$system!=’Bayes Uninf’,],aes(x=muin,y=sigma_out)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘estimated sigma’)
ggplot(dataout[dataout$n%in% c(10,12,15) & dataout$system!=’Bayes Uninf’,],aes(x=muin,y=sigma_out)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘estimated sigma’)
With a large number of observations half LLOQ results in a negative bias on the standard deviation.
ggplot(dataout[dataout$n %in% c(20,50,80) & dataout$system!=’as missing’,],aes(x=muin,y=sigma_out)) + #group=Source,
geom_smooth(method=’loess’) +
geom_point() +
facet_grid(system ~n , drop=TRUE) +
scale_x_continuous(‘simulation mu’) +
scale_y_continuous(‘estimated sigma’)
Conclusion
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.