# A small comparison of bio-equivalence calculations.

November 10, 2013
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

Last week I looked at two-way cross-over studies and followed the example of Schütz (http://bebac.at/) in the analysis. Since the EU has its on opinions (Questions & Answers: Positions on specific questions addressed to the pharmacokinetics working party) and two example data sets, I was wondering how the various computations compared.

### Data

There are 6 data-sets in the analysis.
1. The first data from Schütz.
2. The ‘Dataset from Chow & Liu’. This is a two-period data set. Schütz did these data in three versions. This version is complete (balanced).
3. Data set 2; incomplete (one subject misses one evaluation). Note the EU guide states: ‘subjects in a crossover trial who do not provide evaluable data for both of the test and reference products should not be included‘.
4. Data set 2; unbalanced (one subject is missing).
5. EU data set 1: ‘a four period crossover study where subjects receive both test and reference twice, with some subjects providing data for only a subset of the treatment periods
6. EU data set 2: ‘a three period crossover study where all subjects receive reference twice and test once

### Bio-equivalence analysis

There are five algorithms. I wrapped each in a function and made a unified print() method, so as to ease comparison. These functions have two input parameters; the data and the index for the level of the reference treatment.
1. The code from Schütz slide 14. Note this only did two period data, and not the incomplete data.
2. ANOVA. In contrast to my previous post subject is now fixed. For the interval I use just the residual degrees of freedom.
3. nlme This should deliver what proc mixed would deliver
4. MCMCglmm. A Bayesian MCMC sampler.
5. lme4. This is traditional mixed model with point estimate and interval via a simulation

### Result comparison

#### Data  set 1

Method     Point_estimate p5CI     p95CI    CVIntra
[1,] “Schutz”   100.8168       95.47312 106.4596 7.370138
[2,] “ANOVA”    100.8168       95.47312 106.4596 7.370138
[3,] “nlme”     100.8168       95.47312 106.4596 7.370138
[4,] “MCMCglmm” 100.8011       95.38906 106.5084 8.336442
[5,] “lme4”     100.71         96.03672 105.9925 7.370137

#### Data set 2

Method     Point_estimate p5CI     p95CI    CVIntra
[1,] “Schutz”   97.17545       88.3128  106.9275 19.47357
[2,] “ANOVA”    97.17545       88.3128  106.9275 19.47357
[3,] “nlme”     97.17545       88.31279 106.9275 19.47359
[4,] “MCMCglmm” 97.09287       86.66635 108.7019 24.21701
[5,] “lme4”     97.51018       89.10346 106.7388 19.47351

#### Data set 3: Inbalanced

Method     Point_estimate p5CI     p95CI    CVIntra

[1,] “Schutz”   95.79137       87.02926 105.4356 19.05692
[2,] “ANOVA”    95.60894       86.86353 105.2348 19.05692
[3,] “nlme”     95.60894       86.86352 105.2349 19.05693
[4,] “MCMCglmm” 95.6052        85.20512 107.4927 24.13791
[5,] “lme4”     95.80032       86.8732  103.6471 19.05691

#### Data set 4: Incomplete

Method     Point_estimate p5CI     p95CI    CVIntra
[1,] “ANOVA”    95.60894       86.86353 105.2348 19.05692
[2,] “nlme”     96.46814       87.6157  106.215  19.2212
[3,] “MCMCglmm” 96.78571       86.78392 108.6978 23.77217
[4,] “lme4”     96.76121       88.38716 105.469  19.22122

#### Data set 5: four period crossover

Method     Point_estimate p5CI     p95CI    CVIntra
[1,] “ANOVA”    115.6587       107.1057 124.8948 41.65396
[2,] “nlme”     115.7298       107.1707 124.9725 41.66876
[3,] “MCMCglmm” 115.7334       107.0849 125.0422 41.87724
[4,] “lme4”     115.7831       106.6829 124.6334 41.66873

#### Data set 6: three period crossover

Method     Point_estimate p5CI     p95CI    CVIntra

[1,] “ANOVA”    102.2644       97.31555 107.4649 11.85557
[2,] “nlme”     102.2644       97.31555 107.4649 11.85557
[3,] “MCMCglmm” 102.2624       97.29063 107.4589 12.17123
[4,] “lme4”     102.3081       97.30974 107.192  11.85559

From this small analysis one can observe

• Only to use the Schütz code for balanced two period crossover data. But in that case nlme and ANOVA provide the same answers. Hence it is purely educational
• Given the width of the intervals, the point estimates are close enough so differences can be ignored.
• For all data except the incomplete data the difference between nlme and ANOVA is absent or very small.
• MCMCglmm has some differences, compared to other methods it over estimates CVIntra. MCMCglmm gives a slightly wider confidence interval for bioequivalence.
• lme4 had in most cases a slightly narrower interval.

### Code

###############################################################################
# Data
###############################################################################
#Schutz 1
T1 <- c(28.39,33.36,24.29,28.61,59.49,42.30)
T2 <- c(49.42,36.78,34.81,45.54,28.23,25.71)
R1 <- c(39.86,32.75,34.97,45.44,27.87,24.26)
R2 <- c(35.44,33.40,24.65,31.77,65.29,37.01)
S1 <- data.frame(
aval=c(T1,R2,R1,T2),
subjid=factor(c(
rep(c(1,4,6,7,9,12),2),
rep(c(2,3,5,8,10,11),2))),
seqa=rep(c(‘TR’,’RT’),each=12),
trta=rep(c(‘T’,’R’,’R’,’T’),each=6),
aperiod=factor(rep(c(1,2,1,2),each=6))
)
S1 <- S1[order(S1\$subjid,S1\$aperiod),]
S1\$logAval <- log(S1\$aval)

Subject Period Sequence Formulation AUC Subject Period Sequence Formulation AUC
1 1 RT Reference 74.675 13 1 TR Test 122.45
1 2 RT Test 73.675 13 2 TR Reference 124.975
2 1 TR Test 74.825 14 1 TR Test 99.075
2 2 TR Reference 37.35 14 2 TR Reference 85.225
3 1 TR Test 86.875 15 1 RT Reference 69.725
3 2 TR Reference 51.925 15 2 RT Test 59.425
4 1 RT Reference 96.4 16 1 RT Reference 86.275
4 2 RT Test 93.25 16 2 RT Test 76.125
5 1 RT Reference 101.95 17 1 TR Test 86.35
5 2 RT Test 102.125 17 2 TR Reference 95.925
6 1 RT Reference 79.05 18 1 TR Test 49.925
6 2 RT Test 69.45 18 2 TR Reference 67.1
7 1 TR Test 81.675 19 1 RT Reference 112.675
7 2 TR Reference 72.175 19 2 RT Test 114.875
8 1 TR Test 92.7 20 1 RT Reference 99.525
8 2 TR Reference 77.5 20 2 RT Test 116.25
9 1 TR Test 50.45 21 1 TR Test 42.7
9 2 TR Reference 71.875 21 2 TR Reference 59.425
10 1 TR Test 66.125 22 1 TR Test 91.725
10 2 TR Reference 94.025 22 2 TR Reference 114.05
11 1 RT Reference 79.05 23 1 RT Reference 89.425
11 2 RT Test 69.025 23 2 RT Test 64.175
12 1 RT Reference 85.95 24 1 RT Reference 55.175
12 2 RT Test 68.7 24 2 RT Test 74.575
S2 <- with(S2,data.frame(
aval=c(AUC,AUC.1),
subjid=factor(c(Subject,Subject.1)),
seqa=factor(c(Sequence,Sequence.1)),
trta=factor(c(Formulation,Formulation.1)),
aperiod=factor(c(Period,Period.1))))
S2\$logAval <- log(S2\$aval)
S2InC <- S2[!(S2\$subjid==’24’ & S2\$aperiod==’2′),]
S2InB <- S2[!(S2\$subjid==’24’),]

##
#The EU data sets I copied-pasted into .csv prior to analysis.
#EU1
EU1\$subjid <- factor(EU1\$SUBJECT)
EU1\$aperiod <- factor(EU1\$PERIOD)
EU1\$aval <- EU1\$DATA
EU1\$seqa <- EU1\$SEQUENCE
EU1\$trta <- EU1\$FORMULATION
EU1\$logAval <- log(EU1\$aval)
#EU2
EU2\$subjid <- factor(EU2\$SUBJECT)
EU2\$aperiod <- factor(EU2\$PERIOD)
EU2\$aval <- EU2\$DATA
EU2\$seqa <- EU2\$SEQUENCE
EU2\$trta <- EU2\$FORMULATION
EU2\$logAval <- log(EU2\$aval)

###############################################################################
# Analysis functions
###############################################################################

print.BEQ <- function(x,…) {
result <- paste(
” Back transformed (raw data scale)\n”,
“Point estimate:”,
round(x\$Point_estimate, digits=2),”%\n”,
“90 % confidence interval:”,
round(x\$p5CI,digits=2), “% to”,
round(x\$p95CI, digits=2),”%\n”,
“CVIntra:”,round(x\$CVIntra,digits=2),
“%\n”)
cat(result)
invisible(x)
}

BESchutz <- function(datain,iref) {
prepdata <- datain[order(datain\$seqa,datain\$subjid,datain\$aperiod),]
T1 <- with(prepdata,aval[seqa==levels(seqa)[iref] & aperiod==1])
R1 <- with(prepdata,aval[seqa==levels(seqa)[3-iref] & aperiod==1])
R2 <- with(prepdata,aval[seqa==levels(seqa)[iref] & aperiod==2])
T2 <- with(prepdata,aval[seqa==levels(seqa)[3-iref] & aperiod==2])
RT <- log(R1) – log(T2)
TR <- log(R2) – log(T1)
n1 <- length(RT)
mRT <- mean(RT)
vRT <- var(RT)
n2 <- length(TR)
mTR <- mean(TR)
vTR <- var(TR)
mD <- mean(log(c(T1,T2))) – mean(log(c(R1,R2)))
MSE <- (((n1-1)*vRT + (n2-1)*vTR)/(n1+n2-2))/2
alpha <- 0.05
lo <- mD – qt(1-alpha,n1+n2-2)*sqrt(MSE)*
sqrt((1/(2*n1) + 1/(2*n2)))
hi <- mD + qt(1-alpha,n1+n2-2)*sqrt(MSE)*
sqrt((1/(2*n1) + 1/(2*n2)))
#result <- paste(
#    paste(” Back transformed (raw data scale)\n”,
#        “Point estimate:”,
#        round(100*exp(mD), digits=2),”%\n”),
#    paste(“90 % confidence interval:”),
#    paste(round(100*exp(lo), digits=2), “% to”),
#    paste(round(100*exp(hi), digits=2),”%\n”,
#        paste(“CVintra:”,round(100*sqrt(exp(MSE)-1),
#                digits=2),”%\n”)))
result <- list(Method=’Schutz’,
Point_estimate=100*exp(mD),
p5CI=100*exp(lo),
p95CI=100*exp(hi),
CVIntra=100*sqrt(exp(MSE)-1))
class(result) <- c(‘BEQ’,class(result))
result
}

#######
# plain anova
#######
BEanova <- function(datain,iref) {
trtastring <- paste(‘trta’,levels(datain\$trta)[iref],sep=”)
A1 <- aov(logAval ~  aperiod +seqa+ trta + subjid,
data=datain)
SE <- sqrt(vcov(A1)[trtastring,trtastring])
td <- qt(.05,A1\$df.residual)
result <- list(Method=’ANOVA’,
Point_estimate=100*exp(coef(A1)[trtastring]),
p5CI=100*exp(coef(A1)[trtastring]+td*SE),
p95CI=100*exp(coef(A1)[trtastring]-td*SE),
CVIntra=100*sqrt(exp(sum(A1\$residuals^2)/A1\$df.residual)-1))
class(result) <- c(‘BEQ’,class(result))
result
}

########
# nlme
########
library(nlme)
BEnlme <- function(datain,iref) {
trtastring <- paste(‘trta’,levels(datain\$trta)[iref],sep=”)
L1 <- lme(logAval ~ aperiod + seqa + trta,
random=~1|subjid,
data=datain)
coL1 <- fixef(L1)
SEL1 <- sqrt(vcov(L1)[trtastring,trtastring])
dfL1 <- L1\$fixDF\$terms[names(L1\$fixDF\$terms)==’trta’]
tdL1 <- qt(.05,dfL1)
result <- list(Method=’nlme’,
Point_estimate=100*exp(coL1[trtastring]),
p5CI=100*exp(coL1[trtastring]+tdL1*SEL1),
p95CI=100*exp(coL1[trtastring]-tdL1*SEL1),
CVIntra=100*sqrt(exp(as.numeric(VarCorr(L1)[‘Residual’,’Variance’]))-1))
class(result) <- c(‘BEQ’,class(result))
result
}

######
# MCMCglm
######
library(MCMCglmm)
BEmcmcglmm <- function(datain,iref) {
trtastring <- paste(‘trta’,levels(datain\$trta)[iref],sep=”)

M1 <- MCMCglmm(logAval ~ aperiod + seqa + trta,
random= ~ subjid ,
data=datain,family=’gaussian’,nitt=500000,thin=20,
burnin=50000,verbose=FALSE)
M1est <- 100*exp(quantile(M1\$Sol[,trtastring],c(.05,.5,.95)))
result=list(Method=’MCMCglmm’,
Point_estimate=M1est[2],
p5CI=M1est[1],
p95CI=M1est[3],
CVIntra=    100*sqrt(exp(mean(M1\$VCV[,’units’]))-1))
class(result) <- c(‘BEQ’,class(result))
result
}

#######
# lme4
######
library(lme4)
BElmer <- function(datain,iref) {
trtastring <- paste(‘trta’,levels(datain\$trta)[iref],sep=”)
R1 <- lmer(logAval ~ aperiod + seqa + trta + (1|subjid),
data=datain)
simeff <- function(m0) {
s <- simulate(m0)
fixef(refit(m0,s))[trtastring]
}
ff <- replicate(1000,simeff(R1))
R1est <- 100*exp(quantile(ff,c(.05,.5,.95)))
result <- list(Method=’lme4′,
Point_estimate=R1est[2],
p5CI=R1est[1],
p95CI=R1est[3],
CVIntra=100*sqrt(exp(sigma(R1)^2)-1))
class(result) <- c(‘BEQ’,class(result))
result
}

###############################################################################
# Result summary
###############################################################################
t(cbind(BESchutz(S1,2)
, BEanova(S1,2)
, BEnlme(S1,2)
, BEmcmcglmm(S1,2)
, BElmer(S1,2)
))

t(cbind(BESchutz(S2,2)
, BEanova(S2,2)
, BEnlme(S2,2)
, BEmcmcglmm(S2,2)
, BElmer(S2,2)
))

t(cbind( BESchutz(S2InB,2)
, BEanova(S2InB,2)
, BEnlme(S2InB,2)
, BEmcmcglmm(S2InB,2)
, BElmer(S2InB,2)
))

t(cbind(  BEanova(S2InC,2)
, BEnlme(S2InC,2)
, BEmcmcglmm(S2InC,2)
, BElmer(S2InC,2)
))

t(cbind( BEanova(EU1,2)
, BEnlme(EU1,2)
, BEmcmcglmm(EU1,2)
, BElmer(EU1,2)
))
t(cbind( BEanova(EU2,2)
, BEnlme(EU2,2)
, BEmcmcglmm(EU2,2)
, BElmer(EU2,2)
))

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...