Part 3: Two more implementations of optimism corrected bootstrapping show shocking bias

[This article was first published on R – intobioinformatics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Welcome to part III of debunking the optimism corrected bootstrap in high dimensions (quite high number of features) in the Christmas holidays. Previously we saw with a reproducible code implementation that this method is very bias when we have many features (50-100 or more). I suggest avoiding this method until at some point it has been reassessed thoroughly to find how bad this situation is with different numbers of dimensions. Yes, I know for some statisticians this is your favorite method and they tell people how their method is lacking in statistical power, but clearly this has much worse issues, at least on some data. People are currently using this method in genomic medicine, where we have high numbers of features low samples. Just re-run the code yourself and make up your own mind if in doubt. I have now 3 implementations (excluding Caret) confirming the bias. One written by me, two independent statisticians. Let’s run some more experiments.

This time I have used a different persons implementation using the ‘glm’ function, i.e. logistic regression to show the same misleading trend occurs, i.e. positive results on purely random data. The code has been directly taken from, here. The second implementation is from another unnamed statistician, there was a bug in their code so I had to correct it, there may be further errors in it so feel free to check.

If you disagree with these findings, please show your own implementation of the method in R (not using Caret) following the same experiment of increasing number of features that are purely noise very high with binary labels. Also, check the last two posts and re-run the code your self, before making your mind up and providing counter arguments. I am practically certain there is a serious problem with this method. Yet again, with situations like this, don’t rely on Mr X’s publication from 10 years ago, use null simulated data sets with different dimensions to investigate the behavior of the method in question your self.

There are no real predictors in this data, all of them are random data from a normal distribution. The bias is less bad using glm than with glmnet. This method is being used (with high numbers of features) as we speak by Mr Fudge and his friends so they can get into better journals. The system is corrupt.

You should be able to copy and paste this code directly into R to repeat the results. I can’t vouch for this code because I did not write it, but both implementations show the same result as in the last blog post with my code.

Implementation 1. A glm experiment.

### example of logistic regression optimism corrected bootstrapping
# source:

auc.adjust <- function(data, fit, B){
fit.model <- fit
data$pred.prob <- fitted(fit.model) <- roc(data[,1], data$pred.prob, data=data)$auc # require 'pROC'
auc.boot <- vector (mode = "numeric", length = B)
auc.orig <- vector (mode = "numeric", length = B)
o <- vector (mode = "numeric", length = B)
for(i in 1:B){
boot.sample <- sample.rows(data, nrow(data), replace=TRUE) # require 'kimisc'
fit.boot <- glm(formula(fit.model), data = boot.sample, family = "binomial")
boot.sample$pred.prob <- fitted(fit.boot)
auc.boot[i] <- roc(boot.sample[,1], boot.sample$pred.prob, data=boot.sample)$auc
data$pred.prob.back <- predict.glm(fit.boot, newdata=data, type="response")
auc.orig[i] <- roc(data[,1], data$pred.prob.back, data=data)$auc
o[i] <- auc.boot[i] - auc.orig[i]
auc.adj <- - (sum(o)/B)
boxplot(auc.boot, auc.orig, names=c("auc.boot", "auc.orig"))
title(main=paste("Optimism-adjusted AUC", "\nn of bootstrap resamples:", B), sub=paste(" (blue line)=", round(, digits=4),"\nadj.auc (red line)=", round(auc.adj, digits=4)), cex.sub=0.8)
abline(, col="blue", lty=2)
abline(h=auc.adj, col="red", lty=3)

## generate random data

xx.glmnet <- matrix(nrow=53,ncol=1)
xx.glmnet <- data.frame(xx.glmnet)
xx.glmnet[,1] <- c(rep(0,25),rep(1,28))
test <- matrix(rnorm(53*1000, mean = 0, sd = 1),
nrow = 53, ncol = 1000, byrow = TRUE)
xx.glmnet <- cbind(xx.glmnet,test)
colnames(xx.glmnet) <- c('outcome',paste('Y',seq(1,1000),sep=''))

## 1. make overfitted model

model <- glm(outcome ~., data = xx.glmnet, family = "binomial")

## 2. estimate optimism and correct

auc.adjust(xx.glmnet, model, B=200)

Let's have a look at the results, which agree nicely with our previous findings using a from scratch implementation of the method. So the red line is supposedly our corrected AUC, but the AUC should be 0.5 when running on random data. See previous part 1 post and part 2 post for demonstration of cross validation results on random data which give the correct result.


Implementation 2: Another glmnet experiment

### example of optimism corrected bootstrapping implementation
# source: an unnamed programmer


#glmnet -all predictors penalised
compare_opt_glmnet<- function(orig_data_glmnet, i){
# orig_data_glmnet = the whole data
train_data_glmnet<- orig_data_glmnet[i, ] # get bootstrap
model_glmnet<- model_process_glmnet(train_data_glmnet)
# return a summary optimism results
AUC_results_glmnet <- c(
# boot against original
AUC_train1=roc(orig_data_glmnet$outcome,as.vector(predict(model_glmnet,type="response", newx = model.matrix (outcome~.,orig_data_glmnet )[,-1], s = 'lambda.min')))$auc,
# boot against boot
AUC_train2=roc(train_data_glmnet$outcome,as.vector(predict(model_glmnet,type="response", newx = model.matrix (outcome~.,train_data_glmnet )[,-1], s = 'lambda.min')))$auc,
AUC_diff=roc(train_data_glmnet$outcome,as.vector(predict(model_glmnet,type="response", newx = model.matrix (outcome~.,train_data_glmnet )[,-1], s = 'lambda.min')))$auc-
roc(orig_data_glmnet$outcome,as.vector(predict(model_glmnet,type="response", newx = model.matrix (outcome~.,orig_data_glmnet )[,-1], s = 'lambda.min')))$auc

model_process_glmnet<- function(the_data_glmnet){
model_final_glmnet <- cv.glmnet (model.matrix (outcome~.,the_data_glmnet )[,-1],the_data_glmnet$outcome,alpha=1,family = "binomial")

# generate random data

test <- matrix(rnorm(53*1000, mean = 0, sd = 1),
nrow = 53, ncol = 1000, byrow = TRUE)
test <- data.frame(test)
labs <- factor(c(0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,1,0,1,0,1,0,0,0,0,0,
labs <- data.frame(as.matrix(labs))
colnames(labs)[1] <- 'outcome'
xx.glmnet <- cbind(labs,test)

## 1. make overfitted model

tempd <- xx.glmnet[,2:ncol(xx.glmnet)]
labels <- xx.glmnet[,1]
overfittedmodel <- cv.glmnet(as.matrix(tempd),y=labels,alpha=1,family="binomial")
lasso.prob <- predict(overfittedmodel,type="response",newx=as.matrix(tempd),s='lambda.min')
pred.cligen <- prediction(lasso.prob, labels)
auc <- roc(labels,as.vector(lasso.prob))
overfitted_auc <- as.numeric(auc$auc)

## 2. try to correct for optimism

Repeats = 100
res_opt_glmnet <- boot(xx.glmnet, statistic = compare_opt_glmnet, R = Repeats)
optimism_glmnet <- mean(res_opt_glmnet$t[ , 3])
enhanced_glmnet <- overfitted_auc - optimism_glmnet

Here are the results of the above code on random data only (pure madness).

> enhanced_glmnet
[1] 0.8916488

There is some disagreement how to implement this method in Caret as Davis Vaughan disagrees with doing it the way shown here. So, until Max Kuhn gets back to me, relying on the last two posts, I think all the evidence points to a massive positive bias in peoples results using this method with high numbers of features.

If in doubt compare your results with LOOCV and k fold cross validation, this is how I discovered this method is dodgy by seeing MASSIVE differences on null data-sets where no real predictive ability should be found. These two methods are more widely used and reliable.

Sorry @ Frank Harrell, I don’t have time atm to do the extra bits you suggest.
Game on @ Davis Vaughan. Do your own implementation with the same data as I have used with glmnet and glm.

The steps of this method, briefly, are as follows:

  1. Fit a model M to entire data S and estimate predictive ability C.
  2. Iterate from b=1…B:
    1. Take a resample from the original data, S*
    2. Fit the bootstrap model M* to S* and get predictive ability, C_boot
    3. Use the bootstrap model M* to get predictive ability on S, C_orig
  3. Optimism O is calculated as mean(C_boot – C_orig)
  4. Calculate optimism corrected performance as C-O.

To leave a comment for the author, please follow the link and comment on their blog: R – intobioinformatics. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)