A simple R bootstrap function for beginners

March 5, 2014
By

(This article was first published on Robert Grant's stats blog » R, and kindly contributed to R-bloggers)

I teach some introductory stats classes with SPSS, and one of the frustrations for me is that you have to pay an extra wad of cash to do any bootstrapping. It’s not exactly the complete analysis solution that you might expect from the sales literature. I could go on, but I guess IBM have better lawyers than me.

I think it’s a good idea to introduce the bootstrap early on. It’s not some advanced rocket-science technique any more, and beginners do a lot of medians and quartiles, for which there’s no (proper) standard error. What’s more, it makes random sampling at the heart of inference really explicit, which I think actually makes learning easier.

bootstrapfarm

So, I’ve been considering showing the students a quick diversion into R for the bootstrap. Stata has a great bootstrap syntax, but it’s not available in our computer teaching rooms. We don’t have time to do more in R, and it’s beyond my control to switch the whole software package for the course. The trouble is, the boot package has that weird thing where you have to redefine whatever statistic you’re interested in as a function with two parameters, even if a perfectly good one already exists. I suppose it’s to do with efficiency and vectorising over the replicate index vectors, but it’s the last thing you want to talk beginners through.

So I thought I could usefully wrap up the usual boot and boot.ci functions in a simpleboot wrapper.  It’s not a wise choice for serious analysis, and I haven’t made it pretty, but for teaching it makes things a little more accessible. Any thoughts, let me know.

simpleboot<-function(x,stat,reps=1000) {
cat("Bootstrapping can go wrong!\n")
 cat("This simple function will not show you warning messages.\n")
 cat("Check results closely and be prepared to consult a statistician.\n")
 if(stat=="max" | stat=="min") { warning("Bootstrap is likely to fail for minima and maxima") }
 require(boot)
 eval(parse(text=eval(substitute(paste("p.func<-function(x,i) ",stat,"(x[i])",sep=""),list(stat=stat)))))
 myboots<-boot(x,statistic=p.func,R=reps,stype="i")
 hist(bmed$t,breaks=25,main="EDF from bootstrap",xlab=stat)
 suppressWarnings(return(list(replicates=reps,point.estimate=myboots$t0,normal.ci=c(boot.ci(myboots)$normal[2],boot.ci(myboots)$normal[3]),
percent.ci=c(boot.ci(myboots)$percent[4],boot.ci(myboots)$percent[5]),
bca.ci=c(boot.ci(myboots)$bca[4],boot.ci(myboots)$bca[5]))))
}
# example:
mydata<-rchisq(25,df=3)
simpleboot(mydata,"median")

To leave a comment for the author, please follow the link and comment on his blog: Robert Grant's stats blog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.