So my playing around with Haskell goes on. You can follow the progress of the little bootstrap exercise on GitHub. It has now gotten to the point where it actually computes a bootstrap interval for the mean of a sample. Consider the following R script:
n <- 100
fake.data <- data.frame(group=rep(1, n), data=rpois(n, 10))
write.table(fake.data, quote=F, row.names=F, col.names=F, sep=",", file="fake_data.csv")

library(plyr)
bootstrap.replicates <- llply(vector("list", 100), sample, x=fake.data$data, replace=T, size=n)
bootstrap.means <- unlist(llply(bootstrap.replicates, mean))
print(mean(fake.data$data))
print(quantile(bootstrap.means, c(0.025, 0.975)))
10.31
    2.5%    97.5%
 9.72475 10.85200
So, that was a simple bootstrap in R: we draw 100 values from a Poisson distribution with mean 10, resample them with replacement 100 times (each replicate as large as the original sample), and summarise the replicate means with their 2.5% and 97.5% quantiles. This is my Haskell thing running in GHCi:
*Main> main
"boot"
"will eventually bootstrap, if martin knows his stuff"
fake_data.csv
[8,6,11,16,5,11,12,12,7,9,13,13,12,7,13,7,7,11,9,14,14,13,10,14,17,12,8,
10,15,12,13,13,7,10,9,6,7,8,10,12,10,10,10,12,11,8,16,12,13,13,12,15,7,
7,8,9,5,7,13,10,12,11,8,6,12,14,12,14,6,9,10,9,10,6,9,7,6,12,13,7,11,7,
13,15,10,10,9,12,12,6,10,6,8,10,13,8,9,13,12,13]
10.31
(9.8,10.83)
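For readers who want to see the shape of such a computation, here is a minimal sketch of a percentile bootstrap for the mean in Haskell. It is not the code from the repository, which is not shown in this post. Every name in it (nextSeed, randomIndices, bootstrapMeans, quantile) is made up for illustration, and it assumes a non-empty sample. To stay within the base library it uses a hand-rolled linear congruential generator instead of a proper random number package, so treat the randomness as a toy. The quantile function interpolates linearly between order statistics, which is what R's quantile does by default.

```haskell
import Data.List (sort, unfoldr)

-- Toy linear congruential generator (illustration only, not for real work).
-- Assumes a 64-bit Int so the multiplication does not overflow.
nextSeed :: Int -> Int
nextSeed s = (1664525 * s + 1013904223) `mod` 4294967296

-- Infinite stream of pseudo-random indices in [0, n), taken from the
-- higher bits of the generator state. Assumes 0 < n <= 65536.
randomIndices :: Int -> Int -> [Int]
randomIndices n seed = map toIndex (tail (iterate nextSeed seed))
  where
    toIndex s = (s `div` 65536) `mod` n

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Split a list into consecutive chunks of length k.
chunksOf :: Int -> [a] -> [[a]]
chunksOf k = unfoldr step
  where
    step [] = Nothing
    step ys = Just (splitAt k ys)

-- Means of b bootstrap replicates, each drawn with replacement and
-- each as large as the original sample.
bootstrapMeans :: Int -> Int -> [Double] -> [Double]
bootstrapMeans b seed xs = map (mean . map (xs !!)) replicates
  where
    n = length xs
    replicates = take b (chunksOf n (randomIndices n seed))

-- Sample quantile with linear interpolation between order statistics.
quantile :: Double -> [Double] -> Double
quantile p xs = lo + (h - fromIntegral i) * (hi - lo)
  where
    sorted = sort xs
    h = p * fromIntegral (length xs - 1)
    i = floor h :: Int
    lo = sorted !! i
    hi = sorted !! min (i + 1) (length xs - 1)

main :: IO ()
main = do
  -- The first ten values of the data set printed above.
  let sample = [8, 6, 11, 16, 5, 11, 12, 12, 7, 9] :: [Double]
      means = bootstrapMeans 100 42 sample
  print (mean sample)
  print (quantile 0.025 means, quantile 0.975 means)
```

Indexing a list with !! is linear in the list length, so an array type would be the better choice for larger samples; the list version is kept here because it is the shortest thing that works.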
It’s certainly not the prettiest thing in the world (for one thing, it will crash if there is an extra line break at the end of the file). Next stop: type declarations! Haskell will infer the types for me, but it is probably a good idea to declare the intended types, or at least to know how to. Then the plan is to make some use of the first column in the data file, i.e. the group each observation belongs to, by adding a second sample and comparing the means. After that it’s pretty much done, and maybe I’ll move on to something more useful. I’m thinking that implementing least squares linear models would be a decent exercise?
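As a rough illustration of those next steps, here is one way the file reading could look with explicit type signatures, a parser that tolerates blank lines, and a helper that picks out one group. It is a sketch under assumptions, not the repository code: the names Row, parseRow, parseRows and groupValues are hypothetical, the input is assumed to be "group,value" lines without a header (as written by the R script above), and the second group in the example is invented, since the generated file only contains group 1.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- One row of the file: a group label and an observation.
type Row = (Int, Double)

-- Parse a "group,value" line. Blank or malformed lines give Nothing
-- instead of a crash.
parseRow :: String -> Maybe Row
parseRow line =
  case break (== ',') line of
    (g, ',' : v) -> (,) <$> readMaybe g <*> readMaybe v
    _            -> Nothing

-- Parse a whole file, silently skipping lines that do not parse
-- (including an extra line break at the end).
parseRows :: String -> [Row]
parseRows = mapMaybe parseRow . lines

-- Observations belonging to one group.
groupValues :: Int -> [Row] -> [Double]
groupValues g rows = [x | (g', x) <- rows, g' == g]

main :: IO ()
main = do
  -- Stand-in for the contents of fake_data.csv, with a made-up second
  -- group and a trailing blank line.
  let contents = "1,8\n1,6\n2,11\n\n"
      rows = parseRows contents
  print (groupValues 1 rows)  -- prints [8.0,6.0]
  print (groupValues 2 rows)  -- prints [11.0]
```

Skipping bad lines silently is a design choice that suits a toy like this; returning the offending line numbers would be kinder for real data.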