More Haskell: a bootstrap

Posted on February 16, 2013 by mrtnj in R bloggers | 0 Comments

[This article was first published on There is grandeur in this view of life » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So my playing around with Haskell goes on. You can follow the progress of the little bootstrap exercise on github. Now it’s gotten to the point where it actually does a bootstrap interval for the mean of a sample. Consider the following R script:

n <- 100
fake.data <- data.frame(group=rep(1, n), data=rpois(n, 10))
write.table(fake.data, quote=F, row.names=F, col.names=F,
            sep=",", file="fake_data.csv")

library(plyr)
bootstrap.replicates <- llply(vector("list", 100),
                              sample, x=fake.data$data,
                              replace=T, size=n)
bootstrap.means <- unlist(llply(bootstrap.replicates, mean))
print(mean(fake.data$data))
print(quantile(bootstrap.means, c(0.025, 0.975)))

[1] 10.31
    2.5%    97.5% 
 9.72475 10.85200

So, that was a simple bootstrap in R: we get some draws from a Poisson distribution, sample 100 times from the data with replacement, and summarise the replicates. This is my Haskell thing running in GHCi:

*Main> main
"boot"
"will eventually bootstrap, if martin knows his stuff"
fake_data.csv
[8,6,11,16,5,11,12,12,7,9,13,13,12,7,13,7,7,11,9,14,14,13,10,14,17,12,8,
10,15,12,13,13,7,10,9,6,7,8,10,12,10,10,10,12,11,8,16,12,13,13,12,15,7,
7,8,9,5,7,13,10,12,11,8,6,12,14,12,14,6,9,10,9,10,6,9,7,6,12,13,7,11,7,
13,15,10,10,9,12,12,6,10,6,8,10,13,8,9,13,12,13]
10.31
(9.8,10.83)

It’s certainly not the prettiest thing in the world (for one thing, it will crash if there is an extra line break at the end of the file). Next stop: type declarations! Haskell will infer the types for me, but it is probably a good idea to declare the intended types. Or at least to be able to do so is. Then the plan is to make some use of the first column in the data file, i.e. group the sample belongs to, to add a second sample and make a comparison between the means. And then it’s pretty much done and maybe I’ll move on to something more useful. I’m thinking that implementing least squares linear models would be a decent exercise?

Postat i:data analysis, english Tagged: haskell, R