Lee Edlefsen, Chief Scientist at Revolution Analytics, spoke about Big Data in R at the FHCRC a week or two back. He introduced the PEMA or parallel external memory algorithm. “Parallel external memory algorithms (PEMA's) allow solution of both ...

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers) Now that I'm ridiculously behind in the Stanford Online Statistical Learning class, I thought it would be fun to try to reproduce the figure on page 36 of the slides from chapter 3 or page 81 of the book. The result is a curvaceous surface...

Trevor Hastie and Robert Tibshirani are teaching an online class on Statistical Learning starting this week. The first week is introduction and overview, so it's not too late to join up. They've also published a new book, An Introduction to Statistical Learning, as a more accessible companion to their widely revered The Elements of Statistical...

Here's a snippet of R to generate a Version 4 UUID. Dunno why there wouldn't be an official function for that in the standard libraries, but if there is, I couldn't find it. ## Version 4 UUIDs have the form: ## xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx ## where x is any hexadecimal digit and ## y is one...
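The excerpt above cuts off before the code, so here is a minimal sketch of what such a generator could look like, not the original post's snippet. The function name `uuid_v4` and the helper `rand_hex` are made up for illustration. Per RFC 4122, the `y` position carries the variant bits and is one of 8, 9, a, or b. Note that `sample()` draws from R's ordinary random number generator, so this is fine for labeling things but is an assumption you should revisit if you need cryptographic-quality randomness.

```r
# Sketch of a Version 4 (random) UUID generator.
# uuid_v4 and rand_hex are hypothetical names, not from the original post.
uuid_v4 <- function() {
  # the sixteen hexadecimal digits, as character strings
  hex <- c(0:9, letters[1:6])

  # draw n random hex digits and glue them into one string
  rand_hex <- function(n) paste(sample(hex, n, replace = TRUE), collapse = "")

  # the variant digit: one of 8, 9, a, b (RFC 4122)
  y <- sample(c("8", "9", "a", "b"), 1)

  # xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
  paste0(rand_hex(8), "-",
         rand_hex(4), "-",
         "4", rand_hex(3), "-",
         y, rand_hex(3), "-",
         rand_hex(12))
}

id <- uuid_v4()
```

The fixed `4` in the third group marks the version, and everything else is random, which is why the whole thing reduces to a handful of `sample()` calls.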

The 8th iteration of the DREAM Challenges is underway. DREAM is something like the Kaggle of computational biology with an open science bent. Participating teams apply machine learning and statistical modeling methods to biological problems, competing to achieve the best predictive accuracy. This year's three challenges focus on reverse engineering cancer, toxicology and the kinetics of...

I've been writing software to help others do data analysis for a number of years and at the same time trying to work up my nerve to try my own analysis. Why let other people have all the fun? So, when I saw that Jeffrey Leek, biostatistician at Johns Hopkins and coauthor of Simply Statistics, was teaching...

I've been having some great fun parallelizing R code on Amazon's cloud. Now that things are chugging away nicely, it's time to document my foibles so I can remember not to fall into the same pits of despair again. The goal was to perform lots of trials of a randomized statistical simulation. The jobs were independent and fairly chunky, taking...
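Independent trials like these are the easy case for parallelism, since no job needs to talk to any other. As a rough sketch only, here is how that pattern might look with R's built-in parallel package on a single machine. The excerpt doesn't say which tools the post actually used on Amazon's cloud, so the two-worker local cluster, the `run_trial` function and the trial count are all assumptions for illustration.

```r
library(parallel)

# Hypothetical stand-in for one trial of a randomized simulation.
# Seeding per trial keeps each job reproducible on its own.
run_trial <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

n_trials <- 8

# Start two local worker processes (an assumption; a cloud setup
# would point at remote hosts instead).
cl <- makeCluster(2)

# Farm the independent trials out to the workers, one seed each.
results <- parLapply(cl, seq_len(n_trials), run_trial)

# Always shut the workers down when finished.
stopCluster(cl)

# Collapse the list of per-trial results into a numeric vector.
estimates <- unlist(results)
```

Because each trial takes its own seed and returns a single value, the results can be collected in any order and rerun individually if a job dies.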

