material of talks at RUG meetings (video, slides, code)

How to build a world-beating predictive model using R

Many modern data analysis problems in both industry and academia involve building a model that can predict the future based on historical variables. The 2009 KDD Cup was an international data mining competition devoted to this type of problem, where contestants attempted to predict the behaviour of mobile phone customers using an extensive database of historical information. The University of Melbourne team managed to win one part of this challenge, using R almost exclusively. In this talk I’ll give some background to the area and the specific problem, and discuss how we went about building our models. The talk will be fairly accessible, and deal with many of the practical issues encountered in this type of work.

6 thoughts on “How to build a world-beating predictive model using R”

“The analysis and modelling work was performed almost entirely in the free open source program R. We say ‘almost’, because the original data chunks were too large to be read into R with our limited hardware, so it was first read into SAS and exported in batches of 200 variables, each of which could then be read into and then deleted from R”?

I’m curious how this part was done. If you are only batching the variables 200 at a time, how do you solve for the interactions among all of the variables? Is there a learning or gradient descent optimization being performed?

Thanks for this interesting presentation. Unfortunately the audio was terrible, and listening was a real pain in the a…

Yes, the audio makes it unwatchable. But the talk could be very interesting — could you post a less compressed version?

There are a couple of presentations available as PDF as well.

http://jmlr.csail.mit.edu/proceedings/papers/v7/

This one is specific to Uni of Melb.

http://jmlr.csail.mit.edu/proceedings/papers/v7/miller09/miller09.pdf

http://www.kddcup-orange.com/Slides/Unimelb_slides.pdf

TL;DR