The Kaggle Bug

[This article was first published on Intelligent Trading, and kindly contributed to R-bloggers.]

If you have any interest in data mining and machine learning, you might have already caught the Kaggle bug.

I myself fairly recently got caught up in following the various contests and forums after reading a copy of “Practical Time Series Forecasting,” 2nd edition, by Galit Shmueli. What makes the contests great is that they allow any ambitious and creative data scientist or amateur enthusiast to participate and to pick up a wealth of new knowledge and tricks from more experienced professionals in the field.

What should make them even more interesting to readers here is that many of the winners of these high-purse contests come from the financial world. Take one of the traders I personally find inspirational, Jaffray Woodriff, manager of the well-known machine-learning-oriented hedge fund Quantitative Investment Management (better known by its acronym, QIM). I recently mentioned to a surprised friend that Mr. Woodriff had also participated in the better-known Netflix prediction contest (having been a member of the third-place team at one point).

In particular, the most recent contest with many eager followers watching is the $3,000,000 Heritage Provider healthcare claims contest, an open contest to predict the likelihood of patient hospital admission. What particularly inspired this post is a very useful blog from one of the leading contestants, Phil Brierley (a.k.a. Sali Mali, his handle), who has interestingly joined the marketmaker team, also affiliated with a prediction-related fund. Mr. Brierley has shared tremendously useful insights about practical methods of attacking the problem, all the way from SQL preprocessing and cleaning to intuitive visualization methodologies. I applaud him for his generous sharing of insights with the rest of the predictive analytics community. Although he hasn’t posted in a while, his journal of thoughts is still highly useful.

Anyone looking for a grubstake could certainly use three million to get started =)

Below are the specific links mentioned…

…and one newer from Stack Exchange

