Ever since I heard about Kaggle.com at this year’s Bay Area Data Mining Camp, I’ve wanted to participate. But I was feeling somewhat intimidated.
Jeremy Howard’s “Intro to Kaggle” talk at yesterday’s MeetUp (DataMining for a Cause) was exactly what I needed.
He had a number of tips for beginners. It was the talk I had been looking for without knowing it. I am sharing some of his tips here, in case they help others as well.
Jeremy Howard’s Tips for Getting Started in Data Mining Competitions at Kaggle
* Visit the Kaggle site and spend at least 30 minutes every day hanging around. Read the forum, the competition pages, and the Kaggle blog.
* It is much better to start with competitions that are just starting up than with ones where hundreds of entries and teams are already well on their way
* Aim to make at least one submission each and every day
* Jeremy himself participates in competitions to see where he stands, and to learn and get better
* He’d start out making trivial submissions (all zeros, alternating zeros, or every entry set to the average) until his algorithm got better
* A lot of people who compete use R (and SAS, Excel or Python)
* Nearly 50% of the winning entries use Random Forest techniques.
* If you place in the top 3, that is great. But personal improvement and learning should be the goal.
* As you get better, you might get invited to “private competitions.”
* Every day, strive to do a little better and improve your submission’s performance, scoring and ranking
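
To make the "trivial submissions" tip concrete, here is a minimal Python sketch of what such baseline entries could look like. It is my own illustration, not code from Jeremy's talk. The function names, the `id` and `prediction` column names, and the sample numbers are all made up, and I am reading "alternating zeros" as 0, 1, 0, 1, which is an assumption. Every competition defines its own submission format, so check the competition page before using anything like this.

```python
import csv
import io
from statistics import mean


def all_zeros(n_rows):
    """Predict 0 for every row."""
    return [0.0] * n_rows


def alternating(n_rows):
    """Predict 0, 1, 0, 1, ... (one reading of "alternating zeros")."""
    return [float(i % 2) for i in range(n_rows)]


def all_average(train_targets, n_rows):
    """Predict the mean of the training targets for every row."""
    avg = mean(train_targets)
    return [avg] * n_rows


def write_submission(ids, predictions, out):
    """Write an id,prediction CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["id", "prediction"])  # hypothetical column names
    writer.writerows(zip(ids, predictions))


if __name__ == "__main__":
    train_targets = [3.0, 5.0, 4.0, 8.0]  # made-up training values
    test_ids = [101, 102, 103]            # made-up test row ids
    buffer = io.StringIO()
    write_submission(test_ids, all_average(train_targets, len(test_ids)), buffer)
    print(buffer.getvalue())
```

Submitting each of these once gives you a score to beat, which makes the "do a little better every day" tip measurable from day one.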