Blog Archives

Parallel random forests using foreach

July 22, 2011

There's been some discussion on the Kaggle forums and on a few blogs about various ways to parallelize random forests, so I thought I'd add my thoughts on the issue. Here's my version of the 'parRF' function, which is based on the elegant version in the...


Importing google news data to R

July 6, 2011

I've been playing around lately with the stock market data available from Google Finance, through quantmod in R. Here's a function I've written (which depends on the R Data Science Toolkit) to pull news stories related to a stock from Google, parse t...


Kaggle Competition Walkthrough: Wrapup

June 1, 2011

The Kaggle Don't Overfit competition is over, and I took 11th place! Additionally, I tied with tks for contributing the most to the forum, so thanks to everyone who voted for me! I voted for tks, and I'm very happy to share the prize with him, as most...


Kaggle Competition Walkthrough: Fitting a model

May 12, 2011

Now that we've got the data we need into R, it is very easy to fit a model using the caret package. Caret's workhorse function is called 'train,' and it allows you to fit a wide variety of models using the same syntax. Furthermore, many models have '...


Kaggle Competition Walkthrough: Introduction

May 3, 2011

Kaggle is a site for participating in predictive analytics competitions. It is also a great resource for learning how to build powerful predictive models, and the Overfitting competition provides a good introduction to the common tools used by a predic...


Parallelizing and cross-validating feature selection in R

April 29, 2011

This is an example piece of code for the Overfitting competition at kaggle.com. This method has an AUC score of ~0.91, which is currently good enough for about 38th place on the leaderboard. If you read the competition forums closely, you will find code...


Intro

April 22, 2011

This blog will show you how to build tools to survive in the modern world. I will focus on statistics and machine learning, because that's where my strengths lie, but sometimes we may find ourselves veering far off course. My primary interest lies in us...
