Parallelizing and cross-validating feature selection in R

April 29, 2011

(This article was first published on Modern Tool Making, and kindly contributed to R-bloggers)

This is an example piece of code for the Overfitting competition. This method has an AUC score of about 0.91, which is currently good enough for about 38th place on the leaderboard. If you read the competition forums closely, you will find code that is good enough to tie for 25th place, as well as hints on how to break into the top 10.

However, I like this script because it does two tricky things well, without overfitting:
1. It selects features, despite the curse of dimensionality (250 observations, 200 features).
2. It fits a linear model, using the elastic net.
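
To make those two points concrete, here is a minimal sketch of the general idea. It is not the script from the original post. It assumes the glmnet package is installed, uses simulated data with the same shape as the competition set (250 observations, 200 features), and screens features with a simple t-statistic, which is an illustrative choice rather than the competition method. The helper names (auc, screen, run_fold) and the settings (k = 20, alpha = 0.5) are hypothetical. The key detail is that feature selection happens inside each cross-validation fold, so the held-out rows never influence which features are kept.

```r
# Sketch only: simulated data stands in for the competition files.
library(glmnet)    # elastic net; assumed to be installed
library(parallel)  # ships with base R

set.seed(42)
n <- 250
p <- 200
x <- matrix(rnorm(n * p), n, p)
# Hypothetical signal: only the first 10 columns influence the target.
y <- rbinom(n, 1, plogis(as.numeric(x[, 1:10] %*% rep(0.8, 10))))

# AUC via the rank-sum (Mann-Whitney) identity.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Keep the k columns with the largest absolute t-statistic.
screen <- function(x, y, k) {
  tstat <- apply(x, 2, function(col) {
    abs(unname(t.test(col[y == 1], col[y == 0])$statistic))
  })
  order(tstat, decreasing = TRUE)[seq_len(k)]
}

# One outer fold: select features on the training rows only,
# fit an elastic net, then score the held-out rows.
run_fold <- function(fold, x, y, folds, k) {
  train <- folds != fold
  keep <- screen(x[train, ], y[train], k)
  fit <- cv.glmnet(x[train, keep], y[train],
                   family = "binomial", alpha = 0.5, nfolds = 5)
  pred <- predict(fit, newx = x[!train, keep, drop = FALSE],
                  s = "lambda.min", type = "response")
  auc(as.numeric(pred), y[!train])
}

# Random assignment of rows to 5 outer folds.
folds <- sample(rep(1:5, length.out = n))

# mclapply forks worker processes; forking is unavailable on Windows,
# so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
fold_auc <- unlist(mclapply(
  1:5,
  function(f) run_fold(f, x, y, folds, k = 20),
  mc.cores = cores
))

print(round(fold_auc, 3))
print(mean(fold_auc))
```

Because the folds are independent, they are easy to run in parallel. Screening on the full data set before cross-validating would leak information from the held-out rows and inflate the AUC estimate, which is exactly the kind of overfitting this competition is designed to punish.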

In future posts, I will walk you through how this code works, but for now, download the data and give it a shot!
