A Tutorial and Talk at useR! 2014

Posted on May 7, 2014 by Max Kuhn in R bloggers | 0 Comments

[This article was first published on Blog - Applied Predictive Modeling, and kindly contributed to R-bloggers].

I’ll be doing a morning tutorial at useR! at the end of June in Los Angeles. I’ve done this same presentation at the last few conferences and this will probably be the last time for this specific workshop.

I will be including a copy of the book for those who take the tutorial, and all the proceeds (minus book costs) will be donated to the Foundation for Open Access Statistics (FOAS).

The tutorial outline is:

Conventions in R
Data splitting and estimating performance
Data pre-processing
Over-fitting and resampling
Training and tuning tree models
Training and tuning a support vector machine
Comparing models (as time allows)
Parallel processing (as time allows)

I’m also giving a talk called “Adaptive Resampling in a Parallel World”:

Many predictive models require parameter tuning. For example, a classification tree requires the user to specify the depth of the tree. This type of “meta parameter” or “tuning parameter” cannot be estimated directly from the training data. Resampling (e.g., cross-validation or the bootstrap) is a common method for finding reasonable values of these parameters (Kuhn and Johnson, 2013). Suppose B resamples are used with M candidate values of the tuning parameters. Evaluating every candidate on every resample requires B × M model fits, which can quickly increase the computational cost of the task. Some of the M models could be disregarded early in the resampling process due to poor performance. Maron and Moore (1997) and Shen et al. (2011) describe methods that adaptively filter which models are evaluated during resampling, reducing the total number of model fits. However, model parameter tuning is an “embarrassingly parallel” task; model fits can be calculated across multiple cores or machines to reduce the total training time. With the availability of parallel processing, is it still advantageous to adaptively resample?

This talk will briefly describe adaptive resampling methods and characterize their effectiveness using parallel processing via simulations.
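
To make the B × M bookkeeping concrete, here is a rough base R sketch of the general idea of dropping poor candidates early. It is not the procedure from Maron and Moore (1997) or Shen et al. (2011), and it is not what the talk will present. The accuracies are simulated rather than obtained from real model fits, and every name and setting in it (`true_acc`, `burn_in`, `alpha`, the paired t-test used as the filter) is an assumption chosen for illustration.

```r
# Simulated illustration only: accuracies are random draws, not real model fits.
set.seed(1)

B <- 25        # number of resamples
M <- 10        # number of candidate tuning parameter values
burn_in <- 5   # resamples evaluated for every candidate before any filtering
alpha <- 0.05  # significance level used to drop a candidate (assumed value)

# Hypothetical "true" accuracy of each candidate
true_acc <- seq(0.70, 0.88, length.out = M)

# perf[b, m] holds the simulated accuracy of candidate m on resample b
perf <- matrix(NA_real_, nrow = B, ncol = M)
active <- rep(TRUE, M)  # candidates still being evaluated
fits <- 0               # running count of (simulated) model fits

for (b in seq_len(B)) {
  idx <- which(active)

  # "Fit" only the candidates that are still active on this resample
  perf[b, idx] <- rnorm(length(idx), mean = true_acc[idx], sd = 0.03)
  fits <- fits + length(idx)

  if (b >= burn_in && length(idx) > 1) {
    means <- colMeans(perf[seq_len(b), idx, drop = FALSE])
    best <- idx[which.max(means)]

    for (m in setdiff(idx, best)) {
      # One-sided paired t-test: is candidate m worse than the current best?
      p <- t.test(perf[seq_len(b), m], perf[seq_len(b), best],
                  paired = TRUE, alternative = "less")$p.value
      if (p < alpha) active[m] <- FALSE
    }
  }
}

cat("Model fits with full resampling:   ", B * M, "\n")
cat("Model fits with adaptive filtering:", fits, "\n")
cat("Candidates still active:           ", which(active), "\n")
```

The number of fits saved depends on how quickly the filter can separate the candidates. With enough cores, the full B × M grid might finish in a similar wall-clock time anyway, which is the trade-off the talk asks about.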
