Blog Archives

A Talk and Course in NYC Next Week

February 13, 2015

I'll be giving a talk on Tuesday, February 17 (7:00PM-9:00PM) that will be an overview of predictive modeling. It will not be highly technical. Here is the current outline:

"Predictive modeling" definition
Some example applications
A short overview and example
How is this different from what statisticians already do?
What can drive choice of methodology?
Where should we focus our efforts?

The location...

Simulated Annealing Feature Selection

January 12, 2015

As previously mentioned, caret has two new feature selection routines based on genetic algorithms (GA) and simulated annealing (SA). The help pages for the two new functions give a detailed account of the options, syntax, etc. The package already has functions to conduct feature selection using simple filters as well as recursive feature elimination (RFE). RFE...

Regression Solutions Available

January 8, 2015

The GitHub page for the APM exercises has been updated with three new files for Chapters 6-8 (the section on regression). The classification section is in progress. Here's one of our fancy-pants graphs:

New Version of caret on CRAN

January 5, 2015

A new version of caret is on CRAN. Some recent features/changes: The license was changed to GPL >= 2 to accommodate new code from the GA package. New feature selection functions gafs and safs were adde...

Comparing the Bootstrap and Cross-Validation

December 8, 2014

This is the second of two posts about the performance characteristics of resampling methods. The first post focused on the cross-validation techniques and this post mostly concerns the bootstrap. Recall from the last post: we have some simulations to evaluate the precision and bias of these methods. I simulated some regression data (so that I know the real...

Comparing Different Species of Cross-Validation

December 2, 2014

This is the first of two posts about the performance characteristics of resampling methods. I just had major shoulder surgery, but I've pre-seeded a few blog posts. More will come as I get better at one-handed typing. First, a review: Resampling methods, such as cross-validation (CV) and the bootstrap, can be used with predictive models to get estimates of model...

Solutions on github

November 12, 2014

See this page. We're not done with them all, but Chapters 3 and 4 are there and the regression chapters are not too far behind. The Rnw files (using knitr LaTeX) are there along with the corresponding PDF files. You may have better solutions than ...

Some Thoughts on “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?”

November 11, 2014

Sorry for the blogging break. I’ve got a few posts planned for the next few weeks based on some work I’ve been doing. In the meantime, you should check out “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” by Manuel Fernandez-Delgado et al. in JMLR. They took a large number of classifiers and ran them against...

useR! 2014 Highlights

July 3, 2014

My talk went well; here are the slides and a link to the paper pre-print. Hadley Wickham gave an excellent tutorial on dplyr. Based on the talk I saw, I think I will take the data sets from the book and make some public visualizations on the Plotly we...

New caret version with adaptive resampling

May 28, 2014

A new version of caret is on CRAN now. There are a number of bug fixes: A man page with the list of models available via train was added back into the package. See ?models. Thoralf Mildenberger found and fixed a bug in the variable importance calcu...
