Blog Archives

Do Resampling Estimates Have Low Correlation to the Truth? The Answer May Shock You.

April 23, 2017

One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let's look at this with some simulated data. W...

Working at RStudio

November 28, 2016

...

2016 UK Tour

September 26, 2016

I'll be in the UK next week doing three talks in three days: First, I'll be giving a talk at the London R-Ladies meetup on Monday October 3rd with perhaps the best title yet: Whose Scat Is That? An 'Easil...

DataCamp Course

September 26, 2016

Zachary Deane-Mayer, who collaborates on caret, has put together a DataCamp course on Machine Learning in R. Zach and DataCamp did a great job of developing a course that is just right for people who are ...

Boston R User Group Talk [UPDATE]

March 4, 2016

I'll be giving a talk at the Boston R User Group on Thursday, March 10th at 6:00 PM. The talk will be on rule-based regression models. The image above is the training/test set split for the data that I'll be us...

Central Iowa R User Group Talk [Updated]

January 18, 2016

I'll be giving a talk ("Applied Predictive Modeling") to the Central Iowa R User Group on Thursday night from 6:00 PM to 8:00 PM (CST). It looks like it will be broadcast live on YouTube. The link is http://...

In Search Of…

December 13, 2015

Rafael Ladeira asked on GitHub: I was wondering why it doesn't implement some others algorithms for search for optimal tuning parameters. What would be the caveats of using a genetic algorithm, for instance, instead of grid or random search? Do y...

C5.0 Class Probability Shrinkage

September 14, 2015

(The image above has nothing to do with this post. It does, however, show the prize that my daughter won during a recent vacation to Virginia and how I got it back home.) I was recently asked to explain a potential disconnect in C5.0 between the class probabilities shown in the terminal nodes and the values generated...

Feature Engineering versus Feature Extraction: Game On!

August 3, 2015

"Feature engineering" is a fancy term for making sure that your predictors are encoded in the model in a manner that makes it as easy as possible for the model to achieve good performance. For example, if you have a date field as a predictor and there are larger differences in response for the weekends versus the weekdays, then...

New caret Version (6.0-52)

July 22, 2015
A new version of caret (6.0-52) is on CRAN. Here is the news file but the Cliff Notes are: sub-sampling for class imbalances is now integrated with train and is used inside of standard resampling. There are four methods available right now: up- and...
