Blog Archives

useR! 2014 Highlights

July 3, 2014

My talk went well; here are the slides and a link to the paper pre-print. Hadley Wickham gave an excellent tutorial on dplyr. Based on the talk I saw, I think I will take the data sets from the book and make some public visualizations on the Plotly we...

Read more »

New caret version with adaptive resampling

May 28, 2014

A new version of caret is on CRAN now. There are a number of bug fixes: A man page with the list of models available via train was added back into the package. See ?models. Thoralf Mildenberger found and fixed a bug in the variable importance calcu...

Read more »

A Tutorial and Talk at useR! 2014 [Important Update]

May 12, 2014

See the update below. I'll be doing a morning tutorial at useR! at the end of June in Los Angeles. I've done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. The tutorial outline is: Conventions in R; Data splitting and estimating performance; Data pre-processing; Over-fitting and resampling; Training and tuning tree...

Read more »

A Tutorial and Talk at useR! 2014

May 7, 2014

I'll be doing a morning tutorial at useR! at the end of June in Los Angeles. I've done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. I will be including a copy of the book for ...

Read more »

Bay Area RUG Talk on 3/17

March 9, 2014

I'm making my yearly pilgrimage to San Francisco to teach at PAW. I'll also be giving a short talk at the Bay Area R Users Group on model tags in the caret package and the code that produced this interactive plot. It is at 7:00 PM on Monday March 17...

Read more »

caret webinar materials

February 28, 2014

The webinar was recorded (thanks to Ray DiGiacomo and the Orange County RUG). The slides are here minus a few typos. 

Read more »

Optimizing Probability Thresholds for Class Imbalances

February 6, 2014

One of the toughest problems in predictive modeling occurs when the classes have a severe imbalance. We spend an entire chapter on this subject. One consequence of this is that the performance is generally very biased against the class with the smallest frequency. For example, if the data have a majority of samples belonging to the first...

Read more »

caret webinar on Feb 25

February 2, 2014

I'll be doing a webinar with the Orange County R User Group on the caret package on Tue, Feb 25, 2014 1:00 PM - 2:00 PM EST. Here is the url in case you are interested: https://www3.gotomeeting.com/register/673845982 Thanks to Ray DiGiacom...

Read more »

Calibration Affirmation

January 4, 2014

In the book, we discuss the notion of a probability model being "well calibrated". There are many different mathematical techniques that classification models use to produce class probabilities. Some of these values are "probability-like" in that they are between zero and one and sum to one. This doesn't necessarily mean that the probability estimates are consistent with the true event...

Read more »

Down-Sampling Using Random Forests

December 8, 2013

We discuss dealing with large class imbalances in Chapter 16. One approach is to sample the training set to coerce a more balanced class distribution. We discuss down-sampling: sample the majority class to make their frequencies closer to the rarest class; up-sampling: the minority class is resampled to increase the corresponding frequencies; hybrid approaches: some methodologies do a little of both and...

Read more »