Articles by Max Kuhn

A Tutorial and Talk at useR! 2014 [Important Update]

May 12, 2014 | Max Kuhn

See the update below. I'll be doing a morning tutorial at useR! at the end of June in Los Angeles. I've done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. The tutorial outline is: Conventions in R Data ... [Read more...]

A Tutorial and Talk at useR! 2014

May 7, 2014 | Max Kuhn

I'll be doing a morning tutorial at useR! at the end of June in Los Angeles. I've done this same presentation at the last few conferences and this will probably be the last time for this specific workshop. I will be including a copy of the book for ... [Read more...]

Bay Area RUG Talk on 3/17 (updated)

March 9, 2014 | Max Kuhn

I'm making my yearly pilgrimage to San Francisco to teach at PAW. I'll also be giving a short talk at the Bay Area R Users Group on model tags in the caret package and the code that produced this interactive plot. It is at 7:00 PM on Monday March 17th at ... [Read more...]

Optimizing Probability Thresholds for Class Imbalances

February 6, 2014 | Max Kuhn

One of the toughest problems in predictive modeling occurs when the classes have a severe imbalance. We spend an entire chapter on this subject. One consequence is that performance is generally very biased against the class with the smallest frequencies. For example, if the data have ... [Read more...]
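
One common remedy for imbalance, and the subject of this post, is to move the probability cutoff away from the default 50%. The post itself uses R and caret; as a language-agnostic sketch, here is one plausible criterion (maximizing Youden's J, i.e. sensitivity + specificity − 1) over a grid of cutoffs. The function names and the choice of J are illustrative assumptions, not the post's exact code.

```python
# Sketch: choose the probability cutoff that maximizes Youden's J
# (sensitivity + specificity - 1) instead of defaulting to 0.5.
# Labels: 1 = event (the rare class), 0 = nonevent.

def youden_j(probs, labels, cutoff):
    """Youden's J statistic at a given probability cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0

def best_cutoff(probs, labels, grid=None):
    """Grid-search the cutoff that maximizes J (ties go to the smallest)."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda c: youden_j(probs, labels, c))
```

With a rare event class whose predicted probabilities all sit below 0.5, the optimized cutoff lands well under 50%, which is exactly the behavior the post explores.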

caret webinar on Feb 25

February 2, 2014 | Max Kuhn

I"ll be doing a webinar with the Orange County R User Group on the caret package on Tue, Feb 25, 2014 1:00 PM - 2:00 PM EST.Here is the url in case you are interested: https://www3.gotomeeting.com/register/673845982Thanks to Ray DiGiacom... [Read more...]

Calibration Affirmation

January 4, 2014 | Max Kuhn

In the book, we discuss the notion of a probability model being "well calibrated". There are many different mathematical techniques that classification models use to produce class probabilities. Some of these values are "probability-like" in that they are between zero and one and sum to one. This doesn't necessarily mean that ... [Read more...]
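
A standard way to check calibration is to bin the predicted probabilities and compare each bin's average prediction to the observed event rate; for a well-calibrated model the two should match. The post works in R; this minimal Python sketch (function name and binning scheme are my own) shows the idea.

```python
# Sketch of a calibration check: bin predicted probabilities and
# compare each bin's mean prediction to its observed event rate.

def calibration_bins(probs, labels, n_bins=5):
    """Return (mean predicted prob, observed event rate) per nonempty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            obs = sum(y for _, y in b) / len(b)
            out.append((mean_p, obs))
    return out
```

For a well-calibrated model, points of (mean predicted, observed rate) fall near the 45-degree line; large gaps are the miscalibration the post discusses.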

Down-Sampling Using Random Forests

December 8, 2013 | Max Kuhn

We discuss dealing with large class imbalances in Chapter 16. One approach is to sample the training set to coerce a more balanced class distribution. We discuss down-sampling: sample the majority class to make its frequency closer to the rarest class; and up-sampling: the minority class is resampled to increase the corresponding ... [Read more...]
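
The down-sampling step described above can be sketched in a few lines: group the training rows by class, then randomly sample every class down to the size of the rarest one. This is a language-agnostic illustration (the post uses R, where caret provides this directly); the function name here is made up.

```python
import random

# Sketch of down-sampling: randomly sample each class down to the
# frequency of the rarest class so the result is balanced.

def down_sample(rows, labels, seed=0):
    """Return a list of (row, label) pairs with equal class frequencies."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out = []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):  # sample without replacement
            out.append((row, y))
    return out
```

Up-sampling is the mirror image: sample the minority class *with* replacement up to the majority-class count.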

The Basics of Encoding Categorical Data for Predictive Models

October 23, 2013 | Max Kuhn

Thomas Yokota asked a very straightforward question about encodings for categorical predictors: "Is it bad to feed it non-numerical data such as factors?" As usual, I will try to make my answer as complex as possible. (I've heard the old wives' tale that Eskimos have 180 different words in their language ... [Read more...]

Equivocal Zones

August 16, 2013 | Max Kuhn

In Chapter 11, equivocal zones were briefly discussed. The idea is that some classification errors are close to the probability boundary (i.e. 50% for two-class outcomes). If this is the case, we can create a zone where the samples are predicted as "equivocal" or "indeterminate" instead of one of ... [Read more...]
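
The rule itself is simple: if the event probability falls within some half-width of the 50% boundary, report "equivocal" instead of forcing a class. A sketch for the two-class case (the width of 0.10 is an arbitrary example, not a value from the book):

```python
# Sketch of an equivocal zone for a two-class model: predictions whose
# event probability is within `width` of 0.5 are left undecided.

def classify_with_zone(prob_event, width=0.10):
    """Return "event", "nonevent", or "equivocal" for one prediction."""
    if abs(prob_event - 0.5) < width:
        return "equivocal"
    return "event" if prob_event >= 0.5 else "nonevent"
```

Widening the zone trades coverage for accuracy: fewer samples get a definite call, but the calls that are made are more reliable.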

UseR! 2013 Highlights

July 13, 2013 | Max Kuhn

The conference was excellent this year. My highlights: Bojan Mihaljevic gave a great presentation on machine learning models built from network models. Their package isn't on CRAN yet, but I'm really looking forward to it. Jim Harner's presentation ... [Read more...]

Measuring Associations

June 20, 2013 | Max Kuhn

In Chapter 18, we discuss a relatively new method for measuring predictor importance called the maximal information coefficient (MIC). The original paper is by Reshef et al. (2011). Summaries of the initial reactions to the MIC come from Speed and Tibshirani (and others can be found here). My (minor) beef with it ... [Read more...]

type = “what?”

June 13, 2013 | Max Kuhn

One great thing about R is that it has a wide diversity of packages written by many different people with many different viewpoints on how software should be designed. However, this does tend to bite us periodically. When I teach newcomers about R and... [Read more...]

Feature Selection 3 – Swarm Mentality

June 6, 2013 | Max Kuhn

"Bees don't swarm in a mango grove for nothing. Where can you see a wisp of smoke without a fire?" - Hla Stavhana In the last two posts, genetic algorithms were used as feature wrappers to search for more effective subsets of predictors. Here, I will do the same with ... [Read more...]

Recent Changes to caret

May 18, 2013 | Max Kuhn

Here is a summary of some recent changes to caret. Feature Updates: train was updated to utilize recent changes in the gbm package that allow for boosting with three or more classes (via the multinomial distribution) The Yeo-Johnson power transformation was added. This is very similar to the Box-Cox transformation, ... [Read more...]
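
The Yeo-Johnson transformation mentioned in the changelog has a simple closed form, and unlike Box-Cox it handles zero and negative values. Here it is written out directly, in Python rather than R just to keep the sketch self-contained (`lam` is the transformation parameter, normally estimated from the training data):

```python
import math

# The Yeo-Johnson power transformation: like Box-Cox, but defined
# for zero and negative inputs via a mirrored branch.

def yeo_johnson(x, lam):
    """Transform a single value x with parameter lam (lambda)."""
    if x >= 0:
        if abs(lam) > 1e-12:                       # lambda != 0
            return ((x + 1) ** lam - 1) / lam
        return math.log(x + 1)                     # lambda == 0
    if abs(lam - 2) > 1e-12:                       # x < 0, lambda != 2
        return -(((-x + 1) ** (2 - lam)) - 1) / (2 - lam)
    return -math.log(-x + 1)                       # x < 0, lambda == 2
```

At `lam = 1` the transformation is (up to a shift) the identity, and smaller values of `lam` progressively compress large positive values, much like Box-Cox on shifted data.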

Projection Pursuit Classification Trees

May 14, 2013 | Max Kuhn

I've been looking at this article for a new tree-based method. It uses other classification methods (e.g. LDA) to find a single variable to use in the split and builds a tree in that manner. The subtleties of the model are: The model does not prune but ... [Read more...]

Feature Selection 2 – Genetic Boogaloo

May 8, 2013 | Max Kuhn

Previously, I talked about genetic algorithms (GA) for feature selection and illustrated the algorithm using a modified version of the GA R package and simulated data. The data were simulated with 200 non-informative predictors and 12 linear effects and three non-linear effects. Quadratic discriminant analysis (QDA) was used to model the data. ... [Read more...]

Feature Selection Strikes Back (Part 1)

April 29, 2013 | Max Kuhn

In the feature selection chapter, we describe several search procedures ("wrappers") that can be used to optimize the number of predictors. Some techniques were described in more detail than others. Although we do describe genetic algorithms and how they can be used for reducing the dimensions of the data, this ... [Read more...]
