(This article was first published on **Blog - Applied Predictive Modeling**, and kindly contributed to R-bloggers)

One great thing about R is that it has a wide diversity of packages written by many different people with many different viewpoints on how software should be designed. However, this does tend to bite us periodically.

When I teach newcomers about R and predictive modeling, I have a slide that illustrates one of the weaknesses of this system: heterogeneous interfaces. If you are building a classification model and want to generate class probabilities for new samples, the syntax can be… diverse. Here is a sample of syntax for different models:
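As a small runnable sketch of that diversity, the snippet below fits four classifiers and asks each for class probabilities. It assumes the MASS, rpart, and nnet packages (normally shipped with R); the two-class subset of `iris`, the formula, and the object names are chosen only for illustration.

```r
library(MASS)   # lda()
library(rpart)  # rpart()
library(nnet)   # multinom()

# Two-class subset of iris, used purely as illustrative data
dat <- droplevels(subset(iris, Species != "setosa"))
form <- Species ~ Sepal.Length + Sepal.Width
new_samples <- dat[c(1, 51), ]  # one sample from each class

lda_fit   <- lda(form, data = dat)
glm_fit   <- glm(form, data = dat, family = binomial)
rpart_fit <- rpart(form, data = dat)
mn_fit    <- multinom(form, data = dat, trace = FALSE)

# lda: no `type` argument; probabilities sit in the `posterior` list element
p_lda <- predict(lda_fit, new_samples)$posterior

# glm: type = "response"; a vector of probabilities for the second factor level
p_glm <- predict(glm_fit, new_samples, type = "response")

# rpart: type = "prob"; a matrix with one column per class
p_rpart <- predict(rpart_fit, new_samples, type = "prob")

# multinom: type = "probs"; a vector here because there are only two classes
p_mn <- predict(mn_fit, new_samples, type = "probs")
```

Four models, four ways of asking for the same quantity, and the shape of what comes back differs too: a list element, a bare vector, and a matrix.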

That’s a lot of minutiae to remember. I did a quick-and-dirty census of all the classification models used by caret to quantify the variability in this particular syntax. The `train` function utilizes 64 different models that can produce class probabilities. Of these, many were from the same package. For example, both `nnet` and `multinom` are in the nnet package and probably should not count twice since the latter is a wrapper for the former. As another example, the RWeka package has at least six functions that all use `probability` as the value for `type`.

For this reason, I boiled the numbers down to one value of `type` per package (using majority vote if there was more than one). There were 40 different packages once these redundancies were eliminated. Here is a histogram of the `type` values for calculating probabilities:

The most frequent situation is no `type` value at all. For example, the `predict` method for `lda` (in the MASS package) automatically generates predicted classes and posterior probabilities without requiring the user to specify anything. There were a handful of cases where the class did not have a `predict` method to generate class probabilities (e.g. party and pamr) and these also counted as “none”.
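The tallying described above (one `type` value per package by majority vote, then a count across packages) could be sketched roughly as follows. The `census` data frame is a small hand-picked illustration and not the actual 64-model census; the helper `majority` and the tie-breaking rule are assumptions made for this example.

```r
# Toy census: a few models and the `type` value their predict method expects.
# "none" marks models that need no `type` value at all.
census <- data.frame(
  model   = c("J48", "LMT", "JRip", "lda", "qda",
              "rpart", "randomForest", "C5.0", "glm", "gbm"),
  package = c("RWeka", "RWeka", "RWeka", "MASS", "MASS",
              "rpart", "randomForest", "C50", "stats", "gbm"),
  type    = c("probability", "probability", "probability", "none", "none",
              "prob", "prob", "prob", "response", "response"),
  stringsAsFactors = FALSE
)

# Majority vote within a package; ties go to the alphabetically first value
majority <- function(x) names(which.max(table(x)))

# One `type` value per package
per_package <- vapply(split(census$type, census$package), majority, character(1))

# How many packages use each `type` value, most common first
type_counts <- sort(table(per_package), decreasing = TRUE)

# Bar chart of the counts, analogous to the histogram in the post
barplot(type_counts, las = 2, ylab = "Number of packages")
```

With this toy input the three RWeka functions collapse to a single `probability` vote, which is the kind of de-duplication the census relies on.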

For those of us who use R to create predictive models on a day-to-day basis, this is a lot of detail to remember (especially if we want to try different models). This is one of the reasons I created caret; it has a unified interface to models that eliminates the need to remember the name of the function, the value of `type`, and any other arguments. In case you are wondering, I chose **`type = "prob"`**.
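A minimal sketch of what that unified interface looks like in practice, assuming the caret, rpart, and MASS packages are installed. The dataset, the two models, and the resampling settings are chosen only for illustration.

```r
library(caret)

set.seed(1)
ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

# Same call shape for two very different models; only `method` changes
rpart_fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
lda_fit   <- train(Species ~ ., data = iris, method = "lda",   trControl = ctrl)

new_samples <- iris[c(1, 51, 101), ]  # one sample from each species

# Class probabilities are requested the same way for both models
p_rpart <- predict(rpart_fit, newdata = new_samples, type = "prob")
p_lda   <- predict(lda_fit,   newdata = new_samples, type = "prob")
```

Both results are data frames with one column per class, so downstream code does not need to know which underlying package produced them.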
