Articles by jmount

abs and relu are not Mercer Kernels

December 25, 2020 | jmount

I am sharing some rough notes (in R and Python) here on how, while dot(a, b) fulfills “Mercer’s condition” (by definition! I’ll just informally call these beasts “Mercer Kernels”), the seemingly harmless variations abs(dot(a, b)) and relu(dot(a, b)) are not Mercer Kernels (... [Read more...]
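Here is a minimal Python sketch of this kind of check. It assumes the standard finite form of Mercer's condition: every Gram matrix a kernel produces must be positive semidefinite, so one Gram matrix with a negative eigenvalue is enough to disqualify a kernel. The four-point set is our own illustrative choice and may not be the example used in the original notes.

```python
import numpy as np

# Four unit vectors in the plane at 0, 45, 90 and 135 degrees
# (an illustrative point set chosen for this sketch).
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
X = np.column_stack([np.cos(angles), np.sin(angles)])

def gram(kernel, X):
    """Gram matrix G[i, j] = kernel(x_i, x_j)."""
    n = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

kernels = {
    "dot": lambda a, b: np.dot(a, b),
    "abs": lambda a, b: abs(np.dot(a, b)),
    "relu": lambda a, b: max(0.0, np.dot(a, b)),
}

# A Mercer kernel must give a positive semidefinite Gram matrix on every
# finite point set, so a negative smallest eigenvalue rules a kernel out.
min_eigs = {name: np.linalg.eigvalsh(gram(k, X)).min() for name, k in kernels.items()}
for name, v in min_eigs.items():
    print(f"{name:>4}: smallest Gram eigenvalue = {v:.4f}")
```

On this point set the plain dot-product Gram matrix has smallest eigenvalue zero (up to rounding), while abs gives 1 − √2 ≈ −0.41 and relu gives about −0.14.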

Bounding Excess Generalization Error

December 23, 2020 | jmount

I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. (link) For more of my notes on support vector machines […] [Read more...]

What Every Data Scientist Should Know About Floating Point

December 14, 2020 | jmount

What Every Data Scientist Should Know About Floating Point (link) [Read more...]

Some Fun With User/Package Level Pipes/Anonymous-Functions

December 8, 2020 | jmount

In addition to adding a base-R pipe, it appears a new base-R function builder is in the works (in addition to “function”). R is a very versatile language, with a great ability to accept user-level or package extensions. What I mean by this is, user code and package code (which […] [Read more...]

My Opinion on R’s Upcoming Pipe

December 7, 2020 | jmount

R’s upcoming pipe appears to be currently proposed as a syntactic transform of the form: a |> f(...) becomes f(a, ...), and a |> f() becomes f(a). There is a currently active discussion on this prototype, and some interesting points come up. Note the current proposal appears to disallow a |> […] [Read more...]

R is Getting an Official Pipe Operator

December 5, 2020 | jmount

It looks like R is getting an official pipe operator (ref). R doesn’t work under an RFC process, so we hear about these things and they are discussed on the R-devel mailing list. I’ve written on this topic before (ref), and I have taped some new comments. (link) ... [Read more...]

Happy Anniversary Practical Data Science with R 2nd Edition!

December 3, 2020 | jmount

Our book, Practical Data Science with R, just had its first year anniversary! The book is doing great; if you are working with R and data, I recommend you check it out. (link) [Read more...]

The Purpose of our Data Science Chalk Talk Series

November 22, 2020 | jmount

I’d like to share an introduction to my data science chalk talk series (video link, series link) [Read more...]

New Light-board Lecture: wrapr::unpack

November 13, 2020 | jmount

[Read more...]

BARUG ROC day invitation

November 4, 2020 | jmount

I’ve recorded a video invitation to help encourage you to consider attending BARUG’s online ROC day (Tuesday, November 10, 2020 4:30 PM US Pacific time). Please check it out and share. (link) [Read more...]

A Single Parameter Family Characterizing Probability Model Performance

October 29, 2020 | jmount

We’ve been writing on the distribution density shapes expected for probability models in ROC (receiver operating characteristic) plots, double density plots, and normal/logit-normal density frameworks. I thought I would re-approach the issue with a specific family of examples. Let’s define a “probability model” as a ...

[Read more...]

An Example of a Calibrated Model that is not Fully Calibrated

October 28, 2020 | jmount

In our last note we mentioned the possibility of “fully calibrated models.” This note is an example of a probability model that is calibrated in the traditional sense, but not fully calibrated in a finer grained sense. First let’s attach our packages and generate our example data in R. ... [Read more...]
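As a rough illustration of calibration "in the traditional sense" (the average outcome matches the average predicted probability across score ranges), here is a generic binned check in Python. The equal-width bins and the synthetic data are assumptions for this sketch, and this coarse check deliberately does not capture the finer "fully calibrated" notion the note discusses.

```python
import numpy as np

def calibration_table(scores, labels, n_bins=10):
    """Group predictions into equal-width score bins and compare the mean
    predicted probability with the observed outcome rate in each bin."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Bin index in 0..n_bins-1; a score of exactly 1.0 goes in the top bin.
    idx = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((b, int(mask.sum()), scores[mask].mean(), labels[mask].mean()))
    return rows

# Synthetic data: each outcome is drawn with probability equal to its score,
# so the scores are calibrated by construction.
rng = np.random.default_rng(2020)
scores = rng.uniform(size=100_000)
labels = rng.uniform(size=scores.size) < scores

table = calibration_table(scores, labels)
for b, n, mean_score, rate in table:
    print(f"bin {b}: n={n:6d}  mean score={mean_score:.3f}  observed rate={rate:.3f}")
```

A model can pass a check like this bin by bin and still be miscalibrated on finer subgroups, which is the gap the note's example is about.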

The Double Density Plot Contains a Lot of Useful Information

October 27, 2020 | jmount

The double density plot contains a lot of useful information. This is a plot that shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, picked such that the area […]

[Read more...]
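A small Python sketch of the two curves behind a double density plot, using histogram-based densities in place of the smoothed density estimates a plotting package would typically draw. The synthetic beta-distributed scores and the 30% prevalence are assumptions for illustration only.

```python
import numpy as np

# Synthetic scores for illustration only: positives tend to score higher.
rng = np.random.default_rng(0)
labels = rng.uniform(size=5_000) < 0.3
scores = np.where(labels,
                  rng.beta(5, 2, size=labels.size),
                  rng.beta(2, 5, size=labels.size))

bins = np.linspace(0.0, 1.0, 41)
widths = np.diff(bins)

# One density per outcome class, each normalized so that its own area is 1;
# a double density plot overlays these two class-conditional curves.
dens_pos, _ = np.histogram(scores[labels], bins=bins, density=True)
dens_neg, _ = np.histogram(scores[~labels], bins=bins, density=True)

area_pos = np.sum(dens_pos * widths)
area_neg = np.sum(dens_neg * widths)
print(area_pos, area_neg)
```

Because each curve is normalized within its own class, both curves have area one regardless of how common each class is, so class prevalence has to be read from elsewhere.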

Your Lopsided Model is Out to Get You

October 26, 2020 | jmount

For classification problems I argue one of the biggest steps you can take to improve the quality and utility of your models is to prefer models that return scores or return probabilities instead of classification rules. Doing this also opens a second large opportunity for improvement: working with your domain […] [Read more...]

The Shift and Balance Fallacies

October 15, 2020 | jmount

Two related fallacies I see in machine learning practice are the shift and balance fallacies (for an earlier simple fallacy, please see here). They involve thinking logistic regression has a bit simpler structure than it actually does, and also thinking logistic regression is a bit less powerful than it actually […] [Read more...]

Surgery on ROC Plots

October 13, 2020 | jmount

This note is a little break from our model homotopy series. I have a neat example where one combines two classifiers to get a better classifier using a method I am calling “ROC surgery.” In ROC surgery we look at multiple ROC plots and decide we want to cut out […]

[Read more...]

Model Homotopies in the Wild

October 12, 2020 | jmount

So are model homotopies commonly used? Yes, they are. As an example consider glmnet: Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/. From help(glmnet): library(glmnet) x = matrix(rnorm(100 * 20), 100, 20) g2 = […]

[Read more...]
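glmnet traces lasso/elastic-net coefficients along a path of regularization strengths. As a dependency-free stand-in (ridge regression, not glmnet's penalty or algorithm), this Python sketch traces a simple model homotopy: a continuous family of fits indexed by the penalty lambda, running from nearly ordinary least squares to heavy shrinkage. The data are synthetic and only sized like the help(glmnet) example.

```python
import numpy as np

# Synthetic regression data, sized like the help(glmnet) example (100 rows, 20 columns).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta_true = rng.normal(size=20)
y = X @ beta_true + rng.normal(size=100)

def ridge(X, y, lam):
    """Ridge coefficients: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Walk the homotopy from (almost) ordinary least squares to heavy shrinkage.
lambdas = np.logspace(-8, 4, 25)
path = np.array([ridge(X, y, lam) for lam in lambdas])
norms = np.linalg.norm(path, axis=1)

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(path[0], ols, atol=1e-6))  # small-lambda end matches least squares
print(norms[0], norms[-1])                   # coefficients shrink toward zero
```

Each row of `path` is one model on the family; varying lambda continuously moves smoothly between the two extremes.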

Tailored Models are Not The Same as Simple Corrections

October 11, 2020 | jmount

Let’s take a stab at our first note on a topic that is much easier to write about once the definitions of probability model homotopy are established. In this note we will discuss tailored probability models. These are models deliberately fit to training data that has an outcome prevalence equal to the ... [Read more...]

How to Pick an Optimal Utility Threshold Using the ROC Plot

October 10, 2020 | jmount

Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule: “Squeezing the Most Utility from Your Models” and “Estimating Uncertainty of Utility Curves.” This is very compatible with our advice to […]

[Read more...]
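A minimal Python sketch of picking a utility-maximizing threshold. Note this is a direct sweep over candidate thresholds, not the ROC-plot geometric construction the note describes; the per-outcome utilities and the toy data are hypothetical.

```python
import numpy as np

def best_threshold(scores, labels, tp_value, fp_cost, fn_cost=0.0, tn_value=0.0):
    """Sweep every observed score as a cutoff for the rule 'predict positive
    when score >= threshold' and return (threshold, total utility) of the best."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_u = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        tn = np.sum(~pred & ~labels)
        u = tp * tp_value - fp * fp_cost - fn * fn_cost + tn * tn_value
        if u > best_u:
            best_t, best_u = float(t), float(u)
    return best_t, best_u

# Toy scores/labels and hypothetical utilities: +10 per true positive, -4 per false positive.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
t, u = best_threshold(scores, labels, tp_value=10.0, fp_cost=4.0)
print(t, u)  # 0.4 26.0
```

Here the cutoff 0.4 captures all three positives at the price of one false positive, which beats the stricter cutoffs under these particular utilities.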

Data Science is a Science (Just Not the One You May Think)

September 10, 2020 | jmount

I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly. Data science is, for better or worse, […] [Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
