ML models: What they can’t learn?

May 20, 2018
By

(This article was first published on English – SmarterPoland.pl, and kindly contributed to R-bloggers)

What I love in conferences are the people, that come after your talk and say: It would be cool to add XYZ to your package/method/theorem.

After the eRum (great conference by the way) I was lucky to hear from Tal Galili: It would be cool to use DALEX for teaching, to show how different ML models are learning relations.

Cool idea. So let’s see what can and what cannot be learned by the most popular ML models. Here we will compare random forest against linear models against SVMs.
Find the full example here. We simulate variables from uniform U[0,1] distribution and calculate y from following equation

In all figures below we compare PDP model responses against the true relation between variable x and the target variable y (pink color). All these plots are created with DALEX package.

For x1 we can check how different models deal with a quadratic relation. The linear model fails without prior feature engineering, random forest is guessing the shape but the best fit if found by SVMs.

With sinus-like oscillations the story is different. SVMs are not that flexible while random forest is much closer.

Turns out that monotonic relations are not easy for these models. The random forest is close but event here we cannot guarantee the monotonicity.

The linear model is the best one when it comes to truly linear relation. But other models are not that far.

The abs(x) is not an easy case for neither model.

Find the R codes here.

Of course the behavior of all these models depend on number of observation, noise to signal ratio, correlation among variables and interactions.
Yet is may be educational to use PDP curves to see how different models are learning relations. What they can grasp easily and what they cannot.

To leave a comment for the author, please follow the link and comment on their blog: English – SmarterPoland.pl.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)