Tibshirani’s original paper on the lasso. Breiman’s…

December 6, 2012
By

(This article was first published on isomorphismes, and kindly contributed to R-bloggers)



Tibshirani’s original paper on the lasso.

  • Breiman’s Garotte — 1993
  • Tibshirani lasso paper submitted — 1994
  • Tibshirani lasso paper revised — 1995
  • Tibshirani lasso paper accepted — 1996

This is one of those papers that I’m so excited about, I feel like “You should just read the whole thing! It’s all good!” But I realise that’s less than reasonable.

Here is a bit of summary, feel free to request other information and I’ll do my best to adapt it.

The basic question is: I have some data and I want the computer to generate (regress) a linear model of it for me. What procedure should I tell the computer to do to get a good | better | best model?

The first technique, by Abraham de Moivre, applied fruitfully by Gauss in the late 1800’s (so, ok, no computers then — but nowadays we just run lm in R), was to minimise the sum of squared error (Euclidean distance)
image
of a given affine model of the data. (Affine being linear + one more parameter for a variable origin, to account for the average value of the data ex observable parameters. For example to model incomes in the USA when the only observed parameters are age, race, and zip code—you would want to include the average baseline US income level, and that would be accomplished mathematically by shifting the origin, a.k.a. the alpha; or autonomous or “vector of ones” regression-model parameter, a.k.a. the affine addition to an otherwise linear model.)

It was noticed by several someones at various points in time that whilst de Moivre’s least-squares (OLS) method is provably (calculus of variations) the optimal linear model given well-behaved data, real data does not always behave.

http://bookcoverarchive.com/images/books/wellbehaved_women_seldom_make_history.large.jpg

In the presence of correlation, missing data, wrong data, and other problems, the “optimal” OLS solution is overfit, meaning that the model it makes for you picks up on too many of the problems. Is there a way to pick up on more signal and less noise? More gold and less dross? More of the real stuff and fewer of the impurities?

I can think of 2 ways people have tried to scrape off the corrosion without flaying as well too much of the underlying good-material:

  1. Assume simpler models are better. This is the approach taken by ridge regression (a.k.a. Tikhonov regularisation a.k.a. penalisation), the lasso, and the garotte
    ridge regression with tuning parameter highlighted
  2. Compare ensembles of models, then choose one in the “middle”. Robust methods, for example, use statistical functions that vary less in theory from flawed situation to flawed situation, than do other statistical functions. Subset selection, hierarchical methods, and  generate a lot of models on the real data and 
 

That’s the backstory. Now on to what Tibshirani actually says. His original lasso paper contrasts 3 ways of penalising complicated models, plus regression on subsets.

The three formulae:

  • penalisation & restriction to subsets
    Ridge Regression: edited it again to make the lambda term specifically look like the tuning parameter for the penalty
    Tibshirani 3
    definition of ’hat_j^o is the OLS magnitudes
  • garotte
    Leo Breiman garotte
  • lasso
    Tibshirani lasso def

look superficially quite similar. Tibshirani discusses the pro’s, con’s, when’s, and wherefore’s of the different approaches.

Tibshirani lasso 1

 

(In reading this paper I learned a new symbol, the operator ƒ(x) = (x)⁺. It means
’(x) = (x)⁺
and looks like
’(x) = (x)⁺
). In R code, ifelse(x<0, 0, x). Like absolute value but not exactly.)

 

Back to the lasso. How does such a small change to the penalty function, change the estimated linear model we get as output?

http://i.imgur.com/VJiAh.png
http://i.imgur.com/zeBrm.png

http://i.imgur.com/tqOim.png
http://i.imgur.com/zeBrm.png

To leave a comment for the author, please follow the link and comment on his blog: isomorphismes.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.