Tibshirani's original paper on the lasso

December 6, 2012

(This article was first published on isomorphismes, and kindly contributed to R-bloggers)


  • Breiman's garotte: 1993
  • Tibshirani's lasso paper submitted: 1994
  • Tibshirani's lasso paper revised: 1995
  • Tibshirani's lasso paper accepted: 1996

This is one of those papers that I'm so excited about, I feel like you should just read the whole thing! It's all good! But I realise that's less than reasonable.

Here is a bit of summary; feel free to request other information and I'll do my best to adapt it.

The basic question is: I have some data and I want the computer to generate (regress) a linear model of it for me. What procedure should I tell the computer to follow to get a good | better | best model?

The first technique, usually credited to Legendre and Gauss in the early 1800s (so, ok, no computers then, but nowadays we just run lm in R), was to minimise the sum of squared errors (Euclidean distance)
of a given affine model of the data. (Affine being linear + one more parameter for a variable origin, to account for the average value of the data net of the observed parameters. For example, to model incomes in the USA when the only observed parameters are age, race, and zip code, you would want to include the average baseline US income level, and that would be accomplished mathematically by shifting the origin: a.k.a. the alpha, "autonomous", or "vector of ones" regression-model parameter, a.k.a. the affine addition to an otherwise linear model.)
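The "vector of ones" trick is easy to see in code. A minimal sketch of what R's lm does, written in numpy with made-up toy data (the true alpha = 3 and beta = 2 are assumptions of the example):

```python
import numpy as np

# Toy data: y depends linearly on x plus a baseline (the affine alpha).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=50)

# Append a column of ones -- the "vector of ones" that shifts the origin,
# turning a linear model into an affine one.
X = np.column_stack([np.ones_like(x), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

The fitted alpha_hat and beta_hat land close to the true 3 and 2; lm does exactly this column-of-ones augmentation behind the formula interface.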

It was noticed by several someones at various points in time that whilst the least-squares (OLS) method is provably the optimal linear model given well-behaved data (the Gauss-Markov theorem), real data does not always behave.


In the presence of correlation, missing data, wrong data, and other problems, the optimal OLS solution is overfit, meaning that the model it makes for you picks up on too many of the problems. Is there a way to pick up on more signal and less noise? More gold and less dross? More of the real stuff and fewer of the impurities?

I can think of 2 ways people have tried to scrape off the corrosion without also flaying too much of the underlying good material:

  1. Assume simpler models are better. This is the approach taken by ridge regression (a.k.a. Tikhonov regularisation, a.k.a. penalisation), the lasso, and the garotte: large coefficients are penalised via a tuning parameter.
  2. Compare ensembles of models, then choose one in the middle. Robust methods, for example, use statistical functions that in theory vary less from flawed situation to flawed situation than other statistical functions do. Subset selection and hierarchical methods generate a lot of models on the real data and then choose among them.
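Approach 1 is concrete enough to sketch. Ridge regression has a closed form, β = (XᵀX + λI)⁻¹Xᵀy; a hedged numpy illustration on made-up data with two nearly collinear columns, the situation where plain OLS misbehaves (lam = 10 is an arbitrary choice, not a recommendation):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^-1 X'y.
    # lam = 0 recovers ordinary least squares.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
# Make two columns nearly collinear -- correlated predictors.
X[:, 2] = X[:, 1] + rng.normal(scale=0.01, size=40)
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=40)

b_ols = ridge(X, y, 0.0)   # unpenalised: coefficients free to blow up
b_pen = ridge(X, y, 10.0)  # penalised: coefficients shrunk toward zero
```

The penalised coefficient vector is always shorter than the OLS one; that shrinkage is the "assume simpler models are better" assumption made mechanical.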

That's the backstory. Now on to what Tibshirani actually says. His original lasso paper contrasts 3 ways of penalising complicated models, plus regression on subsets.

The three formulae:

  • penalisation & restriction to subsets
    Ridge regression: minimise Σᵢ (yᵢ − α − Σⱼ βⱼxᵢⱼ)² + λ Σⱼ βⱼ², with λ the tuning parameter for the penalty
    (β̂ⱼᵒ below denotes the ordinary least-squares estimate of βⱼ)
  • garotte (Leo Breiman)
    minimise Σᵢ (yᵢ − α − Σⱼ cⱼβ̂ⱼᵒxᵢⱼ)² with cⱼ ≥ 0 and Σⱼ cⱼ ≤ t
  • lasso (Tibshirani)
    minimise Σᵢ (yᵢ − α − Σⱼ βⱼxᵢⱼ)² subject to Σⱼ |βⱼ| ≤ t

look superficially quite similar. Tibshirani discusses the pros, cons, whens, and wherefores of the different approaches.

[figure: the lasso estimate, from Tibshirani's paper]

(In reading this paper I learned a new symbol: the positive-part operator (x)₊. It means
(x)₊ = max(x, 0).
In R code, ifelse(x < 0, 0, x). Like absolute value, but not exactly.)

Back to the lasso. How does such a small change to the penalty function change the estimated linear model we get as output?
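The paper answers this cleanly for an orthonormal design, where all three estimators have closed forms in terms of the OLS coefficients. A sketch in numpy rather than R (my paraphrase of those forms; the gamma value and the toy coefficients are made up):

```python
import numpy as np

def pos(x):
    # the positive-part operator (x)_+ = max(x, 0)
    return np.maximum(x, 0.0)

def ridge_orth(b, gamma):
    # Ridge: uniform proportional shrinkage -- never exactly zero.
    return b / (1.0 + gamma)

def garotte_orth(b, gamma):
    # Garotte: shrinks big coefficients a little, zeroes small ones.
    return pos(1.0 - gamma / b**2) * b

def lasso_orth(b, gamma):
    # Lasso: soft-thresholding, using exactly the (x)_+ operator above.
    return np.sign(b) * pos(np.abs(b) - gamma)

b_ols = np.array([5.0, 2.0, 0.3, -0.1])
print(ridge_orth(b_ols, 0.5))    # every coefficient survives, shrunken
print(garotte_orth(b_ols, 0.5))  # small coefficients dropped to zero
print(lasso_orth(b_ols, 0.5))    # small coefficients dropped to zero
```

So the "small change" in the penalty is the whole story: the L¹ constraint makes the solution kink at zero, and coefficients get set exactly to zero (automatic variable selection), whereas ridge's L² penalty only ever shrinks them.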


