(This article was first published on **isomorphismes**, and kindly contributed to R-bloggers)

*Tibshirani's original paper on the lasso.*

- Breiman's garotte: 1993
- Tibshirani's lasso paper submitted: 1994
- Tibshirani's lasso paper revised: 1995
- Tibshirani's lasso paper accepted: 1996

This is one of those papers that I'm so excited about, I feel like you should just read the whole thing! It's all good! But I realise that's less than reasonable.

Here is a bit of a summary; feel free to request other information and I'll do my best to adapt it.

The basic question is: I have some data and I want the computer to generate (regress) a linear model of it for me. What procedure should I tell the computer to do to get a good | better | best model?

The first technique, due to Abraham de Moivre and applied fruitfully by Gauss in the early 1800s (so, OK, no computers then, but nowadays we just run `lm` in `R`), was to minimise the sum of squared errors (Euclidean distance) of a given affine model of the data. (Affine being `linear + one more parameter` for a variable origin, to account for the average value of the data net of the observed parameters. For example, to model incomes in the USA when the only observed parameters are age, race, and zip code, you would want to include the average baseline US income level, and that would be accomplished mathematically by shifting the origin: the alpha, a.k.a. the autonomous or vector-of-ones regression-model parameter, a.k.a. the affine addition to an otherwise linear model.)
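A minimal `lm` sketch on made-up data (the numbers are illustrative, not from any source): the intercept that `lm` fits by default is exactly the affine shift described above.

```r
# Hypothetical income-vs-age data: a baseline level plus a linear effect.
set.seed(1)
age    <- sample(20:70, 100, replace = TRUE)
income <- 30000 + 500 * age + rnorm(100, sd = 5000)  # invented coefficients

# lm() includes the affine (intercept/alpha) term automatically.
fit <- lm(income ~ age)
coef(fit)  # "(Intercept)" is the origin shift; "age" is the linear slope
```

To fit a purely linear (no-intercept) model instead, you would write `lm(income ~ age - 1)`.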

It was noticed by several someones at various points in time that whilst de Moivre's least-squares (OLS) method is provably (via the calculus of variations) the optimal linear model given well-behaved data, real data does not always behave.

In the presence of correlation, missing data, wrong data, and other problems, the optimal OLS solution is overfit, meaning that the model it makes for you picks up on too many of the problems. Is there a way to pick up on more signal and less noise? More gold and less dross? More of the real stuff and fewer of the impurities?

I can think of two ways people have tried to scrape off the corrosion without also flaying too much of the underlying good material:

- **Assume** simpler models are better. This is the approach taken by ridge regression (a.k.a. Tikhonov regularisation, a.k.a. penalisation), the lasso, and the garotte.
- **Compare ensembles** of models, then choose one in the middle. Robust methods, for example, use statistical functions that vary less, in theory, from flawed situation to flawed situation than do other statistical functions. Subset selection and hierarchical methods generate a lot of models on the real data and pick among them.

That's the backstory. Now on to what Tibshirani actually says. His original lasso paper contrasts three ways of penalising complicated models, plus regression on subsets.

The three formulae:

- penalisation & restriction to subsets
- garotte
- lasso

look superficially quite similar. Tibshirani discusses the pros, cons, whens, and wherefores of the different approaches.
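In rough symbols (my paraphrase, not the paper's exact notation): each method minimises the residual sum of squares $\sum_i \bigl(y_i - \alpha - \sum_j \beta_j x_{ij}\bigr)^2$, subject to a different "budget" constraint $t$ on the coefficients, which is why they look so alike:

```latex
% Paraphrase of the constraints; \hat\beta_j^{OLS} is the ordinary
% least-squares estimate and t the shared budget parameter.
\begin{aligned}
\text{subset selection:} &\quad \#\{j : \beta_j \neq 0\} \le t \\
\text{ridge:}            &\quad \textstyle\sum_j \beta_j^2 \le t \\
\text{garotte:}          &\quad \beta_j = c_j\,\hat\beta_j^{OLS},\quad c_j \ge 0,\quad \textstyle\sum_j c_j \le t \\
\text{lasso:}            &\quad \textstyle\sum_j |\beta_j| \le t
\end{aligned}
```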

(In reading this paper I learned a new symbol: the positive-part operator, written (x)+. It means (x)+ = max(x, 0); in `R` code, `ifelse(x < 0, 0, x)`. Like absolute value, but not exactly.)
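A quick sketch of the positive part in `R`, along with the soft-thresholding expression where it appears in the lasso (the threshold value here is illustrative):

```r
# Positive part: (x)+ = max(x, 0), applied element-wise.
pos <- function(x) ifelse(x < 0, 0, x)   # equivalently, pmax(x, 0)

pos(c(-2, -0.5, 0, 1.5))   # 0.0 0.0 0.0 1.5

# Soft thresholding: shrink |x| by gamma, clip at zero, restore the sign.
# This is where (x)+ shows up in the lasso's closed-form solution for
# orthonormal designs.
soft <- function(x, gamma) sign(x) * pos(abs(x) - gamma)

soft(c(-2, -0.5, 0, 1.5), gamma = 1)   # -1.0 0.0 0.0 0.5
```

Note the contrast with absolute value: `abs` flips negatives up to positive, while `pos` flattens them to zero.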

Back to the lasso. How does such a small change to the penalty function change the estimated linear model we get as output?
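One way to see it for yourself, assuming the `glmnet` package (which post-dates the paper) is installed: on the same made-up data, `alpha = 1` gives the lasso penalty and `alpha = 0` gives ridge. The lasso tends to set coefficients exactly to zero, while ridge only shrinks them toward zero.

```r
# Illustrative sketch, not from the paper: 8 predictors, only 2 of which matter.
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 8), nrow = 100, ncol = 8)
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(100)

coef(glmnet(x, y, alpha = 1), s = 0.5)  # lasso: noise coefficients exactly 0
coef(glmnet(x, y, alpha = 0), s = 0.5)  # ridge: small but nonzero everywhere
```

The `s` argument is the penalty weight (lambda); the value 0.5 is arbitrary here, chosen just to make the contrast visible.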
