Varian on big data

June 15, 2014

(This article was first published on Hyndsight » R, and kindly contributed to R-bloggers)

Last week my research group discussed Hal Varian’s interesting new paper, “Big data: new tricks for econometrics”, Journal of Economic Perspectives, 28(2): 3–28.

It’s a nice introduction to trees, bagging and forests, plus a very brief entrée to the LASSO and the elastic net, and to spike-and-slab regression. Not enough to be able to use them, but OK if you’ve no idea what they are.

It was more disappointing on boosting (completely ignoring the fact that boosting can be applied in a regression context as well as a classification context), and his comments on causality seemed curiously naive. His suggested approach involved forecasting using all variables but the one that is considered causal, and then comparing the results against what actually happened. That seems at least as likely to lead to false conclusions on causality as instrumental variables or differences-in-differences. Although Varian cites Pearl’s work approvingly, I doubt that Pearl would return the favour.
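To make the forecasting approach concrete, here is a minimal sketch on simulated data, assuming a single control predictor and an intervention that starts at a known period. It is not code from Varian's paper: the names (`fit_line`, `start`, `true_lift`) and all the numbers are hypothetical, chosen only for illustration. The model is fitted on the pre-intervention periods without the supposedly causal variable, used to forecast the later periods, and the forecast is then compared with what actually happened.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(0)      # fixed seed so the simulation is repeatable
n, start = 120, 90          # hypothetical: intervention begins at period 90
true_lift = 5.0             # hypothetical effect built into the simulation

# Control predictor x, and an outcome y that jumps by true_lift after `start`.
x = [10 + rng.gauss(0, 2) for _ in range(n)]
y = [3 + 2 * xi + rng.gauss(0, 0.5) + (true_lift if t >= start else 0.0)
     for t, xi in enumerate(x)]

# Fit on the pre-intervention periods only; the intervention is not a predictor.
a, b = fit_line(x[:start], y[:start])

# Forecast what the later periods "would have been" without the intervention.
counterfactual = [a + b * xi for xi in x[start:]]

# Compare the forecast with what actually happened.
gaps = [obs - cf for obs, cf in zip(y[start:], counterfactual)]
estimated_lift = sum(gaps) / len(gaps)
print(round(estimated_lift, 2))
```

In this simulation the gap recovers the built-in effect because nothing else changes after period 90. With real data, anything the forecasting model fails to capture in the later periods ends up in the same gap, so the comparison on its own cannot separate the intervention from other changes.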

On a positive note, his Bayesian Structural Time Series model (which I heard him speak about in Rome 12 months ago) seems interesting and very useful. I wonder when the promised R package will appear?
