Data science = failure of imagination

January 8, 2013

(This article was first published on Econometric Sense, and kindly contributed to R-bloggers)

I think I like this distinction between Bayesian and Frequentist statistics: 

“we are nearly always ultimately curious about the Bayesian probability of the hypothesis (i.e. “how probable it is that things work a certain way, given what we see”) rather than in the frequentist probability of the data (i.e. “how likely it is that we would see this if we repeated the experiment again and again and again”).”

But I think the rest of the article mischaracterizes data science. Take, for instance, the following paragraph:

“But most importantly, data-driven science is less intellectually demanding than hypothesis-driven science. Data mining is sweet, anyone can do it. Plotting multivariate data, maps, “relationships” and colorful visualizations is hip and catchy, everybody can understand it. By contrary, thinking about theory can be pain and it requires a rare commodity: imagination.”

Actually, in my opinion it takes far more imagination to develop an effective data visualization than to develop an estimator by hand and prove that it is unbiased or consistent. I’d much rather do the latter because, frankly, I don’t have the imagination to do the best job with the former.

But data science is much more than visualization. As for not being intellectually demanding, trying to understand the backpropagation algorithm used by neural networks, not to mention actually coding your own algorithm, isn’t child’s play.

As for results, ultimately it is about using the right tool for the right job. There are plenty of cases, in bioinformatics and genomics for example, where the algorithmic approach is more useful than, say, ANOVA. As Leo Breiman said:

“Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems.”

Culture shock? See “Culture War: Classical Statistics vs. Machine Learning.”
