One of the favorite bar room discussions of statisticians, machine learners, and computer scientists is: what is data science? (And I don’t care whether it happens in a bar or not; it’s a “bar room” discussion by virtue of being a stupid, undecidable, and incredibly fun argument.)
Since Google gave us the right to vote by linking (PageRank = Thomas Jefferson of the Internet?), let me cast my vote for a paper Jeremy Howard recently reminded me of:
The paper is over 10 years old, but that’s why I like it. William Cleveland has the gist right, and the paper isn’t recent enough to provoke argument about the finer points.
After all, “data science” is a stupid name because it’s a lot more like “data engineering” and we’re all really “data analysts” but, gosh … “data science” is just such a sexy name.
The only guys I know who really strike me as data scientists are Cosma Shalizi and Andrew Gelman, and if you ask them what they are, they’ll squarely tell you that they’re statisticians and that “data science” is a hyped-up term coined by Silicon Valley salesmen who want to re-brand statistics to scare up VC funding. (Potentially followed by more scolding.)
Okay … I don’t know Andrew … and I didn’t ask Cosma directly … but I think that’s what they’d say.
As a burned-out physicist with a psych degree, what am I? A data scientist of course! I looooove a good fad.