Am I a data scientist?

[This article was first published on Hyndsight » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last night I gave a very short talk (less than 5 minutes) at the Melbourne Analytics Charity Christmas Gala, a combined event of the Statistical Society of Australia, Data Science Melbourne, Big Data Analytics and Melbourne Users of R Network.

This is (roughly) what I said.


Statisticians seem to go through regular periods of existential crisis as they worry about other groups of people who do data analysis. A common theme is: all these other people (usually computer scientists) are doing our job! Don’t they know that statisticians are the best people to do data analysis? How dare they take over our discipline!

I take a completely different view. I think our discipline is in the best position it has ever been in. The demand for data analysis skills is greater than ever. Our graduates are highly sought after, and well paid. Being a statistician has even been described as a sexy profession (which presumably is a good thing to be!).

The difference between these perspectives comes down to inclusiveness. If we treat statistics as a narrow discipline, fitting models to data and studying the properties of those models, then statistics is in trouble. But if we treat what we do as a broad discipline involving data analysis and understanding uncertainty, then the future is incredibly bright.

Here are two quotes from well-known bloggers in the last year or two:

April 2013: Larry Wasserman’s blog
Data science: the end of statistics?
If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

November 2013: Andrew Gelman’s blog
Statistics is the least important part of data science
There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics as a subset of data science …

Statistics is important—don’t get me wrong—statistics helps us correct biases … estimate causal effects … regularize so that we’re not overwhelmed by noise … fit models … visualize data … I love statistics! But it’s not the most important part of data science, or even close.

How can two professors of statistics have such different views on their discipline? The same perspectives can be seen in the following two diagrams (both reproduced with permission).

[Figure: The Data Science Venn Diagram]

Source: Drew Conway, Sept 2010. Reproduced under a Creative Commons Licence.

[Figure: Venn diagram of data scientist skills]

In the first, narrow view, to be a data scientist you have to know a great deal about statistics, mathematics, computer science, programming, and the application discipline. If that’s true, I’ve never met a data scientist. I don’t believe they exist.

In the second, broader view, everyone here is a data scientist, although we have different specializations, perspectives and training.

I take the broad inclusive view. I am a data scientist because I do data analysis, and I do research on the methodology of data analysis. The way I would express it is that I’m a data scientist with a statistical perspective and training. Other data scientists will have different perspectives and different training.

We are comfortable with having medical specialists, and we will go to a GP, endocrinologist, physiotherapist, etc., when we have medical problems. We need to take the same team perspective on data science.

None of us can realistically cover the whole field, and so we specialise in certain problems and techniques. It is crazy to think that a doctor must know everything, and it is just as crazy to think a data scientist should be an expert in statistics, mathematics, computing, programming, the application discipline, etc. Instead, we need teams of data scientists with different skills, each aware of the limits of their expertise and of whom to call in for help when required.

Let’s not be too sectarian about our disciplines, thinking everyone not trained in the same way we were is a heretic.

This reminds me of a famous joke by the comedian Emo Philips:

I was walking across a bridge one day, and I saw a man standing on the edge, about to jump off. I immediately ran over and said, “Stop! Don’t do it!”
“Why shouldn’t I?” he said.
I said, “Well, there’s so much to live for!”
“Like what?”
“Well … are you religious or atheist?”
“Religious.”
“Me too! Are you Christian or Jewish?”
“Christian.”
“Me too! Are you Catholic or Protestant?”
“Protestant.”
“Me too! What franchise?”
“Baptist.”
“Wow! Me too! Northern Baptist or Southern Baptist?”
“Northern Baptist.”
“Me too! Are you Northern Conservative Baptist or Northern Liberal Baptist?”
“Northern Conservative Baptist.”
“Me too! Are you Northern Conservative Fundamentalist Baptist or Northern Conservative Reformed Baptist?”
“Northern Conservative Fundamentalist Baptist.”
To which I said, “Die, heretic scum!” and pushed him off.