By Richard Pugh, Commercial Director
I love my job. Seriously. I was enjoying it before Hal Varian made it sexy, but since then, and the data science explosion, everything has kicked into an even higher gear. Why do I love it? Because fundamentally, our job as data scientists is to help people make better decisions based on the information we have at hand.
As Mango Solutions continues to grow faster than you can say “what do you mean we need to look at offices again”, I find myself talking to more and more graduates about the skills needed to be a good data scientist, and the pitfalls to avoid (mostly because I’ve stumbled into every pitfall at some point, so I just about know where they are … well, the ones I know about so far, of course).
When someone suggested writing this blog post, and gave me a title starting with “the single most important …”, my initial reaction was to run away quickly. Because surely, putting “the single most important …” in front of anything leaves you open to a Monty-Python-Spanish-Inquisition backdown at some point along the line. But in the end I agreed … so here goes …
In my opinion the single most important skill for a data scientist is not:
- Knowing the difference between a GLM and a GAM
- Understanding which R package is best to use for a particular task
- Being able to extract data from Twitter and merge it with your relational database
- Creating a really smart plot that simultaneously communicates a message clearly and looks really sexy
No, in my opinion, the single most important skill for a data scientist is … Empathy.
Why “empathy”? Because if we’re going to drive decisions with analytics, we need to appreciate the number of different personalities involved, what they are trying to achieve, what constraints they work under etc.
For example, a data scientist may end up interacting with:
- The business user, who just wants to make more informed decisions, possibly in a very short time frame.
- The IT contact, who has possibly never heard of the funky analytic technology you’re about to mention, and has to fill in 100 forms just to get a new server commissioned.
- The marketing person, who wants to make sure you know that the colour of your graph needs to be #333380, not #3D3D99!
- The internal statistician, who perhaps doesn’t understand this funky gradient boosted regression trees approach of which you speak, but is going to end up supporting this analytic solution.
Being able to interact with these people and take their aims and concerns into account when you’re designing analytic solutions is essential to make sure you create something fit for purpose in a positive way.
Even when you’re not interacting with the team above, empathy is still something that should be at the front of your mind as a data scientist. For example:
- When I’m writing some code to extract data, is this a “one off” thing, or had I better write it in a more generic style, parameterise column names etc?
- Who is going to support the code I’m writing? Maybe I should steer clear of that “holy crap how clever am I” short line of code that does a million things and replace it with a few well-documented lines of simpler code?
- How do I best present the insight back to the user? In a visual style perhaps? Then let’s make sure they can clearly see the message past the funky interactive embedded scatter/ring/pie(!) graph I’m making.
- Once they’ve understood the message, what will my business users’ next question be? Maybe I should anticipate that and make it easy to answer that question too?
- Having fit a cool model, does the end user really want to see a p-value? Or do they just want to know what decision to make?
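To make the first two questions in that list a little more concrete, here is a minimal sketch of what “parameterise column names” and “a few well-documented lines of simpler code” might look like. It is in Python purely for illustration; the function name `extract_columns`, the column names and the tiny dataset are all hypothetical, and your own extraction code will look different.

```python
import csv
import io


def extract_columns(rows, columns):
    """Return only the requested columns from an iterable of dict rows.

    The column names are passed in as a parameter rather than hard-coded,
    so whoever supports this later can reuse it for a different extract.
    """
    rows = list(rows)

    # Fail early, with a readable message, if a requested column is absent.
    if rows:
        missing = [c for c in columns if c not in rows[0]]
        if missing:
            raise KeyError(f"Columns not found: {missing}")

    # One simple step: keep the requested columns, in the requested order.
    return [{c: row[c] for c in columns} for row in rows]


# Hypothetical data, standing in for whatever source you are extracting from.
raw = "customer_id,region,spend\n1,North,120\n2,South,80\n"
rows = csv.DictReader(io.StringIO(raw))
subset = extract_columns(rows, ["customer_id", "spend"])
```

Nothing clever is going on here, which is rather the point: the next person to open the file can see what it does, and can change the column list without touching the logic.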
So, that’s it. In my opinion, the single most important skill for a data scientist is “Empathy”.
… and Fear! The two most important things are Empathy and Fear …
… and Surprise! Empathy and Fear and Surprise …
… and … I’ll come in again.