When it was floated that I should write this article, I approached it with trepidation. There is no better way to start an argument in the world of data than by trying to define what a Data Scientist is or isn’t. Add in the complication of the relatively new role of Data Engineer, and there is no way this is not going to end in supposition and a lot of sentences starting “I reckon”.
Nevertheless, understanding the value of these two vital roles is important to any organisation that seeks to unlock the value found in its data – without being able to describe these roles, it’s next to impossible to recruit appropriately. With that in mind, here is what I reckon.
By thinking in terms of rigidly defined boxes we are missing the point. A Data team should be thought of as covering the spectrum of skills you need for effective data management and analytics. Simple boxes like Data Scientist and Data Engineer are useful, but should not be too rigidly defined.
Reams have been written attempting to define what a Data Scientist is. The data science community has careered from expecting a Data Scientist to know everything from DevOps to statistics, to insisting that a Data Scientist needs a PhD, which has led large institutions to give up and just rebrand their BI professionals as Data Scientists. All of this misses the point.
Then along comes the Data Engineer. No longer is your IT department the custodian of the data. The role has become too specialist and too critical to the business to leave to those who have worked really hard to understand traditional IT systems but think 3rd Normal Form is something in gymnastics and Hadoop is a noise you make after eating a kebab. Completely understandably, data has outgrown your average IT professional, but what do you need to make sure your data is corralled properly? Can’t we just throw a Data Scientist at it and get them to look after the data? Again, I think this misses the point.
Human beings are good at putting things into boxes and categories. It is how we manage the world and it is largely how we are trained to manage our businesses. Our management accountants take care of the finances and our HR department takes care of our employees. However, by putting people in these boxes with fairly rigid boundaries, there is a risk that necessary skills are missed and you end up with a team across your organisation that cannot provide what the business needs.
This is particularly true when we come to think of Data Scientists and Data Engineers. Rather than thinking of people in terms of the box to put them in, when building your data team it is preferable to think of a spectrum of skills that you need to cover. These can be broadly put into the boxes of Data Scientist and Data Engineer; however, the crossover can be high, as can be seen in the diagram above.
In your Data Engineering team you will need individuals with a leaning towards the world of DevOps, and you will need team members who are close to Machine Learning engineers. Likewise, in your Data Science team you will need members who are virtually statisticians, and team members who know something about deploying a model in a production environment. Making sure your team as a whole and your individual project teams cover this skill mix can be a real challenge.
So in summary, I reckon that we need to stop thinking about the boxes we put people in quite so much, and start looking at the skills we actually need in our teams to make our projects a success. Understanding the Data Scientist/Data Engineer job roles as a spectrum of skills where you may need Data Engineer-like Data Scientists, and Data Scientist-like Data Engineers, will give you more success when it comes to building your data team and delivering value from your data.