Two years in Data Science and not yet a Data Scientist

July 25, 2018

(This article was first published on R on R-house, and kindly contributed to R-bloggers)

What’s in a name?

Despite the potentially grumpy-sounding title of this post, this is more a positive reflection on the past two years since I started working in Data Science. I think I’ve come a long way, but there is still so far to go if I am to confidently call myself a Data Scientist. Why does a job title matter? It’s a good way of thinking about your competencies, describing where you want to go, and conveying that to other people.

I still class myself as a Data Analyst. If you do a quick Google search, you will find dozens of articles attempting to define Data Analyst, Data Engineer, Data Scientist, etc. and the struggles that companies may have in advertising roles with these titles, without really understanding what it is they need. To my knowledge there are as yet no universally agreed definitions for these roles, but I think generally speaking, practitioners mostly get it and definitions are probably converging.

To describe what I mean by this term, I tend to think of things in terms of the following breakdown:

  • Descriptive Analytics, which use data aggregation and data mining to provide insight into the past and answer: “What has happened?”
  • Predictive Analytics, which use statistical models and forecasting techniques to understand the future and answer: “What could happen?”
  • Prescriptive Analytics, which use optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”

I believe I’m confined right now to Descriptive Analytics, because I don’t exploit machine learning (yet), and hence call myself a Data Analyst rather than a Data Scientist. In fact, I don’t think I even cover Descriptive Analytics completely because of the range of unsupervised machine learning tricks I’ve yet to become competent in that can be used for data mining.

Ironically, the one area I think I have the most experience in (but don’t currently practice) is Prescriptive Analytics, by virtue of the nearly 15 years prior experience I’ve had in optimisation, simulation, and general Operational Research. It’s probably worth saying also, that I’ve picked up a whole host of other technical and soft skills during that time, which is why I’m quite upbeat about it. As I’m mid-career, I still have a lot of time to branch out into machine learning.

To R or not to R?

In previous posts I’ve mentioned DataCamp as a great learning resource for coding. I’ve made my way through most of the mammoth Data Scientist with R career track, but have taken an ‘operational pause’ leaving the two remaining machine learning courses for the future. For the past couple of years I’ve structured my Data Science learning around R, but there are so many other software packages out there that can be used for Data Science…so the question is, are they a distraction or can they help make me better?

For a while, I thought it was the former. I thought packages like Tableau would gradually erode my coding experience, ultimately taking me further away from machine learning. However, since trying it out I’ve become convinced that packages like these are crucial, even for the hardcore Data Scientist, as we’ll not always have the time to code up the prototype visualisations or dashboards we would like, especially in a sprint setting.

So what next? I want to get proficient with Tableau. I want to be able to generate professional-looking, interactive visualisations in a fraction of the time it would take to code them up. I want to have that spare time to iterate and experiment. For now, that’s the only other package I think is worth learning as I continue with R. I’m also quite content spending the next couple of years getting more practice in sourcing, cleaning, summarising and visualising data in R, mainly through the tidyverse.

The other key technology I need to get much more acquainted with before moving on to machine learning is Git. It’s essential I start to move away from using it through RStudio and onto the command line.

So that’s my plan.

Patience or Procrastination?

I have the advantage that a large part of my role at the moment is about championing data science techniques, and educating people who are less data literate than I am in taking them up. The inertia of change buys me some time to consolidate where I’m currently at. However, when the indicators suggest that these skills are beginning to proliferate more widely across the organisation, I do need to be in a position where I’m one step ahead.

There’s part of me that wonders whether I’m putting this off because I know it’s going to be extremely difficult…I know how expansive machine learning is, and I know how deeply you have to understand it in order to use it properly and responsibly. I’ve already done a lot of reading on the subject, and I’ve seen how machine learning in R is spread across multiple packages, rather than the one-stop shop of scikit-learn in Python. That kind of disorganisation and inconsistency fills me with dread. I really do hope there’s some kind of tidyverse-like effort to make things easier at some point soon, because I see it as a needless barrier at the moment.

Long may it continue

I’ve been more energised and enthused in these last two years than at any other point in my career:

  • The tools we have at our disposal today allow us to create some glorious pieces of analysis;
  • The experience I’ve had of the team-working in Scrum has been invigorating and it’s so exciting being able to do so many sprints on different topics so frequently;
  • It’s been refreshing to see an entire organisation make a real effort to adopt these ways of working, even from the most senior leadership.

I can say for certain that this is the best career choice I’ve made, and it reminds me of exactly what I was looking for when I first left university. You can’t ask for more than that.
