Disclaimer: This was originally written on my Medium blog, so the formatting is a little different from my usual style.
If you just got started or have been working a while in a data role, the jargon thrown around can sometimes get overwhelming, with all the things you need to learn or stay on top of. After spending some time thinking about it, I have come up with 5 tips for landing a “data professional” role.
As martial arts skeptic Matt Thornton puts it when comparing various martial arts:
“Just as there is no such thing as “Canadian geometry”, but there is geometry.”
While he wrote this in the context of martial arts, I think the same is true of working with data. Business domains come and go, but many of the problems stay the same.
I am far from being a master in the fields of Data Science, Data Engineering and Machine Learning. However, I have found the following useful in one setting or another in my experience.
With that in mind, my five tips are:
1. Learn to Pivot Data from Long to Wide Form (and vice versa)
This comes up in one way or another whenever you analyze or prepare data, and the skill is language agnostic. The more settings you know how to do it in, the more versatile you will be in contributing to an analysis or data engineering project.
If you want to learn how to do this in R or Python, I have a blog post on my personal site showing how to pivot data from long to wide form in R and Python and comparing the two.
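To make the long and wide shapes concrete, here is a minimal sketch in plain Python with no libraries. The city/month/sales data and the function names long_to_wide and wide_to_long are made up for illustration; they are not taken from the blog post mentioned above.

```python
from collections import defaultdict

# Hypothetical long-form data: one row per (city, month) observation.
long_rows = [
    {"city": "Toronto", "month": "Jan", "sales": 10},
    {"city": "Toronto", "month": "Feb", "sales": 12},
    {"city": "Ottawa", "month": "Jan", "sales": 7},
    {"city": "Ottawa", "month": "Feb", "sales": 9},
]


def long_to_wide(rows, index, columns, values):
    """Spread the `columns` field into one column per distinct value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[index]][row[columns]] = row[values]
    return [{index: key, **cells} for key, cells in wide.items()]


def wide_to_long(rows, index, var_name, value_name):
    """Gather every non-index column back into (variable, value) rows."""
    return [
        {index: row[index], var_name: col, value_name: val}
        for row in rows
        for col, val in row.items()
        if col != index
    ]


wide_rows = long_to_wide(long_rows, "city", "month", "sales")
round_trip = wide_to_long(wide_rows, "city", "month", "sales")
```

Here wide_rows has one row per city with a Jan and a Feb column, and round_trip recovers the original long rows. In practice you would more likely reach for pandas, where DataFrame.pivot and DataFrame.melt do the same two jobs.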
2. Learn to Manipulate Character Strings
Being someone who specializes in data means being someone who is able to turn a mess into insights. Develop your regular expressions (regex) skills, and learn to extract data from strings and blurbs of text.
Extracting data from strings is something I have encountered regularly and have done in multiple roles. It’s always going to be needed. Get good at it.
There are a variety of ways in which you can learn about regular expressions, but if you want something more structured, I would recommend going through HackerRank’s regex challenges.
Learning regular expressions might initially seem overwhelming, with so much to memorize, but I can tell you that you don’t need to memorize it all! Quite often I look at a string I want to extract data from, copy-paste it (or some of it) into regex101.com, and figure out the regular expression I need through what I remember and some trial and error. The more you do it, the easier it gets!
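As a small illustration of pulling fields out of a blurb of text, here is a sketch using Python's built-in re module. The note string and its layout (an order ID, an ISO date, a dollar total) are invented for the example, so the patterns would need adjusting for real data.

```python
import re

# Hypothetical free-text note; the field layout is made up for illustration.
note = "Order #A-1042 shipped 2021-03-15 to Jane; total $1,249.50 (2 items)"

# One small pattern per field is usually easier to debug than one giant regex.
ORDER_ID = re.compile(r"#([A-Z]-\d+)")
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
AMOUNT = re.compile(r"\$([\d,]+\.\d{2})")

order_id = ORDER_ID.search(note).group(1)
year, month, day = ISO_DATE.search(note).groups()
# Strip the thousands separator before converting to a number.
total = float(AMOUNT.search(note).group(1).replace(",", ""))
```

Note that search returns None when nothing matches, so on messy real-world input it is worth checking the result before calling group.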
3. Learn to Write API Requests and Handle the Response
This isn’t particularly hard, but oftentimes, data is not handed out on a silver platter for you to analyze or load. Mastering the art of data acquisition and transformation is required for all data professionals regardless of where they are in terms of workflow.
Get into the practice of reading API documentation. Create a request. See what the response is. 200? You’re in business. 502? 404? Not so much; figure out what the problem is and whether there is a solution.
It’s not incredibly difficult to learn how to write API requests, and it’s not something which is language specific. Reading API documentation and writing the appropriate requests is something I deal with regularly. I’m sure I’m not the only one out there. Learn how to do it.
For more info on understanding API requests, DataCamp has a good article on the topic using the requests module in Python.
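Here is a rough sketch of the request-then-check-the-status loop. To keep it dependency-free it uses the standard library's urllib rather than the requests module, and the helper names describe_status and fetch_json, along with the suggested next steps, are my own assumptions rather than anything from a particular API's documentation.

```python
import json
import urllib.error
import urllib.request


def describe_status(code):
    """Map an HTTP status code to a rough next step."""
    if 200 <= code < 300:
        return "ok"
    if code == 404:
        return "not found: check the URL against the API docs"
    if code in (401, 403):
        return "auth problem: check your API key or permissions"
    if code == 429:
        return "rate limited: slow down and retry later"
    if 500 <= code < 600:
        return "server error: retry later"
    return "unexpected: read the response body"


def fetch_json(url, timeout=10):
    """GET `url` and return (status code, parsed JSON or None)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still on the error.
        return err.code, None


# Example call against a placeholder URL (not a real endpoint):
# status, payload = fetch_json("https://api.example.com/v1/items")
# print(status, describe_status(status))
```

With the requests module the same idea is shorter: requests.get(url) returns a response whose status_code attribute and json() method cover both halves. Network failures such as timeouts are not handled in this sketch.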
But what do you do if there is no accessible API available to get your data?
4. Learn to Scrape Data
Data rarely comes readily available. Oftentimes you will need to figure out other ways to get it. For some reason web scraping is not spoken about enough. It’s likely because there is a moral hazard attached to it.
Provided that there are no TOS violations, web scraping lets you optimize the collection of data. Just be decent and don’t slam the servers with rapid-fire requests. Put some sleeps in between requests so that your code behaves more like a regular user.
Check out the selenium module. With this video it only takes around 10 minutes to learn, and it’s a skill that will be worthwhile in a variety of settings.
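To show the "parse the page, then sleep between requests" pattern without needing a browser driver, here is a sketch that swaps selenium for the standard library's html.parser. That only works for static HTML; pages rendered by JavaScript are where selenium earns its keep. The sample HTML, the fetch callable, and the two-second default delay are all assumptions for illustration.

```python
import time
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def polite_crawl(urls, fetch, delay_seconds=2.0):
    """Call `fetch(url)` for each URL, sleeping between requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # be decent to the server
        pages[url] = extract_links(fetch(url))
    return pages


# Hypothetical page content standing in for a downloaded job listing.
sample_html = (
    '<ul><li><a href="/jobs/1">Analyst</a></li>'
    '<li><a href="/jobs/2">Engineer</a></li></ul>'
)
links = extract_links(sample_html)
```

Passing fetch in as a function keeps the crawl loop independent of how pages are actually downloaded, so the same loop could wrap a urllib call or a selenium driver.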
With all of this in mind, let’s get to the final point:
5. Remember: Machine Learning is useless without being able to control the data.
Knowledge of machine learning is a powerful asset, but if you are unable to manipulate and transform the data appropriately, the algorithms you have are useless, because they only accept data shaped in a specific form.
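As one example of what "a specific form" can mean, many algorithms expect a purely numeric table of features. The sketch below one-hot encodes a categorical field in plain Python; the records, the field names, and the to_feature_matrix helper are hypothetical and only meant to show the reshaping step.

```python
# Hypothetical raw records mixing a categorical and a numeric field.
records = [
    {"plan": "free", "monthly_visits": 3},
    {"plan": "pro", "monthly_visits": 40},
    {"plan": "free", "monthly_visits": 5},
]


def to_feature_matrix(rows, categorical, numeric):
    """One-hot encode `categorical` and append `numeric` as a float."""
    categories = sorted({row[categorical] for row in rows})
    header = [f"{categorical}={c}" for c in categories] + [numeric]
    matrix = [
        [1.0 if row[categorical] == c else 0.0 for c in categories]
        + [float(row[numeric])]
        for row in rows
    ]
    return header, matrix


header, X = to_feature_matrix(records, "plan", "monthly_visits")
```

Each row of X is now a list of numbers with one column per plan value plus the visit count, which is the kind of shape a model can actually consume.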
If your company is looking to employ AI but does not have a data strategy, it’s likely that one needs to be set up before looking into using AI and machine learning.
The above are some tips that I think are handy to keep in mind when you are presented with recurring questions and problems relating to data as a data professional.
I hope you found this blog useful. Be sure to share it around if you liked it and let me know if you didn’t!