7 Must-Have Skills to Get a Job as a Data Scientist

[This article was first published on r – Appsilon | End-to-End Data Science Solutions, and kindly contributed to R-bloggers].


Everybody and their mother wants to learn data science. And there’s no reason not to – the job you do is interesting 95% of the time, the salaries are excellent, and most likely you can get the job done from the comfort of your bed. 

You think you have what it takes? Apply for one or twelve open positions at Appsilon.

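Today we’ll go over seven essential skills every data scientist should have. Here’s the complete list:

  • Creativity and Critical Thinking
  • Math and Stats
  • Programming
  • Data Analysis and Visualization
  • Machine Learning and Deep Learning
  • Databases
  • Education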

Creativity and Critical Thinking

Tasks in a day-to-day data science job are often vaguely defined, at least at the beginning of the project. More often than not, a data science solution only delivers real value if the data scientist has substantial domain knowledge.

For example, how could you possibly develop credit risk models if you don’t know anything about the subject? Sure, you could do your best and follow well-established data science principles, but that can only get you so far. As a result, your models won’t work optimally, and you won’t know what to do about it.

That’s where creativity and critical thinking come into play. Data scientists have to distill a lot of information in a short time frame, and a team of highly creative people can surface solutions no one has thought of before.

Critical thinking helps you dig deeper, ask the right questions, and spot potential biases in the answers you get.

Math and Stats

How much math you’ll use daily depends on your role. These four areas come up most often when looking at data science prerequisites:


Image 1 – Photo by Dhirendra Misra on Medium – https://medium.com/@dhirendra.misra/is-mathematics-core-of-machine-learning-1c6a75cb684c

None of this is something you can pick up in a week, since every subject listed is college-level math.

That doesn’t mean you should spend the next year or so learning these subjects in depth, but you should know the basics. For a junior-level position, a basic intuition and an understanding of how each subject applies to data science should be enough. For a lead researcher position, these topics are expected to be second nature.
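To make "basic intuition" a little more concrete, here’s a minimal sketch of a few everyday statistics: the mean, the sample standard deviation, and z-scores (standardization), a common preprocessing step before modeling. It uses Python’s built-in `statistics` module so it runs without extra packages, and the loan amounts are hypothetical values made up purely for illustration.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / sd for x in values]

# Hypothetical monthly loan amounts, made up for illustration only
amounts = [1200, 1500, 900, 3000, 1100]

print("mean:", statistics.mean(amounts))
print("sample sd:", round(statistics.stdev(amounts), 2))
print("z-scores:", [round(z, 2) for z in z_scores(amounts)])
```

The unusually large 3000 stands out with the biggest z-score, which is exactly the kind of quick sanity check basic stats lets you do before any modeling starts.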

There’s a gap of at least several years between junior and senior data science positions, so you’ll have time to learn and explore further. The best part: you can learn all of it for free! Here’s a complete reference for beginners:

Programming

All of the math, stats, and critical thinking skills in the world won’t help if you don’t know how to express your knowledge through code. Let’s take a look at the most widely used languages in data science:


Image 2 – Programming languages used by data professionals – 2019 Kaggle ML and Data Science Survey

In a nutshell, Python and R are the industry leaders. SQL shows up as more widely used than R, but that likely has a different explanation, covered in the Databases section later in this article.

If you’re entirely new to programming, there’s great news: both Python and R are easy to learn. And if you’re coming from languages such as C or Java, these two shouldn’t be a problem to pick up.

After all, Python was designed with readability in mind and is widely used to teach programming to beginners, kids included, so how complicated can it be for well-educated professionals?

As for R, here’s what you can do with it (assuming a basic knowledge of programming concepts):

Data Analysis and Visualization

To become an effective data scientist, your data analysis and visualization skills have to be top-notch. Your results should tell a story, and nobody wants to read an incomplete and poorly presented one.

There’s a whole suite of data analysis and visualization packages available for both R and Python. R’s most popular analysis package is dplyr, and for Python, that’s pandas.

Want a complete beginner guide on data analysis with R? Check out our detailed guide to R’s dplyr.
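The workhorse operation in both dplyr and pandas is "split, apply, combine": group rows by a key and summarize each group. In dplyr that’s roughly `group_by(region) %>% summarise(...)`, and in pandas `df.groupby("region")`. Below is a dependency-free Python sketch of the same idea using only the standard library; the sales records and the `summarise_by` helper are hypothetical, meant to show the concept rather than replace either package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records; in practice these would come from a file or a database
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 200.0},
    {"region": "East", "amount": 50.0},
    {"region": "South", "amount": 140.0},
]

def summarise_by(rows, key, value):
    """Split rows by `key`, then compute count, total, and mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])  # split: collect values per group
    return {
        group: {"n": len(vals), "total": sum(vals), "mean": mean(vals)}  # apply + combine
        for group, vals in sorted(groups.items())  # sorted for a stable, readable output
    }

summary = summarise_by(sales, "region", "amount")
for region, stats in summary.items():
    print(f"{region:<6} n={stats['n']} total={stats['total']:.1f} mean={stats['mean']:.1f}")
```

With dplyr or pandas the same summary is a one-liner, which is a big part of why those packages are so popular.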

When it comes to data visualization, many would argue that R wins this one: its visualizations look better, especially with the default styling. Its most popular visualization library is ggplot2, and we have an entire series to get you started:

To conclude, proper analysis and visualization skills are a must. It’s not enough to know how to write code; you also need to ask the right questions. That’s why creativity and critical thinking are so important.

Machine Learning and Deep Learning

This is where all the hype is. Machine learning has been a trending topic over the last couple of years. It’s not a new concept (it dates back to the 1950s), but improvements in computing power have made it accessible to almost anyone.

As a result, many companies have built machine learning into their core services, with applications ranging from something as basic as flower species classification to autonomous vehicles.

The applications of machine learning are endless, so the learning path shouldn’t be the same for a business user and an aspiring computer vision engineer. Still, starting from the basics can’t hurt.

Here are a couple of basic machine learning articles to get you started:

These two articles by no means capture the full scope of “basic machine learning”. It’s a broad and rapidly evolving field, so a single book or course won’t be enough to cover everything.
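As a taste of the basics, here’s a minimal k-nearest neighbors classifier in plain Python, applied to the flower species classification problem mentioned above. The petal measurements are hypothetical values loosely in the style of the classic iris dataset, and the hand-rolled `knn_predict` function is for illustration only; a real project would use an established machine learning library instead.

```python
import math
from collections import Counter

# Hypothetical training data: (petal length cm, petal width cm) -> species label
training = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.1, 1.3), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((5.8, 2.2), "virginica"),
    ((6.1, 2.3), "virginica"),
    ((5.5, 1.9), "virginica"),
]

def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Classify a few new, unlabeled flowers
for sample in [(1.6, 0.4), (4.4, 1.4), (6.0, 2.1)]:
    print(sample, "->", knn_predict(training, sample))
```

k-NN "learns" nothing up front; it simply compares each new flower to the labeled examples, which makes it a good first algorithm for building intuition about distance, features, and the choice of `k`.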

Databases

It’s likely you won’t work with CSV and Excel files most of the time. Instead, datasets will be stored in databases. There are many database vendors out there, such as Microsoft, IBM, and Oracle, and all of them have a single thing in common – SQL.

It’s a language for storing, extracting, and manipulating data within the database. SQL syntax varies a bit from vendor to vendor, but the differences are subtle, so it won’t take you long to feel comfortable again if you switch vendors.

You can go as simple or as complex as you want with SQL. “Simple” means you’re just using it to pull the dataset into memory (e.g., in Python or R), and “complex” means you’re doing most of the computations and aggregations in the database.

The second approach is the way to go if speed is critical, and pulling data you don’t need into memory is bad practice anyway.

Learning the basics of SQL shouldn’t take you long. On the Python/R side, there are ready-made packages for connecting to virtually any database, both on-premise and cloud-based. These packages are usually well documented, so establishing connections shouldn’t be an issue.

To recap – learn the basics of SQL so you can do the “heavy lifting” within the database, and pull only the data you need to Python/R.
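To illustrate the difference between the two approaches, here’s a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database and a hypothetical `orders` table. SQLite simply stands in for whichever vendor you actually use: connection code differs per database, but the SQL itself would look nearly identical. The "simple" path pulls every row into Python and aggregates there; the "complex" path lets the database run the `GROUP BY` and returns only the summary.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 99.0), ("bob", 7.5)],
)

# "Simple": drag every row into memory, then aggregate in Python
all_rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
totals_in_python = defaultdict(float)
for customer, amount in all_rows:
    totals_in_python[customer] += amount

# "Complex": let the database aggregate and return only the summary rows
totals_in_db = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
)

print("rows transferred (simple):", len(all_rows))
print("rows transferred (complex):", len(totals_in_db))
print(totals_in_db)
conn.close()
```

Both paths produce the same totals, but on a table with millions of rows the second one moves only a handful of summary rows over the connection.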

Education

Fewer than 30% of data scientists have a bachelor’s degree or less, and around 20% hold a PhD, according to a 2018 study by Indeed. That leaves roughly half with a master’s degree, so in a nutshell, a master’s degree is the expected common ground.

Here’s a complete overview per profession and education level:


Image 3 – Distribution of tech professionals by profession and education level (Indeed)

That doesn’t mean you can’t get hired as a data scientist without a college degree, but it usually takes two conditions:

  • The HR department doesn’t automatically filter you out for not having a college degree (read: apply to smaller companies, which often don’t have an HR department)
  • You demonstrate more knowledge than, well, everyone else who applied, since you start at the bottom of the food chain

Yes, a degree is a useful thing for data science jobs, but a degree in what? Let’s take a look at the following chart:


Image 4 – Distribution of degree field studies by profession (Indeed)

As you can see, most data scientists have a background in computer science, business, or math/stats. The number of data scientists with a formal degree in data science is expected to rise as more and more universities offer this specialization.

Conclusion

And there you have it – 7 essential skills you should have for a job in data science. The take-home point is: knowing the basics of each should be enough to get you an entry-level position. Only years of experience will help you climb the corporate ladder, and there’s always time to dive further into specific areas.

If you want to implement machine learning in your organization, you can always reach out to Appsilon for help.

Appsilon is hiring for remote roles! See our Careers page for all open positions, including R Shiny Developers, Fullstack Engineers, Frontend Engineers, a Senior Infrastructure Engineer, and a Community Manager. Join Appsilon and work on groundbreaking projects with the world’s most influential Fortune 500 companies.

