## The Survey

The survey "What Degree is Best for Data Science?" ran from February 9 through March 12, 2020, and asked participants 4 questions:

• Q1: What is the highest level of school degree you have completed?
• Q2: Which of the following best describes the field in which you received your highest degree?
• Q3: What level of school degree you consider optimal for successful career in data science?
• Q4: Which field of study you consider optimal for successful career in data science?

During that period, 289 respondents participated and 285 completed all 4 questions. The 4 participants with partial answers were removed from the analysis below.

Though simple and short, the survey has an internal structure in which two groupings overlap: one by time and one by subject. (The average completion time was 55 seconds, after removing 6 outliers who took over 500 seconds.) Time groups the questions into 2 pairs: Q1 and Q2 about the education a participant has already acquired, and Q3 and Q4 about the education the participant recommends. Subject yields an alternative grouping based on the answer options the questions share: the 1st and 3rd about degree, and the 2nd and 4th about field of study.
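For readers who want to reproduce the timing figure on their own export, here is a minimal sketch of an average computed after dropping slow outliers. The function name `mean_completion_time` and the toy durations are assumptions made up for illustration; only the 500-second cutoff comes from the text above.

```python
from statistics import mean


def mean_completion_time(durations, cutoff=500):
    """Return (mean of durations at or below cutoff, number dropped)."""
    kept = [d for d in durations if d <= cutoff]
    if not kept:
        raise ValueError("no durations at or below the cutoff")
    return mean(kept), len(durations) - len(kept)


# Toy durations in seconds (hypothetical, NOT the survey data).
toy_durations = [40, 50, 60, 70, 900, 1200]

avg, dropped = mean_completion_time(toy_durations)
print(f"average: {avg} s, outliers dropped: {dropped}")
```

A fixed cutoff is the simplest possible rule; a percentile-based cutoff would be an equally reasonable choice for a longer survey.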

## Sankey Diagrams: How Data Flows

Sankey diagrams help visualize how answers flow through the questions. We start with pairs of related questions and finish with all 4 questions together.
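Under the hood, a Sankey diagram for a pair of questions only needs a count of respondents for each combination of answers. The sketch below shows one way to build such links in plain Python. It is an illustration, not the code behind the diagrams in this post: the helper `sankey_links`, the dictionary layout and the toy answers are all hypothetical.

```python
from collections import Counter


def sankey_links(responses, first, second):
    """Count respondents for each (answer to first, answer to second) pair."""
    counts = Counter((r[first], r[second]) for r in responses)
    return [
        # Prefix labels with the question id so that, e.g., a completed
        # "Master" and a recommended "Master" stay separate nodes.
        {"source": f"{first}: {a}", "target": f"{second}: {b}", "value": n}
        for (a, b), n in sorted(counts.items())
    ]


# Toy answers (hypothetical, NOT the survey data).
toy_responses = [
    {"Q1": "Master", "Q3": "Master"},
    {"Q1": "Master", "Q3": "PhD"},
    {"Q1": "Bachelor", "Q3": "Master"},
    {"Q1": "Master", "Q3": "Master"},
]

links = sankey_links(toy_responses, "Q1", "Q3")
for link in links:
    print(link)
```

The resulting list of source, target and value records is the general shape most Sankey plotting tools expect, though the exact input format depends on the tool.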

Completed Degree and Field of Study (Q1, Q2)

Best Degree and Field of Study (Q3, Q4)

Completed Degree vs. Best Degree (Q1, Q3)

Completed Field vs. Best Field (Q2, Q4)

Complete Flow of Answers For All 4 Questions

The survey is still open, so if you haven’t participated yet, please do so and let others know about it. You may have noticed a certain bias towards statistics in the answers. This might be because a significant share of respondents reached the survey via R-bloggers, which is popular among R users, who often have a background in statistics. Finally, people with a degree in Math are likely to suggest Math as the best field, and likewise for other fields and degrees; this sort of bias is easy to see in the Sankey diagrams above. Removing such bias from the results could be useful. I attempted this exercise, but my DIY approach turned out to be too naive, and the methods I found were too extensive to work through in a short period of time. If you have pointers or, even better, a method for removing such bias from the answers, I’d love to hear from you.
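As a starting point, the "own field" tendency can at least be measured before trying to correct for it. The sketch below computes, for each completed field (Q2), the share of respondents who recommend that same field (Q4). It does not remove the bias, and it is not the approach attempted for this post. The function `own_field_share` and the toy answers, including the field names, are hypothetical.

```python
from collections import defaultdict


def own_field_share(responses):
    """Share of respondents per Q2 field whose Q4 answer is the same field."""
    totals = defaultdict(int)
    same = defaultdict(int)
    for r in responses:
        totals[r["Q2"]] += 1
        if r["Q4"] == r["Q2"]:
            same[r["Q2"]] += 1
    return {field: same[field] / totals[field] for field in totals}


# Toy answers (hypothetical, NOT the survey data).
toy_answers = [
    {"Q2": "Math", "Q4": "Math"},
    {"Q2": "Math", "Q4": "Statistics"},
    {"Q2": "Statistics", "Q4": "Statistics"},
    {"Q2": "Computer Science", "Q4": "Statistics"},
]

shares = own_field_share(toy_answers)
for field, share in shares.items():
    print(f"{field}: {share:.0%} recommend their own field")
```

Comparing each field's share with how often the remaining respondents recommend that field would give a rough sense of how strong the effect is.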