EMC last week published the results of a survey of 462 IT decision makers who self-identified as either a data scientist or business intelligence professional (plus 35 invitees who were attendees at the EMC Data Scientist Summit and/or Kaggle competitors). There's a nice summary of the conclusions at the EMC blog (where data scientists are described as “The New Rock Star”), and you can also find writeups at eWeek and ITBusinessEdge. Here are a few of my takeaways from the report and how they pertain to the R language:
The world needs more data scientists, stat*! According to the survey, 65% of data science professionals believe demand for data science talent will outpace the supply over the next 5 years. What's more, most think that new data scientists will be found from graduating classes. R is the de facto standard for statistics teaching at universities (and with many academic institutions no longer able to afford SAS or SPSS licensing, more are adopting free statistical software for teaching and research), and with more than 2 million users worldwide, many of these new data scientists will already be trained in R. In our experience with Revolution Analytics customers, this is a key factor in the growing adoption of R in corporations.
Data Science and Business Intelligence aren't the same thing. One of the most interesting aspects of the survey for me was how it highlighted the differences between data science and business intelligence, given that the survey participants identified themselves as one or the other. This is especially revealed in the choices of data analysis tools by BI professionals (dark blue) and data scientists (light blue) in the chart below taken from the EMC report:
That 20% of data scientists use R but only 5% of self-described business intelligence professionals do so isn't much of a surprise, and illustrates the key difference between BI and Data Science. (BTW, I'm surprised Excel wasn't an option for Data Analysis as well as Data Management; I'd expect to see similar levels of usage amongst BI professionals for that use case.) While data science is about exploring and learning from data, BI is a process with limited flexibility to answer a fairly narrow range of questions. But as businesses start reaping the benefits of having data scientists extract answers to more complex questions from big data, there's no doubt that there will be a need to get these models, predictions, and visualizations into the hands of a BI audience that wouldn't normally use a tool like R. That's why being able to integrate R into BI frameworks and other end-user applications is so important.
* Pun very much intended.
EMC Press Release: New Global Study: Only One-Third of Companies Making Effective Use of Data