Driving Real, Lasting Value with Serious Data Science

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers].

Data science is now a hot area of investment for many organizations. Countless blogs, articles, and analyst reports emphasize that effective data science is critical for competitive advantage, and many business leaders believe that data science is vital for an organization to survive, much less thrive, over the next several years.

However, many data science leaders grapple with an existential crisis for their teams. On the one hand, many vendors and analyst reports emphasize the rise of Citizen Data Scientists, empowered by tools that promise to augment and automate the hard work of data science to automagically answer vital questions, no Data Scientist required. On the other hand, machine learning and deep learning methods in the hands of software engineers, fueled by lots of computational power, answer more and more questions (as long as the problem is well-defined, and there is sufficient data available). Squeezed in between these trends, what is the role of a data scientist?

Even worse, nearly as many blogs and analyst reports emphasize how hard it is to implement data science effectively in an organization. They state emphatically that most analytics and data science projects fail, and that most companies don’t achieve the revenue and profit growth they hoped their data science investments would deliver.

We will dive into the role of a data scientist in more detail in the coming weeks, but here we will focus on this question: Why is getting real, lasting value from data science investments so difficult?

Many data science projects lack credibility and impact over time

In talking to many different organizations implementing data science projects, we have seen many challenges that prevent data science investments from delivering the value they should. These typically fall into three categories:

  • Lack of credibility: Data science leaders grapple with whether their team has the necessary training and the right tools to discover relevant and valuable insights in their data. Once the team has found something interesting, how can others in the organization understand and trust those insights enough to actually change their behavior, and make decisions based on them? This problem is compounded if the approach is a difficult-to-explain, black box model.

  • Slow path to value: Seemingly simple questions like “Which customers will be our most profitable next quarter?” often turn into month-long research projects as data scientists scour the firm for data and struggle to wrangle it into shape (a topic we discussed in a recent blog post, Wrangling Unruly Data). Then once the data scientists start to develop an analysis, they find iterating and refining their results with stakeholders slow and unwieldy (something we covered in another blog post, Getting to the Right Question). These slow response times frustrate business sponsors and often stymie putting data insights into action. Worse, they encourage decision makers to go with their gut intuition instead of data.

  • Ephemeral benefits: Once a valuable insight or tool has reached a decision maker, organizations struggle with maintaining and growing the value of these data science investments over time. They find it difficult to implement repeatable and reproducible processes as their systems and data science tools evolve, which often forces them to start from scratch when solving a new problem, or to reimplement old analyses when needed. Furthermore, data science practice at an organization often becomes dependent on a single software vendor, and that vendor may try to extract more of the value the customer receives as software license revenue.

Andrew Mangano, Data Intelligence Lead at Salesforce, spoke at rstudio::conf 2020 about the importance of delivering useful insights to your stakeholders.

Real-world problems need serious data science

So what’s the answer? And how do you cut through all the hype and confusion?

The reality is that the world is full of hard, vaguely defined problems that are nonetheless valuable to solve. Commodity approaches (whether via augmented analytics for citizen data scientists, or standard machine learning approaches for software engineers) yield commodity answers. Real-world business problems require smart, agile data science teams empowered with the flexibility and breadth of open source languages like R and Python. We know this because tens of thousands of you use our software every day to do amazing things.

To deliver real, lasting value, organizations need to set aside the hype and build on a strong foundation. We recommend adopting a strategy we call Serious Data Science. As shown in Figure 1, Serious Data Science is an approach to data science designed to deliver insights that are:

  • Credible: The first step is to ensure that your team has the training and tools to find insights that are relevant and valuable, and that your team can communicate these insights to other stakeholders in your organization in a way that builds trust and understanding.
  • Agile: Next, the platform you use must enable data scientists to quickly develop and iterate those valuable insights, and get them to your decision makers, where they can have an impact.
  • Durable: Finally, to deliver lasting value, the platform must also make it easy to reuse and reproduce your team’s data science work, to deliver up-to-date insights, and do so in a sustainable way for the long term.

Serious Data Science is…

Credible
  • Uses widely deployed and trusted tools
  • Includes comprehensive data science capabilities
  • Offers flexibility through the use of code
  • Provides transparency through visualizations and code

Agile
  • Employs existing knowledge and analytic investments
  • Allows rapid development and iteration
  • Scales well for enterprise and production use
  • Empowers your business stakeholders

Durable
  • Provides reusable, reproducible code and results
  • Delivers relevant, up-to-date insights
  • Supports and is supported by a vital open source community
  • Avoids vendor lock-in

Figure 1: Crucial elements of a Serious Data Science platform.

Why you should adopt Serious Data Science

We’ll be writing in detail about these components of Serious Data Science in the weeks to come. But before we get to that, we must address a topic near and dear to every data science leader: the role of the data scientist within the organization. Our post next Tuesday will address how that role is changing in today’s organizations, and why data scientists will need the Serious Data Science framework to continue demonstrating their value in the months and years to come.

Learn more about Serious Data Science

If you’d like to learn more about Serious Data Science, we recommend:
