Is Your Data Science Credible Enough?

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers.]

Does Your Data Science Lack Credibility?

In a recent post, we defined three key attributes of a concept we call Serious Data Science: Credibility, Agility and Durability. In this post, we’ll drill into the challenge of delivering credible insights to your stakeholders, and how to address that challenge.

Ultimately, organizations use data science to discover valuable insights and then apply those insights intelligently. Such applications might include making a better decision, improving a process, or otherwise changing how things are usually done. However, to make this happen, the organization must do at least two things:

  • Communicate these insights to the right decision-maker, stakeholder, or system (we’ll talk more about that in our next Serious Data Science post on being Agile).
  • Convince decision makers to trust the insight and accept its implications. If decision makers lack this trust, they will likely ignore the recommendation and fall back on “the way we’ve always done things.”

Typically, a host of unasked questions underlie a decision-maker’s seeming resistance to data-driven insights. They might not act on the conclusions of a data science team because they:

  • Don’t know the skills of the data scientist: Does the person who created this insight know what they are doing? Do they understand business risks as well as they understand their models?
  • Don’t trust data science tools: Did the data scientist depend too much on software in creating this result? Did the data science team just use black box tools that auto-magically produced an answer without an understanding of the business?
  • Don’t have confidence in the development process: Did the data scientist consider all reasonable approaches to the problem? Was there any way for someone else to review what was done, and know how things changed over time?
  • Don’t understand what the results mean: What is this insight actually telling me? How does it apply to what I do? What factors does it reflect? Is it really better than what we have done before? Could I get fired for acting on this result?

All these questions and doubts contribute to stakeholder hesitation, especially when they feel that they, not the data scientist, will ultimately be held responsible for the result. Fortunately, there are ways to overcome these obstacles.

How Can You Deliver Credible Insights?

To deliver insights that your decision makers and other stakeholders trust and actually use, we recommend adopting a Serious Data Science approach. To do this, your team must have the training and tools to find insights that are relevant and valuable. Your team must also communicate these insights to other stakeholders in your organization in a way that builds trust and understanding.

Here are the key elements that will help your team meet these challenges:

  • Widely-used open source software: The best way to make sure your team has the training to use a data science tool properly is to use the tools they already know. Millions of data scientists around the world learn data science using open source languages, such as R and Python. While some may argue over which language is best (see this blog post for our take on that question), both have tremendous strengths and are trusted platforms.
  • Comprehensive data science capabilities: To be confident your team will find the best approach to any particular question, they need a wide range of analytic approaches readily available to apply and compare. Powered and validated by huge, ever-expanding communities and package libraries, the R and Python ecosystems give your team a very broad range of tools for their analyses.
  • Process transparency via code: Code allows others to inspect how a problem was first solved, and how that solution matured over time. Unlike point-and-click solutions, where the history of how the analysis evolved is hidden beneath a pretty (inter)face, or a spreadsheet, where the logic is strewn across countless different cells, code explicitly describes which steps lead to the results. Further, code can be peer-reviewed and audited by third parties for further assurance of correct behavior.
  • Understanding through visualizations: Just as a picture is worth a thousand words, a great visualization can explain a thousand lines of code. Visualizations help stakeholders understand complex data science insights and build confidence in the results. Interactive tools such as Shiny allow data scientists to create visualizations that can improve the understanding of a data scientist’s work while spurring engagement from stakeholders.

Heather Nolis, Machine Learning Engineer at T-Mobile, and Jacqueline Nolis, Principal Data Scientist at Nolis, LLC, recently spoke at rstudio::conf 2020 about how using Shiny to share their machine learning models drove engagement and built trust with their business stakeholders.
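
To make the point about process transparency via code a little more concrete, here is a minimal Python sketch of an analysis written as explicit, reviewable steps. It is only an illustration: the sales records, the function names, and the audit log are all invented for this example and do not come from any real project or library.

```python
from statistics import mean

# Hypothetical sales records, invented purely for this illustration.
RAW_SALES = [
    {"region": "north", "revenue": 120.0},
    {"region": "north", "revenue": None},  # a missing value
    {"region": "south", "revenue": 95.0},
    {"region": "south", "revenue": 105.0},
]


def drop_missing(rows):
    """Step 1: remove records that have no revenue figure."""
    return [r for r in rows if r["revenue"] is not None]


def average_by_region(rows):
    """Step 2: compute the average revenue for each region."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["region"], []).append(r["revenue"])
    return {region: mean(values) for region, values in grouped.items()}


def run_analysis(rows):
    """Run the steps in order and note what each one did for reviewers."""
    audit_log = []
    cleaned = drop_missing(rows)
    audit_log.append(f"drop_missing: kept {len(cleaned)} of {len(rows)} rows")
    result = average_by_region(cleaned)
    audit_log.append(f"average_by_region: computed {len(result)} regional averages")
    return result, audit_log


result, audit_log = run_analysis(RAW_SALES)
for line in audit_log:
    print(line)
print(result)
```

Because every step is a named function, a reviewer can see exactly where the missing record was dropped and how the averages were calculated, and version control can show how those steps changed over time.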

Serious Data Science: Credible, Agile, and Durable

These elements of Serious Data Science—trusted tools, comprehensive capabilities, flexibility, and transparency—will all help your team deliver insights that are more likely to be accepted by decision makers and actually have an impact. Next week, we will focus on Agility, and how your team can not only develop apps quickly but also regularly share those results with stakeholders to create a consensus, so you can make sure you are Getting to the Right Question.

Serious Data Science is:

Credible
  • Uses widely deployed and trusted tools
  • Includes comprehensive data science capabilities
  • Offers flexibility through the use of code
  • Provides transparency through visualizations and code

Agile
  • Employs existing knowledge and analytic investments
  • Allows rapid development and iteration
  • Scales well for enterprise and production use
  • Empowers your business stakeholders

Durable
  • Provides reusable, reproducible code and results
  • Delivers relevant, up-to-date insights
  • Supports and is supported by a vital open source community
  • Avoids vendor lock-in

Figure 1: Being credible is one of the crucial elements of a Serious Data Science platform.

Learn More about Serious Data Science

If you’d like to learn more about Serious Data Science, we recommend the following in addition to our previous posts in this series:

  • In a recent customer spotlight, Jared Goulart, Director – Operations Analytics at Redfin, described how a serious data science approach helped his team engage with stakeholders, allowing them to quickly evaluate different scenarios and plan their budgets for the next year.
  • R & Python: A Love Story shows how RStudio helps make the full breadth and power of R and Python available to data science teams and helps them make an impact on their organizations.
  • The Shiny Gallery highlights some of the amazing interactive visualizations that Shiny developers have created with R to convey insights and help their stakeholders make better, more informed decisions.
