
New Course Content: DS4B 201 Chapter 7, The Expected Value Framework For Modeling Churn With H2O

[This article was first published on business-science.io - Articles, and kindly contributed to R-bloggers].

I’m pleased to announce that we released brand new content for our flagship course, Data Science For Business (DS4B 201). The latest content focuses on the transition from modeling Employee Churn with H2O and LIME to evaluating our binary classification model using Return-On-Investment (ROI), thus delivering business value. We do this by applying a tool called the Expected Value Framework. Let’s learn about the new course content available now in DS4B 201, Chapter 7, which covers the Expected Value Framework for modeling churn with H2O!

Related Articles On Applying Data Science To Business

If you’re interested in learning data science for business and the topics discussed in this article (the Expected Value Framework and the Business Science Problem Framework (BSPF)), check out some of these articles.

Learning Trajectory

We’ll touch on the following topics in this article:

Alright, let’s get started!




Where We Came From (DS4B 201 Chapters 1-6)

Data Science For Business (DS4B 201) is the ultimate machine learning course for business. Over the course of 10 weeks, the student learns from an end-to-end data science project involving a major issue impacting organizations: Employee Churn. The student:

Chapter-By-Chapter Breakdown:

“The ultimate machine learning course for business!”

Chapter 1: Code Workflow and Custom Functions for Understanding the Size of the Problem

Chapter 5: Ultimate Performance Dashboard for Comparing H2O Models

OK, now that we understand where we’ve been, let’s take a sneak peek at the new content!

New Content (Chapter 7): Calculating The Expected ROI (Savings) Of A Policy Change

This is where the rubber meets the road with ROI-Driven Data Science! You’ll learn how to use the Expected Value Framework to calculate savings for two policy changes:

Here’s the YouTube Video of the Expected Value Framework for Delivering ROI.



Students implement two overtime reduction policies. The first is a “No Overtime Policy”, which results in a 13% savings versus the baseline (do nothing). The second is a “Targeted Overtime Reduction Policy”, which increases the savings to 16% versus the baseline. The targeted policy selects employees using the threshold that maximizes the F1 score, and it outperforms both the “Do Nothing Policy” and the “No Overtime Policy”.

The targeted policy requires working with the expected rates. It’s an un-optimized strategy: it uses the F1 score, which weights false positives and false negatives equally and therefore does not account for the higher business cost of false negatives. The F1-maximizing threshold is 28%, which can be seen in the Expected Rates graph below.

Chapter 7: Working With Expected Rates
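To make “expected rates” concrete, here is a minimal Python sketch (the course itself works in R with H2O). It assumes hypothetical toy predictions and labels, counts true/false positives and negatives at each candidate threshold, picks the threshold that maximizes F1, and reports the four expected rates at that threshold. The function names are illustrative only; they are not part of H2O or the course code.

```python
# Minimal sketch: expected rates and the F1-maximizing threshold.
# ASSUMPTION: `probs` and `actual` are hypothetical toy data, not course data.


def confusion_counts(probs, actual, threshold):
    """Count TP, FP, TN, FN when predicting 'quit' for every prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, actual):
        predicted_quit = p >= threshold
        if predicted_quit and y == 1:
            tp += 1
        elif predicted_quit:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def expected_rates(tp, fp, tn, fn):
    """Rates conditional on the true class (assumes both classes are present)."""
    return {
        "tpr": tp / (tp + fn),  # quitters correctly flagged
        "fnr": fn / (tp + fn),  # quitters missed
        "tnr": tn / (tn + fp),  # stayers correctly left alone
        "fpr": fp / (tn + fp),  # stayers flagged unnecessarily
    }


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); FP and FN enter with equal weight."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90]  # predicted P(quit)
actual = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = employee quit
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]


def f1_at(threshold):
    tp, fp, _tn, fn = confusion_counts(probs, actual, threshold)
    return f1_score(tp, fp, fn)


best_threshold = max(thresholds, key=f1_at)
rates = expected_rates(*confusion_counts(probs, actual, best_threshold))
print(best_threshold, rates)
# prints: 0.3 {'tpr': 1.0, 'fnr': 0.0, 'tnr': 0.75, 'fpr': 0.25}
```

Because each false positive and each false negative adds exactly 1 to the F1 denominator, the two errors lower the score by the same amount, which is why an F1-chosen threshold ignores their different business costs.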

Calculating the Expected Value at the F1-maximizing threshold, which weights false negatives and false positives equally, yields a 16% savings over a “Do Nothing Policy”. This targeted policy applies an overtime reduction policy to anyone whose predicted probability of quitting is greater than 28%.

Chapter 7: Calculating Expected Savings Vs Baseline (Do Nothing)
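The savings comparison can be sketched in the same way. This minimal version assumes a simplified, hypothetical cost model rather than the course’s exact one: each employee has a predicted quit probability and an attrition cost, treating an employee costs a fixed `policy_cost`, and treatment cuts that employee’s quit probability by a fraction `effectiveness`. A threshold of 0 treats everyone (analogous to the “No Overtime Policy”), an infinite threshold treats nobody (“Do Nothing”), and anything in between is a targeted policy. All numbers are made up for illustration.

```python
# Minimal sketch of expected savings vs. a "Do Nothing" baseline.
# ASSUMPTIONS (hypothetical, not the course's exact cost model):
#   - each employee is (predicted quit probability, attrition cost);
#   - treating an employee costs `policy_cost`;
#   - treatment cuts that employee's quit probability by `effectiveness`.
import math


def expected_cost(employees, threshold, effectiveness, policy_cost):
    """Total expected cost when everyone with p_quit >= threshold is treated."""
    total = 0.0
    for p_quit, attrition_cost in employees:
        if p_quit >= threshold:
            total += p_quit * (1 - effectiveness) * attrition_cost + policy_cost
        else:
            total += p_quit * attrition_cost
    return total


def expected_savings(employees, threshold, effectiveness, policy_cost):
    """Fractional savings relative to treating nobody (threshold = infinity)."""
    baseline = expected_cost(employees, math.inf, effectiveness, policy_cost)
    return 1 - expected_cost(employees, threshold, effectiveness, policy_cost) / baseline


employees = [(0.10, 50_000), (0.40, 80_000), (0.70, 60_000)]  # hypothetical
policies = {"Do nothing": math.inf, "Treat everyone": 0.0, "Targeted (p >= 0.3)": 0.3}
savings = {
    name: expected_savings(employees, t, effectiveness=0.5, policy_cost=5_000)
    for name, t in policies.items()
}
for name, value in savings.items():
    print(f"{name}: {value:.1%} savings")
```

On these toy numbers the targeted policy saves more than treating everyone, because treating the low-risk employee adds more in policy cost than it removes in expected attrition cost.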

We end Chapter 7 with a brief discussion of False Positives and False Negatives. The problem with using the threshold that maximizes F1 is that F1 weights both errors equally, while False Negatives are typically 3X to 5X more costly than False Positives. With a little extra work, we can do even better than a 16% savings, and that’s where Chapter 8 comes in.

Where We’re Going (Chapter 8): Threshold Optimization and Sensitivity Analysis

Chapter 8 picks up where Chapter 7 left off by using the purrr package to calculate savings iteratively. Two analyses are performed:

  1. Threshold Optimization Using Cost/Benefit and Expected Value Framework – Maximizes profit (savings)

  2. Sensitivity Analysis – Adjusts parameters that are “assumptions” via grid search to explore best/worst case scenarios and see their effect on expected savings

The threshold optimization is the first step, which can be performed by iteratively calculating the expected savings at various thresholds using the purrr package.

Chapter 8: Threshold Optimization With `purrr`
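Threshold optimization can be sketched as a sweep: evaluate the expected cost at each candidate threshold and keep the minimum, much as one would map a savings function over a vector of thresholds with purrr in R. This minimal Python version assumes hypothetical toy data, a fixed cost per false positive, and a false-negative cost four times higher (within the 3X to 5X range cited above). For simplicity it ignores the costs of true positives and true negatives, and it also reports the F1-maximizing threshold for comparison.

```python
# Minimal sketch of threshold optimization by sweeping candidate thresholds.
# ASSUMPTIONS (hypothetical): toy data; a fixed cost per false positive; a
# false-negative cost 4x higher; true-positive/true-negative costs ignored.

COST_FP = 10_000  # policy applied to someone who would have stayed
COST_FN = 40_000  # quitter the policy missed

probs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
actual = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # 1 = employee quit
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def counts(threshold):
    """Return (tp, fp, fn) when flagging everyone with prob >= threshold."""
    pairs = list(zip(probs, actual))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    return tp, fp, fn


def expected_cost(threshold):
    """Cost of the errors made at this threshold (FP and FN only)."""
    _tp, fp, fn = counts(threshold)
    return fp * COST_FP + fn * COST_FN


def f1(threshold):
    tp, fp, fn = counts(threshold)
    return 2 * tp / (2 * tp + fp + fn)


# Evaluate the cost at every threshold (the role purrr's map functions play in R).
costs = {t: expected_cost(t) for t in thresholds}
best_cost_threshold = min(costs, key=costs.get)
best_f1_threshold = max(thresholds, key=f1)
print(f"cost-optimal: {best_cost_threshold}, F1-optimal: {best_f1_threshold}")
```

On this toy data the cost-based threshold (0.2) is lower than the F1 threshold (0.5): when missing a quitter is costlier than a wasted intervention, it pays to flag more people.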

Next, the student visualizes the threshold optimization results using ggplot2.

Chapter 8: Visualizing Optimization Results With `ggplot2`

Sensitivity analysis is the final step. The student goes through a similar process, but this time uses purrr’s partial(), cross_df(), and pmap_dbl() to calculate a range of potential values for inputs that are not completely known. For example, the percentage of overtime worked in the future is unlikely to be the same as in the current year. How does that affect the model? How does future overtime interact with other assumptions, such as future net revenue per employee? Find out how to handle this by taking the course. 🙂
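A sensitivity analysis repeats the savings calculation over a grid of uncertain inputs. The Python sketch below uses `functools.partial` and `itertools.product` as rough analogues of purrr’s `partial()` and `cross_df()`, with a list comprehension standing in for `pmap_dbl()`. The cost model is hypothetical: the per-employee cost of the policy is assumed to be `overtime_pct * net_revenue_per_employee`, the policy is assumed to halve a treated employee’s quit probability, and the employee data are made up.

```python
# Minimal sketch of a sensitivity analysis: grid-search uncertain assumptions
# and recompute expected savings for each combination.
# ASSUMPTIONS (hypothetical): toy employees; policy cost per treated employee
# = overtime_pct * net_revenue_per_employee; treatment halves P(quit).
from functools import partial
from itertools import product

# (predicted quit probability, attrition cost) for each hypothetical employee
EMPLOYEES = [(0.10, 100_000), (0.40, 150_000), (0.70, 120_000)]


def expected_savings(employees, threshold, effectiveness,
                     overtime_pct, net_revenue_per_employee):
    """Fractional savings of the targeted policy vs. treating nobody."""
    policy_cost = overtime_pct * net_revenue_per_employee
    baseline = sum(p * cost for p, cost in employees)
    targeted = 0.0
    for p, cost in employees:
        if p >= threshold:
            targeted += p * (1 - effectiveness) * cost + policy_cost
        else:
            targeted += p * cost
    return 1 - targeted / baseline


# Fix the inputs treated as known, leaving the uncertain assumptions free
# (similar in spirit to purrr::partial()).
savings_fn = partial(expected_savings, EMPLOYEES, threshold=0.3, effectiveness=0.5)

# Every combination of the uncertain inputs (similar in spirit to cross_df()).
grid = [
    {"overtime_pct": ot, "net_revenue_per_employee": rev}
    for ot, rev in product([0.05, 0.10, 0.15], [200_000, 250_000, 300_000])
]

# One savings value per scenario (similar in spirit to pmap_dbl()).
results = [(row, savings_fn(**row)) for row in grid]
for row, savings in results:
    print(row["overtime_pct"], row["net_revenue_per_employee"], round(savings, 3))
```

Each row of `results` is one scenario, so the lowest and highest policy costs bracket the best and worst cases under these assumptions. With these toy numbers the most pessimistic scenarios produce negative savings, which is the kind of result a sensitivity analysis is meant to surface.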

Next Steps: Take The DS4B 201 Course!

If you’re interested in learning more, definitely check out Data Science For Business (DS4B 201). In 10 weeks, the course covers all of the steps to solve the employee turnover problem with H2O in an integrated end-to-end data science project.

The students love it. Here’s a comment we just received last Sunday morning from one of our students, Siddhartha Choudhury, Data Architect at Accenture.

“To be honest, this course is the best example of an end to end project I have seen from business understanding to communication.”

Siddhartha Choudhury, Data Architect at Accenture

See for yourself why our students have rated Data Science For Business (DS4B 201) 9.0 out of 10.0 for Course Satisfaction!

Get Started Today!

Learning More

Check out our other articles on Data Science For Business!

Business Science University

Business Science University is a revolutionary new online platform that gets you results fast.


Why learn from Business Science University? You could spend years trying to learn all of the skills required to confidently apply Data Science For Business (DS4B). Or you can take the first course in our integrated Virtual Workshop, Data Science For Business (DS4B 201). In 10 weeks, you’ll learn:

You can spend years learning this information, or you can learn it in 10 weeks (at a pace of one chapter per week). Get started today!

Sign Up Now!

DS4B Virtual Workshop: Predicting Employee Attrition

Did you know that an organization that loses 200 high-performing employees per year is essentially losing $15M/year in lost productivity? Many organizations don’t realize this because it’s an indirect cost. It goes unnoticed. What if you could use data science to predict and explain turnover so that managers could make better decisions and executives would see results? You will learn the tools to do so in our Virtual Workshop. Here’s an example of a Shiny app you will create.


Shiny App That Predicts Attrition and Recommends Management Strategies, Taught in HR 301

Our first Data Science For Business Virtual Workshop teaches you how to solve this employee attrition problem in four courses that are fully integrated:

The Virtual Workshop is code intensive (like these articles), but it also teaches you the fundamentals of data science consulting, including CRISP-DM and the Business Science Problem Framework, and many data science tools in an integrated fashion. The content bridges the gap between data science and the business, making you even more effective and improving your organization in the process.

Here’s what one of our students, Jason Aizkalns, Data Science Lead at Saint-Gobain had to say:

“In an increasingly crowded data science education space, Matt and the Business Science University team have found a way to differentiate their product offering in a compelling way. BSU offers a unique perspective and supplies you with the materials, knowledge, and frameworks to close the gap between just “doing data science” and providing/creating value for the business. Students will learn how to formulate their insights with a value-creation / ROI-first mindset which is critical to the success of any data science project/initiative in the “real world”. Not only do students work a business problem end-to-end, but the icing on the cake is “peer programming” with Matt, albeit virtually, who codes clean, leverages best practices + a good mix of packages, and talks you through the why behind his coding decisions – all of which lead to a solid foundation and better habit formation for the student.”

Jason Aizkalns, Data Science Lead at Saint-Gobain


Don’t Miss A Beat

Connect With Business Science

If you like our software (anomalize, tidyquant, tibbletime, timetk, and sweep), our courses, and our company, you can connect with us:
