It continues to amaze us how much progress our students are making in applying data science in the real world. Bryan Clark (LinkedIn), Data Scientist with H&M and student in our Business Science University DS4B 201R course, has successfully applied the BSPF Framework (the data science project management methodology taught in the DS4B 201R course) to model the cost and risk of his forthcoming wedding (congratulations on getting married!!). In this article, Bryan presents his amazing analysis that led to the development of a minimum viable product: a Wedding Invitation Risk Modeling Application built with R and Shiny. Way to go Bryan!
Summary

Objective: Bryan has a big wedding coming up (yay!), and he and his soon-to-be wife want to use a statistical model to determine how many invitations to send and to quantify the risk of going over budget.

Process: Bryan used the BSPF Framework. The BSPF is a repeatable data science project management framework designed to connect data science with the business.

Data Product (Web Application): Bryan used the R code developed in this tutorial to build a Wedding Invitation Risk Modeling Application using R Shiny.
Wedding Risk Web Application (Demo Here)
Related Courses

Data Science for Business Course, DS4B 201R: Learn how to solve churn problems (which have a big financial impact on organizations) using the BSPF Framework, H2O Automatic Machine Learning, and LIME black-box explainability. Bryan learned the BSPF Framework in the 201 course and has since applied it to projects at H&M and to his Wedding Modeling project. Go Bryan!
Predictive Web Applications for Business with R Shiny Course, DS4B 102R: Learn to build predictive web applications with integrated machine learning. Build 2 web applications using parsnip and XGBoost machine learning for demand forecasting and product price modeling.
Code
All code in this step-by-step tutorial can be found in Bryan’s GitHub repo here.
Introduction
Background
“We are all a little weird and life’s a little weird, and when we find someone whose weirdness is compatible with ours, we join up with them and fall in mutual weirdness and call it love.” – Dr. Seuss
I heard this quote recently while attending the wedding of my very good friend. In fact, I will be having my own wedding in less than a year and just went through the process of selecting a venue. Part of that process involves fixed and variable costs that depend on the number of guests who ultimately RSVP yes.
My first question was how many people ultimately respond to wedding invitations. In other words, of the people we invite, how many can we expect to attend?
As I processed this information, I wondered whether there was a better way to quantify the uncertainty in how many people will attend, and then turn that into an estimate of the total cost. That estimate can, in turn, be extended into a risk of going over budget.
Objective and Key Result

Objective: Determine a better way to quantify the risk of going over budget

Key Result: Develop a model that consumes guest cost inputs and then outputs the risk profile and a recommendation for moving forward.
I will combine elements of statistical simulation, risk analytics, and design thinking to build an analytics product that extends a single example use case into a flexible product that others can use.
Business Science Problem Framework
We will leverage the Business Science Problem Framework to shape the structure of our analysis and product development. The goal here is to understand the problem, explore potential opportunities, and operationalize the outcomes.
Business Science Problem Framework (BSPF)
Libraries & Theme Setup
These are the libraries used for the analysis.
Understand
1. View Business as Machine
1.1 Isolate the Business Unit
The “business unit” here is the couple and their guests. While the cost per guest is set by the wedding vendors, the total cost of the wedding depends on the number of guests who are invited and the number who RSVP yes by the final guest-count cutoff. Inviting too many guests could drive costs up at the last minute if an unexpectedly large number of guests say yes, which creates a risk of the couple going over budget.
Example Business Case
For the sake of the business case, let’s assume that 150 people are on the initial guest invitation list. When taking all the wedding vendors into account, there is a fixed cost of $22,000 for 50 guests, and then a variable cost of $125 per guest above 50. This will be charged at a cutoff point 45 days prior to the date of the wedding, so at that point the total cost will be known. The couple’s budget is $30,000 and they have a risk tolerance of 20%. In other words, they can stomach a 20% chance of going over budget and ideally do not want to exceed $32,000 at the very most.
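The cost structure of this business case can be written down directly. The sketch below is a Python translation for illustration (the original tutorial is written in R); the names `total_cost` and `budget_gap` are hypothetical, and it assumes the $22,000 fixed fee is charged in full even if fewer than 50 guests attend.

```python
# Hypothetical Python sketch of the business-case cost model (the original post uses R).
FIXED_COST = 22_000     # covers the first 50 guests
INCLUDED_GUESTS = 50    # guests covered by the fixed cost
VARIABLE_COST = 125     # charged per guest above 50
BUDGET = 30_000


def total_cost(guests: int) -> int:
    """Total cost for a final head count.

    Assumption: the fixed fee applies in full even below 50 guests.
    """
    extra_guests = max(0, guests - INCLUDED_GUESTS)
    return FIXED_COST + VARIABLE_COST * extra_guests


def budget_gap(guests: int) -> int:
    """Budget minus cost: positive means under budget, negative means over."""
    return BUDGET - total_cost(guests)


if __name__ == "__main__":
    for guests in (113, 114, 115, 150):
        print(guests, total_cost(guests), budget_gap(guests))
```

Under these numbers, 114 guests lands exactly on the $30,000 budget, 115 guests puts the couple $125 over, and all 150 invitees attending would cost $34,500.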
1.2 Define Objectives
The main objective is to quantify the risk of going over budget that a couple faces, based on how many people they send invitations to.
1.3 Collect Outcomes
Initial research suggests that couples mainly go over budget because they underestimate how quickly costs can climb.
2. Understand the Drivers
2.1 Investigate Objectives
At the time invitations are sent out, the objective is to make sure that all the right people are invited while ensuring that unexpected costs do not pop up because too many people RSVP yes.
There are a few drivers of the uncertainty in guest attendance: the distance to the wedding, the cost of attendance (e.g. hotel & travel), the availability of the guest based on the time of year, and the strength of the relationship between the guest and the couple.
2.2 Synthesize Outcomes
However, there is no historical data to analyze in this instance, as the wedding has never happened before. Therefore, it is unknown whether the outcomes will align with the objective. Response rates (the share of guests who RSVP yes) vary from wedding to wedding, which increases the uncertainty in the estimates.
2.3 Hypothesize Drivers
The variability of estimates is likely due to the variability in the factors mentioned above. Additionally, even with historical data to use as a guide, the actual response rate naturally deviates from the expected one, because the expected response rate is a long-run average. The result we see for the actual event is only a single experiment.
3. Measure the Drivers
3.1 Collect Data
Some initial research states that 10-20% of invited guests will not attend, while another source shows that 60-75% of invited guests will share in the day. In other words, guest attendance rates could be anywhere from 60% to 90%, depending on a variety of factors.
We can use this data to help solve the problem analytically.
3.2 Develop KPIs
To develop a baseline, we have to figure out how to quantify our risk using a hypothesis and statistics. The KPIs we focus on are the likelihood of going over budget, the expected total cost, and the expected value of risk (the amount under or over budget).
We can use the data from our research to calculate baseline KPIs analytically.
To do so, we assume that every guest has the same probability of RSVPing yes. This turns each guest’s yes-or-no response into a Bernoulli trial, and the invitation list as a whole into a Bernoulli process. By the law of large numbers, we can then estimate our baseline KPIs from the expected total number of guests.
| total_guests | total_cost | total_risk | recommendation |
|---|---|---|---|
| 113 | 29875 | 125 | Invite All |
Using this approach, the couple should plan for 113 guests to respond and will be $125 under budget, so the recommendation is Invite All.
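The baseline calculation can be sketched in a few lines of Python (a translation for illustration, not Bryan’s R code). Two assumptions are labeled here: `p_yes = 0.75` is taken as the midpoint of the 60-90% range from the research, and the expected head count of 112.5 is rounded up to 113, which reproduces the table above.

```python
import math


def baseline_kpis(invites=150, p_yes=0.75, budget=30_000,
                  fixed_cost=22_000, included_guests=50, per_guest=125):
    """Analytical baseline: plan for the expected head count n * p.

    Hypothetical helper; p_yes = 0.75 is assumed to be the midpoint of 60-90%.
    """
    # Expected value of n Bernoulli(p) RSVPs, rounded up to a whole guest (assumption)
    total_guests = math.ceil(invites * p_yes)
    total_cost = fixed_cost + per_guest * max(0, total_guests - included_guests)
    total_risk = budget - total_cost  # positive = headroom under budget
    recommendation = "Invite All" if total_cost <= budget else "Invite Less"
    return {
        "total_guests": total_guests,
        "total_cost": total_cost,
        "total_risk": total_risk,
        "recommendation": recommendation,
    }


print(baseline_kpis())
# {'total_guests': 113, 'total_cost': 29875, 'total_risk': 125, 'recommendation': 'Invite All'}
```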
Explore
4. Uncover Problems and Opportunities
4.1 Evaluate Baseline Performance
While the baseline result is favorable for the couple, it fails to account for the fact that the couple’s wedding is only a single realization of the Bernoulli process. The actual total cost may come in lower than, higher than, or the same as our analytical calculation.
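To see how thin the baseline’s margin is, treat the 150 RSVPs as independent Bernoulli trials with a fixed p = 0.75 (the same assumption as the baseline). The head count G is then binomial, and the business-case cost formula pins down the largest affordable head count:

```latex
\begin{aligned}
\mathbb{E}[G] &= np = 150 \times 0.75 = 112.5 \\
\operatorname{SD}(G) &= \sqrt{np(1-p)} = \sqrt{150 \times 0.75 \times 0.25} = \sqrt{28.125} \approx 5.30 \\
22{,}000 + 125\,(G - 50) \le 30{,}000 &\iff G \le 114
\end{aligned}
```

Any outcome of 115 or more guests exceeds the budget, and 115 is only 2.5 guests (less than half a standard deviation) above the expected head count, even before accounting for uncertainty in p itself.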
4.2 Highlight Potential Problem Areas
The biggest opportunity will be to use simulation to replicate the experiment thousands of times and then analyze the frequency of outcomes. We can also add flexibility to the process by allowing a response probability to be sampled from a distribution for each replication. In other words, we can account for the 60-90% uncertainty range. Another option would be to have different categories of guests and assign each group a different probability.
4.3 Review Process
To summarize, we will model a Bernoulli process to generate data that simulates wedding guest responses. We will use a uniform distribution over the range provided by the research to sample the response probability. This assumes that the expert estimate of the probability is accurate enough, so the biggest opportunity for improvement lies in using real data to improve the inputs.
This method is a simplified version of the real-world process, but it should add value over the analytically calculated alternative.
5. Encode Algorithms
5.1 Develop Algorithms
We need a few additional functions to piece together our data generating process.
We need a function to sample a guest count based on n invitations, each with probability p of responding yes.
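A minimal Python sketch of such a function, assuming independent guests (the name `sample_guest_count` is hypothetical, and this is a translation rather than the tutorial’s R code; in R, `rbinom()` draws from the same binomial distribution). Each invitation is one Bernoulli trial, so the count is one Binomial(n, p) draw.

```python
import random


def sample_guest_count(n: int, p: float, rng: random.Random) -> int:
    """Number of 'yes' RSVPs from n invitations, each accepted independently
    with probability p, i.e. one draw from a Binomial(n, p) distribution."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    # One Bernoulli trial per invitation: rng.random() is uniform on [0, 1)
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(2019)  # seeded so results are reproducible
print(sample_guest_count(150, 0.75, rng))
```

Passing an explicit `random.Random` instance keeps every simulation reproducible from a single seed.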
We also need a function to simulate k weddings and return the total guests, total cost, and risk for each one. These outputs are created using our functions from above and their respective inputs. This function will also accept an argument for n and a range for p.
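Here is a Python sketch of that simulator under the assumptions stated in this section: each replication draws its own p from Uniform(p_low, p_high) (60-90% by default) and then n independent RSVPs. The names `Trial` and `simulate_weddings` are hypothetical, and the sign convention for `risk` (budget minus cost, negative when over budget) is an assumption, not taken from the original code.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    trial: int
    total_guests: int
    total_cost: int
    risk: int            # budget minus cost; negative means over budget (assumed convention)
    over_budget: bool
    recommendation: str


def simulate_weddings(k, n, p_low=0.60, p_high=0.90, budget=30_000,
                      fixed_cost=22_000, included_guests=50, per_guest=125,
                      seed=None):
    """Simulate k weddings with n invitations each.

    Each replication draws its own response probability p ~ Uniform(p_low, p_high),
    then draws the guest count as n independent Bernoulli(p) RSVPs.
    """
    rng = random.Random(seed)
    trials = []
    for i in range(1, k + 1):
        p = rng.uniform(p_low, p_high)
        guests = sum(1 for _ in range(n) if rng.random() < p)
        cost = fixed_cost + per_guest * max(0, guests - included_guests)
        over = cost > budget
        trials.append(Trial(
            trial=i,
            total_guests=guests,
            total_cost=cost,
            risk=budget - cost,
            over_budget=over,
            recommendation="Invite Less" if over else "Invite All",
        ))
    return trials


results = simulate_weddings(k=10_000, n=150, seed=42)
share_over = sum(t.over_budget for t in results) / len(results)
print(f"over budget in {share_over:.1%} of trials")
```

Sampling p once per replication (rather than once overall) is what lets the simulation reflect uncertainty about the response rate itself, not just the coin-flip noise around a known rate.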
5.2 Quantify Financial Value Potential
Our simulation will return the results of k trials, which we can use to analyze the outcomes. Each trial captures whether the guest count caused the wedding to go over budget, as well as the specific amount by which it went over.
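One way to summarize those trials, sketched in Python with a hypothetical `summarize_trials` helper: the share of trials over budget estimates the probability of an overrun, and multiplying it by the average overrun gives the expected overrun, E[max(cost - budget, 0)]. The usage line feeds in only the ten trials shown in section 6.1, which is far too few for a real estimate.

```python
from statistics import mean


def summarize_trials(total_costs, budget=30_000):
    """Summarize simulated total costs against the budget (hypothetical helper)."""
    overruns = [cost - budget for cost in total_costs if cost > budget]
    share_over = len(overruns) / len(total_costs)
    avg_overrun = mean(overruns) if overruns else 0.0
    return {
        "share_over_budget": share_over,               # estimated probability of overrun
        "avg_overrun_when_over": avg_overrun,          # typical size of an overrun
        "expected_overrun": share_over * avg_overrun,  # E[max(cost - budget, 0)]
    }


# total_cost column of the ten trials shown in section 6.1
costs = [31750, 30250, 26625, 33250, 29375, 28625, 28000, 31625, 27125, 32875]
print(summarize_trials(costs))
```

For those ten rows, 5 of 10 trials are over budget, the average overrun among them is $1,950, and the expected overrun is $975.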
5.3 Improve DecisionMaking via Recommendation Algorithm
Each trial also converts its outcome into a recommendation. If the outcome was under budget, the recommendation is Invite All; otherwise it is Invite Less. For the final recommendation, we summarize the outcomes and, if the proportion of outcomes over budget is less than the risk tolerance, we recommend inviting all guests.
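The final decision rule described above fits in one small function. This Python sketch uses a hypothetical name `recommend` and assumes the couple’s 20% risk tolerance as the default; note the strict "less than", so a share exactly equal to the tolerance yields Invite Less.

```python
def recommend(over_budget_flags, risk_tolerance=0.20):
    """Final recommendation: invite everyone only if the share of simulated
    trials that went over budget is below the couple's risk tolerance."""
    flags = list(over_budget_flags)
    if not flags:
        raise ValueError("need at least one trial")
    share_over = sum(flags) / len(flags)
    return "Invite All" if share_over < risk_tolerance else "Invite Less"


# over_budget column of the ten trials shown in section 6.1 (Yes -> True)
flags = [True, True, False, True, False, False, False, True, False, True]
print(recommend(flags))  # Invite Less
```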
Operationalize
6. Measure Results
6.1 Capture Outcomes
We then load our simulator with the inputs from above. This time, however, we have 90 invites for in-town guests (with an estimated 90% chance of responding yes) and 60 invites for out-of-town guests (with an estimated 50% chance of responding yes).
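The post does not show how the two guest groups enter the simulation. One way to extend the single-group simulator is sketched below in Python; the function name `simulate_grouped_guests` and the per-group `(invites, p_low, p_high)` format are assumptions. Holding the 90% and 50% estimates fixed gives a standard deviation of only about 4.8 guests (the square root of 90 × 0.9 × 0.1 + 60 × 0.5 × 0.5 = 23.1), much narrower than the 87 to 140 spread in the table below, so the width of each group’s probability range matters a great deal.

```python
import random


def simulate_grouped_guests(k, groups, budget=30_000, fixed_cost=22_000,
                            included_guests=50, per_guest=125, seed=None):
    """Simulate k weddings where invitees fall into groups (hypothetical helper).

    groups: list of (invites, p_low, p_high) tuples. Each replication draws a
    response probability per group from Uniform(p_low, p_high); setting
    p_low == p_high fixes that group's probability.
    Returns a list of (total_guests, total_cost, over_budget) tuples.
    """
    rng = random.Random(seed)
    results = []
    for _trial in range(k):
        guests = 0
        for invites, p_low, p_high in groups:
            p = rng.uniform(p_low, p_high)
            guests += sum(1 for _ in range(invites) if rng.random() < p)
        cost = fixed_cost + per_guest * max(0, guests - included_guests)
        results.append((guests, cost, cost > budget))
    return results


# Hypothetical configuration: the point estimates from the text, held fixed
groups = [(90, 0.90, 0.90), (60, 0.50, 0.50)]
sims = simulate_grouped_guests(10_000, groups, seed=1)
print(sum(over for _, _, over in sims) / len(sims))
```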
| trial | total_guests | total_cost | risk | over_budget | recommendation |
|---|---|---|---|---|---|
| 1 | 128 | 31750 | 1650 | Yes | Invite Less |
| 2 | 116 | 30250 | 150 | Yes | Invite Less |
| 3 | 87 | 26625 | 3475 | No | Invite All |
| 4 | 140 | 33250 | 3150 | Yes | Invite Less |
| 5 | 109 | 29375 | 725 | No | Invite All |
| 6 | 103 | 28625 | 1475 | No | Invite All |
| 7 | 98 | 28000 | 2100 | No | Invite All |
| 8 | 127 | 31625 | 1525 | Yes | Invite Less |
| 9 | 91 | 27125 | 2975 | No | Invite All |
| 10 | 137 | 32875 | 2775 | Yes | Invite Less |
6.2 Synthesize Results
Ultimately, we care about whether we have invited too many people. Applying our summary function, we see that the recommendation is to invite fewer people.
It is worth noting that the weighted average of our response probabilities (74%) is a little lower than in the analytical method (75%), which reflects the flexibility of the solution design. Even with this slightly lower probability of response, the risk of going over budget is too great to move forward with this many invites.
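The 74% figure is the invitation-weighted average of the two group probabilities. Plugging it into the same expected-value calculation as the baseline gives:

```latex
\begin{aligned}
\bar{p} &= \frac{90 \times 0.90 + 60 \times 0.50}{90 + 60} = \frac{81 + 30}{150} = 0.74 \\
\mathbb{E}[G] &= 150 \times 0.74 = 111 \\
22{,}000 + 125\,(111 - 50) &= 29{,}625
\end{aligned}
```

On expected value alone the plan again looks comfortably under budget; it is the spread of simulated outcomes, not the average, that pushes the over-budget share past the tolerance.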
6.3 Visualize Outcomes
To better understand the outcomes of the simulation, we will visualize four plots (these will go into the final web application):
 Total Guest Count
 Total Cost
 Total Risk
 Recommendation Outcomes
Here is the code to generate the 4 plots in the Wedding Invitation Application.
6.3.1 Plot Guest Count
6.3.2 Plot Cost
6.3.3 Plot Risk
6.3.4 Plot Recommendation
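The plotting code itself is not reproduced in this version of the post. As a dependency-free stand-in for the recommendation plot, the Python sketch below (hypothetical name `recommendation_bars`; not the app’s actual plotting code) tallies the per-trial recommendations and renders a proportional text bar for each category.

```python
from collections import Counter


def recommendation_bars(recommendations, width=40):
    """Tally per-trial recommendations and render one text bar per category.

    Returns the Counter and a list of formatted lines, most frequent first.
    """
    counts = Counter(recommendations)
    total = sum(counts.values())
    lines = []
    for label, count in counts.most_common():
        share = count / total
        bar = "#" * round(width * share)
        lines.append(f"{label:<12}{bar} {count} ({share:.0%})")
    return counts, lines


# recommendation column of the ten trials shown in section 6.1
recs = ["Invite Less", "Invite Less", "Invite All", "Invite Less", "Invite All",
        "Invite All", "Invite All", "Invite Less", "Invite All", "Invite Less"]
counts, lines = recommendation_bars(recs)
print("\n".join(lines))
```

In practice, the recommendation column from the full simulation (thousands of trials) would be passed in instead of these ten rows.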
7. Report Financial Impact
7.1 Measure Actual Results
Based on the outcome of our analysis, we see the couple needs to find a way to trim the invite list. Had we moved forward with the initial analytical solution, the couple would run a greater risk of exceeding their budget than they indicated they would be comfortable with.
The model can be rerun with a smaller invitation list and the new results reported. This makes the model a great candidate to be turned into an analytics product: a tool like Shiny enables all the code to be embedded into a dashboard.
7.2 Quantify Financial Benefit
While the simulation shows the guest count coming in under budget more often than not (with potential savings of over $3,000), the couple goes over budget more than 20% of the time (their threshold). Had they moved forward with the counts as is, they faced a 95% chance of losing potentially over $2,000.
Both findings point the couple toward trimming the guest list and potentially saving additional money.
7.3 Report Financial Benefit to Stakeholders
Generating a PDF report for the couple would be beneficial, as they could refer to it while deciding how many people to trim from the list. The couple could also recheck their financial calculations and decide to move forward with the counts as is.
Whether built in R Markdown or Shiny, this would be a nice value-add feature.
Analytics Product
The working minimum viable product created from this analysis can be found here.
Wedding Risk Web Application (Demo Here)
Acknowledgements
I’d like to acknowledge the following people and organizations for helping to influence this project:
 My soon-to-be wife for being my muse and sounding board
 Boston University MSc. Applied Business Analytics: AD 616 Enterprise Risk Analytics
 Business Science & Matt Dancho
 Analytics Lifecycle Toolkit
Now let’s chat in the comments: What did you think of Bryan’s analysis, Shiny App, and use of the BSPF Framework?