by John Mount, Ph.D.
Data Scientist at Win-Vector LLC

Win-Vector's last article on A/B testing described the scope of the realistic circumstances of A/B testing in practice and gave links to different standard solutions. In this article we will take an idealized specific situation, allowing us to show a particularly beautiful solution to one very special type of A/B test. For this article we are assigning two different advertising messages to our potential customers. The first message, called “A”, we have been using a long time, and we have a very good estimate of the rate at which it generates sales (we are going to assume all sales are for exactly \$1, so all we are trying to estimate are rates or probabilities). We have a new proposed advertising message, called “B”, and we wish to know: does B convert traffic to sales at a higher rate than A? We are assuming:

• We know the exact rate of A events.
• We know exactly how long we are going to be in this business (how many potential customers we will ever attempt to message, or the total number of events we will ever process).
• The goal is to maximize expected revenue over the lifetime of the project.

As we wrote in our previous article: in practice you usually do not know the answers to the above questions. There is always uncertainty in the value of the A-group, you never know how long you are going to run the business (in terms of events or in terms of time, and you would also want to time-discount any far-future revenue), and often you value things other than revenue (valuing knowing if B is greater than A, or even maximizing risk-adjusted returns instead of gross returns). This represents a severe idealization of the A/B testing problem, one that will let us solve the problem exactly using fairly simple R code. The solution comes from the theory of binomial option pricing (which is in turn related to Pascal's triangle).
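To make the idealized setup concrete, here is a minimal sketch of how such a backward-induction calculation could look in R. It is not the code from the linked solution. The function name `ab_plan_value` is hypothetical, and the sketch assumes a uniform Beta(1, 1) prior on B's unknown conversion rate, which the text above does not specify. Because A's rate is known exactly, sending A teaches us nothing, so at each step the choice reduces to "try B once more" or "use A for all remaining events". The states after `t` trials of B (indexed by the number of B successes) form the rows of a Pascal-style triangle.

```r
# Sketch only: best expected lifetime revenue when A's rate pA is known
# exactly, N events remain in total, and each sale is worth exactly $1.
# ASSUMPTION: B's rate has a Beta(prior_a, prior_b) prior (uniform by default).
ab_plan_value <- function(pA, N, prior_a = 1, prior_b = 1) {
  stopifnot(N >= 1, pA >= 0, pA <= 1)
  # v[s + 1] is the best expected future revenue after t trials of B with
  # s successes. After all N events are used (t = N) nothing remains.
  v <- numeric(N + 1)
  for (t in (N - 1):0) {
    s <- 0:t
    # Posterior mean of B's rate given s successes in t trials.
    pB <- (prior_a + s) / (prior_a + prior_b + t)
    # Option 1: send B once more, then continue optimally.
    try_b <- pB * (1 + v[s + 2]) + (1 - pB) * v[s + 1]
    # Option 2: give up on B and send A for every remaining event.
    stay_a <- (N - t) * pA
    v <- pmax(try_b, stay_a)
  }
  v[1]  # value at the start: no B trials yet
}
```

For example, `ab_plan_value(0.5, 2)` evaluates to 13/12 (about 1.083), slightly better than the 1.0 expected from always sending A, because a first B success raises B's posterior mean to 2/3 and makes a second B trial worthwhile. Recording which option wins in each state, rather than only the value, would turn this sketch into a test plan.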

Yang Hui (ca. 1238–1298) (Pascal's) triangle, as depicted by the Chinese using rod numerals.

For this “statistics as it should be” article let us work the problem (using R) pretending things are this simple. For our solution please click here.

Win-Vector blog: A Dynamic Programming Solution to A/B Test Design