Learning Data Science: A/B Testing in Under One Minute



Google does it! Facebook does it! Amazon does it for sure!

Especially in the areas of web design and online advertising, everybody is talking about A/B testing. If you want a quick understanding of what it is and how you can do it with R, read on!

The basic idea of A/B testing is to systematically (and usually automatically) test two different alternatives, e.g. two different web designs, and decide which one performs better, e.g. in terms of conversion rate (i.e. the share of visitors who click on a button or buy a product):

(Figure: illustration of an A/B test comparing two website variants; source: Wikimedia)
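
For example, if one variant is shown to 1,000 visitors and gets 50 clicks while the other gets 70 clicks out of 1,000 visitors, the conversion rates are simply the click counts divided by the number of visitors (the numbers here are made up purely for illustration):

visitors_A <- 1000; clicks_A <- 50   # variant A: 1,000 visitors, 50 clicks
visitors_B <- 1000; clicks_B <- 70   # variant B: 1,000 visitors, 70 clicks
clicks_A / visitors_A                # conversion rate A: 0.05
clicks_B / visitors_B                # conversion rate B: 0.07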

The bad news is that you have to understand a little bit about statistical hypothesis testing; the good news is that the following post gives you everything you need (plus, as an added bonus, R already has all the tools at hand!): From Coin Tosses to p-Hacking: Make Statistics Significant Again! (OK, reading it would push you over one minute…).

To give you a practical example, we will use a dataset from DataCamp’s course “A/B Testing in R” (experiment_data.csv), which records for each participant the group they were in (condition: control or test) and whether they clicked on the respective offer or not (clicked_adopt_today):

experiment <- read.csv("data/experiment_data.csv")
experiment <- experiment[ , 2:3] # keep only condition and clicked_adopt_today
head(experiment, 10)
##    condition clicked_adopt_today
## 1    control                   0
## 2    control                   1
## 3    control                   0
## 4    control                   0
## 5       test                   0
## 6       test                   0
## 7       test                   1
## 8       test                   0
## 9       test                   0
## 10      test                   1
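
If you don’t have the DataCamp file at hand, you can create a structurally similar dataset yourself by simulating the clicks with rbinom. The group size (294 per group) and the click rates (about 17% vs. 38%) are taken from the tables below; the object name experiment_sim is arbitrary and the simulated rows will of course not reproduce the original file exactly:

set.seed(123)  # for reproducibility
n <- 294       # participants per group, as in the original data
experiment_sim <- data.frame(
  condition = rep(c("control", "test"), each = n),
  clicked_adopt_today = c(rbinom(n, 1, 0.17),  # assumed control click rate
                          rbinom(n, 1, 0.38))  # assumed test click rate
)
head(experiment_sim, 10)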

Let us create two tables, one with the absolute counts and one with the relative proportions:

prop <- table(experiment)
prop_abs <- addmargins(prop) # add row and column sums
prop_abs
##          clicked_adopt_today
## condition   0   1 Sum
##   control 245  49 294
##   test    181 113 294
##   Sum     426 162 588

prop_rel <- prop.table(prop, 1) # proportions within each row (condition)
prop_rel <- round(addmargins(prop_rel, 2), 2)
prop_rel
##          clicked_adopt_today
## condition    0    1  Sum
##   control 0.83 0.17 1.00
##   test    0.62 0.38 1.00
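
If you want a quick visual impression of the difference, you can plot the two conversion rates directly from the relative table with base R (just a minimal sketch; column "1" of prop_rel contains the click rates):

barplot(prop_rel[ , "1"], ylim = c(0, 1),
        ylab = "conversion rate",
        main = "Conversion rate by condition")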

Now for the actual test: conveniently enough, R has the prop.test function, which tests whether two proportions are significantly different (by performing a so-called Pearson’s chi-squared test under the hood). We only have to put our original table into the function and R does the rest for us:

prop.test(prop)
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  prop
## X-squared = 33.817, df = 1, p-value = 6.055e-09
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1442390 0.2911352
## sample estimates:
##    prop 1    prop 2 
## 0.8333333 0.6156463
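
Because prop.test returns a standard htest object, you can also pull the relevant numbers out programmatically, and you can verify the “chi-squared test under the hood” by running chisq.test on the same table (a small sketch; the object name res is arbitrary):

res <- prop.test(prop)
res$p.value   # the p-value shown above
res$conf.int  # 95% confidence interval for the difference in proportions
res$estimate  # the two sample proportions (of non-clicks, since column "0" comes first)

chisq.test(prop)  # same test statistic and p-value (with continuity correction)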

Voilà, that was it already! Because the p-value is way below the common threshold of 0.05, the difference is highly significant, so we can reject the null hypothesis (that the observed difference is just due to chance)!

As a consequence, we would definitely go with the design that was presented to the test group from now on.
