Lying with Statistics: One Beer a Day will Kill you!

[This article was first published on R-Bloggers – Learning Machines, and kindly contributed to R-bloggers.]

About two years ago the renowned medical journal “The Lancet” came out with the rather sensational conclusion that there is no safe level of alcohol consumption, so every little hurts! For example, drinking a bottle of beer per day (half a litre) would increase your risk of developing a serious health problem within one year by a whopping 7%! When I read that I had to calm my nerves by having a drink!

Ok, kidding aside: in this post, you will learn how to lie with statistics by deviously mixing up relative and absolute changes in risks, so read on!

The meta-study “Risk thresholds for alcohol consumption” adheres to the highest scientific standards; that is not the problem. The problem is how the authors chose to communicate the changes in risk associated with consuming alcohol.

For example, they tell you that by drinking a bottle of beer a day (half a litre) your risk of developing a serious health problem (like cardiovascular disease, cancer, cirrhosis of the liver, inflammation of the pancreas or diabetes) within one year would increase by 7%, i.e. per 100,000 people, 63 additional people on top of the 914 who would get a serious health problem anyway:

63 / 914 * 100 # shock horror: nearly 7% more with health problems when drinking half a litre of beer per day!
## [1] 6.892779

So, what does that mean? That about one in fourteen beer drinkers is going to bite the dust (no pun intended) next year? Fortunately not!

The problem is that this is a relative change in risk! It does not really help to assess the real danger. Only absolute changes in risk can do that!

To illustrate, we use the personograph package (on CRAN) to show you what is really going on. Out of 2000 people, about 18 would develop a serious health issue within one year anyway…

library(personograph)
## Loading required package: grImport
## Loading required package: grid
## Loading required package: XML

n <- 2000
probl_wo_alc <- 18 / n

data <- list(first = probl_wo_alc, second = 1-probl_wo_alc)
personograph(data,  colors = list(first = "black", second = "#efefef"),
             fig.title = "18 of 2000 people with health problems",
             draw.legend = FALSE, n.icons = n, dimensions = c(20, 100), 
             plot.width = 0.97)
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x,
## x$y, : font family not found in Windows font database

…by consuming about 20 grams of alcohol per day (i.e. about 25 mL of pure alcohol, roughly the amount in half a litre of beer) a little more than one additional person would become seriously sick on top of that:

probl_w_alc <- 1 / n # share of additional cases due to alcohol: about 1 person in 2000

data_2 <- list(first = probl_wo_alc, second = probl_w_alc, third = 1-(probl_wo_alc+probl_w_alc))
personograph(data_2, colors = list(first = "black", second = "red", third = "#efefef"),
             fig.title = "About 1 additional case with half a litre of beer per day",
             draw.legend = FALSE, n.icons = n, dimensions = c(20, 100),
             plot.width = 0.97)

As you can see, this doesn’t look spectacular at all! Yet, this would have been a good way to communicate the results so that everybody could get a feeling for what they really mean (but as I said, this doesn’t look spectacular at all, go figure!).
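The icon counts in the two plots can be checked by scaling the per-100,000 figures down to 2000 people. The following is only a small sketch of that arithmetic; the variable names are chosen here for illustration and are not taken from the study or from the plotting code above:

```r
# Figures quoted in this post, per 100,000 people and year
baseline_per_100k <- 914  # cases that would occur anyway
extra_per_100k    <- 63   # additional cases with half a litre of beer per day

n_icons <- 2000           # size of the group shown in the plots

# Scale both counts down to a group of n_icons people
baseline_in_group <- baseline_per_100k / 100000 * n_icons
extra_in_group    <- extra_per_100k / 100000 * n_icons

round(baseline_in_group, 2)
## [1] 18.28
round(extra_in_group, 2)
## [1] 1.26
```

So the plots round 18.28 down to 18 black icons and 1.26 down to 1 red icon.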

Doing the numbers gives an absolute increase in risk of only 0.063 percentage points! It is about 50% more probable to die in a house fire (and how many people do you personally know who actually died in a house fire? I don’t know anybody…)!

63 / 100000 * 100 # only 0.063% in absolute numbers!
## [1] 0.063
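To see both views side by side, here is a minimal sketch that computes the relative and the absolute change from the same two counts. The function `risk_change()` and its argument names are hypothetical helpers made up for this illustration; they are not part of the study or of any package:

```r
# Hypothetical helper (not from the study or any package):
# compares relative and absolute risk changes for counts per `population` people.
risk_change <- function(baseline_cases, extra_cases, population = 100000) {
  baseline_risk <- baseline_cases / population
  exposed_risk  <- (baseline_cases + extra_cases) / population
  c(
    # increase relative to the baseline risk, in percent
    relative_increase_pct = (exposed_risk - baseline_risk) / baseline_risk * 100,
    # increase in the risk itself, in percentage points
    absolute_increase_pp  = (exposed_risk - baseline_risk) * 100
  )
}

res <- risk_change(baseline_cases = 914, extra_cases = 63)
round(res[["relative_increase_pct"]], 3)
## [1] 6.893
round(res[["absolute_increase_pp"]], 3)
## [1] 0.063

# Number of daily beer drinkers per one additional case within a year
round(100000 / 63)
## [1] 1587
```

Both numbers describe the same 63 additional cases; the only difference is whether they are divided by the 914 baseline cases or by the full 100,000 people.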

Please note: I am not saying that it is safe to drink alcohol! But you have to put the numbers in perspective, and the risk doesn’t seem to be overly high (to put it mildly) when you drink responsibly!

You see that you can use statistics not only to “lie” but also to clarify things and communicate facts transparently. So, the problem lies not so much in statistics but in dishonesty and manipulation per se, which is the idea of one of my favorite cartoons (found on CrossValidated).
