Using science to find the best decaf

Posted on March 22, 2026 by Giles Dickenson-Jones in R bloggers

[This article was first published on Data Analytics and AI Archives - Giles, and kindly contributed to R-bloggers.]


TLDR: To test whether I could tell the difference between decaf coffees I conducted a highly scientific test (subject to funding constraints).

One of my goals for 2025 was reducing my caffeine intake after having one too many sleepless nights. The problem was that all the decaffeinated coffee I’d tried was terrible.

Or was it?

After all, I once called myself an audiophile until a series of A/B tests suggested I couldn’t tell the difference between tracks encoded at different bitrates. So, it was entirely possible that I’d been brainwashed by big coffee to believe decaf coffee was inferior.

But, how exactly could we test this out?

The most obvious solution was to run a cross-country randomized double-blind experiment. This way, I wouldn’t automatically base my resentment on the caffeinated status of the coffee and could instead focus on my subjective rating of each coffee’s quality.

Which is pretty much what I did:

Step 1: Sample selection

The first step was to select a wide enough sample of coffee beans to make the study as sciency as possible. Roping in my wife to help out, I purchased as many decaf varieties as we could get our hands on.

Step 2: Sample blinding

After selecting a large representative sample of coffees (n=6), I packed a sample of each in its own container (pictured). To obscure each coffee’s identity I assigned them a number from 1 to 6. To further enhance the science I then had my wife assign new numbers so neither of us knew the origin of each sample.
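The two-stage relabelling described above could be sketched in R as follows. This is only an illustration of the idea, not how the containers were actually numbered: the seed, the data frame names (key_1, key_2, unblinded) and the column names coffee_id and label_round_1 are assumptions made up for the example.

```r
# Hypothetical sketch of two-stage blinding; all names here are invented for illustration.
set.seed(2025)  # assumption: fixed seed only so the sketch is reproducible
n_samples <- 6

# Stage 1: the first person gives each coffee a random label from 1 to 6
# and keeps this key to themselves.
key_1 <- data.frame(coffee_id = seq_len(n_samples),
                    label_round_1 = sample(n_samples))

# Stage 2: the second person, who sees only the round-1 labels,
# assigns a fresh random label to each container.
key_2 <- data.frame(label_round_1 = seq_len(n_samples),
                    label_round_2 = sample(n_samples))

# After tasting, the two keys are merged to recover which coffee was which.
unblinded <- merge(key_1, key_2, by = "label_round_1")
unblinded <- unblinded[order(unblinded$coffee_id), ]
unblinded
```

Neither key on its own links a coffee to its final label, which is the point of having two people do the numbering.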

Step 3: Testing

Before starting the test I cleaned and descaled the coffee machine. Beans from each container were freshly ground at room temperature and used to make six separate espressos. Shots were drawn on a quasi-random basis according to whatever my wife handed to me. We then took a sip of each coffee and ranked our preferences from one to six.

Step 4: Results

Although I’d have liked to pre-register my research, none of the top econometric journals I contacted expressed interest. However, my running assumption was that our preferences for a coffee were mainly psychological and had little to do with its caffeine content.

If this were true, I’d expect to see no relationship between our rankings. But, to my surprise, this didn’t appear to be the case. Instead, we both ranked the beans in a similar order:

Code Snippet:

#load libraries and import data
library(tidyverse)
dta_coffee_science<-read_csv("./Data/250216 blind coffee ratings.csv")

# Plot Person B's ranking against Person A's, labelling each point with its blind label
# Both axes are reversed so that better (lower) rankings sit at the top and right
plt_rankings_by_coffee_no <- ggplot(data = dta_coffee_science,
                                  aes(y = ranking_person_b, x = ranking_person_a)) +
 geom_text(aes(label = blind_label_round_2), size = 3.5) +
 # axis names match the aesthetics: Person B is on y, Person A is on x
 scale_y_reverse(name = "Person B Ranking (1 = Best)") +
 scale_x_reverse(name = "Person A Ranking (1 = Best)") +
 labs(title = "Coffee Sample Rankings: Person A vs Person B",
      subtitle = "Double-blind taste test results of brewed coffee samples",
      caption = "Note: Lower numbers indicate higher preference") +
 theme_classic()


plt_rankings_by_coffee_no

Of course, we’re doing some real science here, so to check let’s apply Kendall’s tau and Spearman’s rank correlation tests, each against the null hypothesis that there is no association between our rankings.
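For intuition about what these two statistics measure, here is a minimal sketch that computes both by hand and compares them with base R's cor(). The rankings below are hypothetical values made up for the example, not the taste-test data, and the hand formulas assume there are no tied ranks.

```r
# Hypothetical rankings of six coffees by two tasters (NOT the post's data)
rank_a <- c(1, 2, 3, 4, 5, 6)
rank_b <- c(2, 1, 3, 5, 4, 6)
n <- length(rank_a)

# Spearman's rho without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the difference between the two ranks given to each coffee
rho_manual <- 1 - 6 * sum((rank_a - rank_b)^2) / (n * (n^2 - 1))

# Kendall's tau without ties: (concordant pairs - discordant pairs) / total pairs.
# A pair of coffees is concordant when both tasters order them the same way.
pairs <- combn(n, 2)
pair_sign <- sign(rank_a[pairs[1, ]] - rank_a[pairs[2, ]]) *
  sign(rank_b[pairs[1, ]] - rank_b[pairs[2, ]])
tau_manual <- sum(pair_sign) / ncol(pairs)

# Compare against the built-in implementations
c(rho_manual = rho_manual, rho_builtin = cor(rank_a, rank_b, method = "spearman"))
c(tau_manual = tau_manual, tau_builtin = cor(rank_a, rank_b, method = "kendall"))
```

Both statistics run from -1 (exactly opposite orderings) to 1 (identical orderings), with values near 0 indicating no association.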

Code Snippet:

# Kendall's tau
cor.test(~ ranking_person_a + ranking_person_b,
         data = dta_coffee_science, method = "kendall")

# Spearman's rank correlation
cor.test(~ ranking_person_a + ranking_person_b,
         data = dta_coffee_science, method = "spearman")

With p-values from six to eight percent, this isn’t a ringing endorsement of the results, but having already written the blog I’m happy to adjust my definition of significant to conclude our preferences were similar to one another.
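Part of the reason p-values behave this way is that so few samples leave only a limited number of possible orderings (720 for six items), so a rank correlation can only land on a handful of values. A rough sketch of the idea, using hypothetical rankings rather than the taste-test data and a helper function (all_orderings) invented for this example:

```r
# Hypothetical rankings of six coffees by two tasters (NOT the post's data)
rank_a <- 1:6
rank_b <- c(2, 1, 3, 5, 4, 6)

# Illustrative helper: recursively build every ordering of a vector, one per row
all_orderings <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_orderings(v[-i]))
  }))
}

orderings <- all_orderings(1:6)
nrow(orderings)  # one row per possible ranking of six coffees

# Spearman's rho for the hypothetical data and for every possible ordering
rho_obs <- cor(rank_a, rank_b, method = "spearman")
rho_null <- apply(orderings, 1, function(r) cor(rank_a, r, method = "spearman"))

# Two-sided permutation p-value: the share of orderings whose rho is at least
# as far from zero as the observed one (small tolerance for floating point)
p_perm <- mean(abs(rho_null) >= abs(rho_obs) - 1e-9)
p_perm

# Number of distinct rho values that six untied ranks can produce
length(unique(round(rho_null, 10)))
```

This enumerates the null distribution directly instead of relying on cor.test(), so its result need not match that function's reported p-value in every case.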

Of course, my willingness to play fast and loose with the stats also stems from knowing a key result: we both ranked the store-bought caffeinated beans highest.

Code Snippet:

# Show linear association between samples by assigned label and caffeination status
plt_rankings_by_caffeine <- ggplot(data = dta_coffee_science,
                     aes(y = ranking_person_b, x = ranking_person_a, col = decaf)) +
  geom_text(aes(label = blind_label_round_2), size = 3.5) +
  scale_y_reverse(limits = c(7.5, 0.5)) +
  scale_x_reverse(limits = c(7.5, 0.5)) +
  coord_cartesian(clip = "off") +
  labs(title = "Coffee Sample Rankings: Person A vs Person B",
       subtitle = "Double-blind taste test results of brewed coffee samples",
       caption = "Note: Lower numbers indicate higher preference",
       x = "Person A Ranking (1 = Best)",
       y = "Person B Ranking (1 = Best)") +
  theme_classic() +
  scale_color_manual(values = c("black", "blue"), name = "Decaf:")

plt_rankings_by_caffeine

I also found it surprising that the beans from a specialized provider of decaf weren’t necessarily ranked higher, with only one of their beans ranked in the top three:

Code Snippet:

# Show linear association between samples by assigned label and caffeination status with original labels
plt_rankings_by_caffeine_named <- ggplot(data = dta_coffee_science,
                     aes(y = ranking_person_b, x = ranking_person_a, col = decaf)) +
  geom_text(aes(label = str_wrap(paste0(coffee_brand, ": ", coffee_name), width = 15)), 
            size = 3.5, lineheight = 0.8) +
  scale_y_reverse(limits = c(7.5, 0.5)) +   
  scale_x_reverse(limits = c(7.5, 0.5)) +   
  coord_cartesian(clip = "off") +            # stop clipping text at panel border
  labs(title = "Coffee Sample Rankings: Person A vs Person B",
       subtitle = "Double-blind taste test results of brewed coffee samples",
       caption = "Note: Lower numbers indicate higher preference",
       x = "Person A Ranking (1 = Best)", 
       y = "Person B Ranking (1 = Best)") +
  theme_classic() +
  theme(plot.margin = margin(10, 60, 10, 60)) +  
  scale_color_manual(values = c("black", "blue"), name = "Decaffeinated")

plt_rankings_by_caffeine_named

And while these coffee nerds might disagree, the results suggest we can tell the difference between coffees and both prefer the caffeinated alternative.

When I recounted the result to a food chemist they told me this probably has something to do with decaf coffee lacking the bitterness of caffeine.

When I recounted the results to my wife, she told me to never waste her time like this again. I probably will.

In the spirit of open science, you can download the dataset here.

The post Using science to find the best decaf appeared first on Giles.
