The Higgs boson: 5-sigma and the concept of p-values

July 4, 2012

(This article was first published on R Psychologist » R, and kindly contributed to R-bloggers)

Today’s announcement at CERN of the latest research on the Higgs boson was truly extraordinary. Not only was the scientific achievement remarkable, but the media’s reporting of 5-sigma as a measure of “certainty” was also truly remarkable. For instance, the science editor at the Swedish newspaper Dagens Nyheter reported that a sigma of 4.9 equals a certainty of 99.99994 %, which obviously isn’t true, simply because p( D | H0 ) is not the same as p( H0 | D ). In plain English, a p-value is the conditional probability of getting data at least as extreme as those observed, given that the null hypothesis is true. Nothing more. It certainly doesn’t give the probability that the alternative hypothesis is true, i.e. the “certainty” that something has been found that’s not a random fluctuation.
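To make the difference between p( D | H0 ) and p( H0 | D ) concrete, here is a minimal sketch using Bayes’ theorem. Every number in it is a made-up assumption for illustration (the priors, and the probability of the data under the alternative), and treating a tail probability as p( D | H0 ) is itself a simplification. The helper name posterior_h0 is hypothetical. None of this comes from CERN’s analysis.

```r
# Hypothetical inputs, chosen only for illustration (not CERN's numbers)
p_d_h0 <- 3e-7   # p(D | H0): roughly the one-tailed 5-sigma tail area
p_d_h1 <- 0.5    # assumed p(D | H1): chance of such data if the signal is real

# Bayes' theorem for the posterior probability of the null hypothesis
posterior_h0 <- function(prior_h0, p_d_h0, p_d_h1) {
  (p_d_h0 * prior_h0) / (p_d_h0 * prior_h0 + p_d_h1 * (1 - prior_h0))
}

# Same p(D | H0), three different assumed priors for H0
priors <- c(0.5, 0.99, 0.999999)
bayes_table <- data.frame(prior_h0     = priors,
                          p_h0_given_d = posterior_h0(priors, p_d_h0, p_d_h1))
bayes_table
```

With p( D | H0 ) held fixed, p( H0 | D ) changes by orders of magnitude depending on the assumed prior, which is why a p-value alone cannot be read as a “certainty”.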

So what do physicists mean when they report 5-sigma? Well, it’s just another convention for reporting alpha levels. Sigma refers to the population standard deviation, and 5-sigma means that they accept events as statistically significant if they fall more than 5 standard deviations away from the mean, given that the null hypothesis is true. And here the null hypothesis is that the event is simply due to random noise or fluctuations. You can get the p-value for 5-sigma by first taking the area under the normal curve that’s to the left of +5 sigma.

> pnorm(5)
[1] 0.9999997

Then take 1 - 0.9999997 to get the p-value, which is 0.0000003, since the CERN researchers performed a one-tailed test. I imagine physicists say 5-sigma because saying “point zero zero zero zero zero zero three” might become quite tiresome, so it’s quite ironic that journalists all over the world seem to be converting sigma back to percent.
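The subtraction can also be skipped by asking pnorm() for the upper tail directly, and qnorm() converts a p-value back into a number of sigmas. The following is a small sketch of that round trip. The variable names are just illustrative, and the table of 1 to 5 sigma is added only for comparison.

```r
# One-tailed p-value for a z-score of 5, taken directly from the upper tail
p_five_sigma <- pnorm(5, lower.tail = FALSE)
p_five_sigma

# Going the other way: the z-score whose upper-tail area equals that p-value
sigma_back <- qnorm(p_five_sigma, lower.tail = FALSE)
sigma_back

# Upper-tail (one-tailed) p-values for 1 to 5 sigma
sigma_table <- data.frame(sigma        = 1:5,
                          p_one_tailed = pnorm(1:5, lower.tail = FALSE))
sigma_table
```

Using lower.tail = FALSE matters more for thresholds far beyond 5 sigma: 1 - pnorm(10) evaluates to exactly 0 in double precision, whereas pnorm(10, lower.tail = FALSE) still returns a small positive number.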

If we want, we can also use R and ggplot2 to illustrate 5-sigma by plotting the normal distribution and superimposing a line at 5 sigma:

library(ggplot2)

x <- seq(-6, 6, length.out = 200)  # sigmas
y <- dnorm(x)                      # standard normal density (the curve)

df <- data.frame(sigma = x, y = y) # create data frame

# explanatory text placed inside the plot
text_block <- "A confidence level = 5-sigma represents \nthe probability of getting a result from your \nexperiment, simply from random fluctuations \nalone, equal to the area under the curve \nthat’s to the right of the dotted line. That’s an \nexceptionally rare event. However, the area to \nthe left of 5-sigma does not represent the \nprobability or certainty that the Higgs boson \nhas been found."

# plot the curve, the text, and a dashed marker at 5 sigma
ggplot(df, aes(sigma, y)) +
  geom_line(linewidth = 1) +  # ggplot2 >= 3.4.0; use size = 1 on older versions
  annotate("text", x = 1.7, y = 0.2, label = text_block, size = 4, hjust = 0) +
  annotate("segment", x = 5, xend = 5, y = 0, yend = 0.05, linetype = "dashed") +
  annotate("text", x = 5, y = 0.05, label = "5-sigma", vjust = -0.5)

Note: The actual plot below has been fine-tuned in Illustrator.
[Figure: Higgs boson, CERN, 5-sigma and p-values]

The area under the curve that’s to the right of the dotted line represents the p-value for 5-sigma. We see that observations in that area are highly unlikely to occur if we assume that the null hypothesis is true.
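As a rough check on the idea that the tail area is the share of pure-noise results landing beyond the threshold, one can simulate draws under the null hypothesis. The sketch below uses 3 sigma rather than 5, purely as a practical assumption: at 5 sigma only about 3 in 10 million draws would be expected to exceed the threshold, so a simulation of this size would be too noisy. The seed and sample size are arbitrary choices.

```r
set.seed(2012)                    # arbitrary seed, for reproducibility
z <- rnorm(1e6)                   # one million draws under the null (pure noise)
threshold <- 3                    # 3 sigma here; 5 sigma would need far more draws

simulated <- mean(z > threshold)  # fraction of null draws beyond the threshold
exact     <- pnorm(threshold, lower.tail = FALSE)  # corresponding tail area

c(simulated = simulated, exact = exact)
```

The two numbers should agree closely, and neither says anything about the probability that the null hypothesis itself is true.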
