
How Not To Draw a Probability Distribution

[This article was first published on Isomorphismes, and kindly contributed to R-bloggers].

If I google for “probability distribution” I find the following extremely bad picture:

It’s bad because it conflates ideas and understates how varied probability distributions can be.

 

Here is a better picture to use in exposition. In R I defined

bimodal <- function(x) { 3 * dnorm(x, mean=0, sd=1) + dnorm(x, mean=3, sd=.3) / 4 }

That’s what you see here, plotted with plot( bimodal, -3, 5, lwd=3, col="#333333", yaxt="n" ).

Here’s how I calculated the mean, median, and mode:
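A minimal sketch of one way to get these numbers, not necessarily the calculation used for the plot. It assumes the curve is first normalized to integrate to 1 (as written, bimodal integrates to 3.25) and that the range [-10, 10] captures essentially all of the mass. The names total, cdf, dist_mean, dist_median and dist_mode are introduced here for illustration.

```r
# Curve from the text: a density only up to a constant factor.
bimodal <- function(x) 3 * dnorm(x, mean = 0, sd = 1) + dnorm(x, mean = 3, sd = 0.3) / 4

# Total area under the curve, used to normalize (tails beyond +/-10 are negligible).
total <- integrate(bimodal, -10, 10)$value

# Mean: first moment of the curve divided by its total area.
dist_mean <- integrate(function(x) x * bimodal(x), -10, 10)$value / total

# Median: the point where the normalized cumulative area reaches one half.
cdf <- function(q) integrate(bimodal, -10, q)$value / total
dist_median <- uniroot(function(q) cdf(q) - 0.5, interval = c(-5, 5), tol = 1e-8)$root

# Mode: location of the highest point, found on a fine grid.
# A grid search is used because a local optimizer may settle on the smaller hump.
xs <- seq(-5, 5, by = 0.001)
dist_mode <- xs[which.max(bimodal(xs))]

print(c(mean = dist_mean, median = dist_median, mode = dist_mode))
```

For the curve as written, this gives a mean of about 0.23 (that is, 0.75 / 3.25), a median of about 0.10, and a mode at 0.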

Notice that I drew the numbers as vertical lines rather than points on the curve. And I eliminated the vertical axis labels. That’s because the mean, median, and mode are all x values and have nothing whatever to do with the vertical value. If I could have figured out how to draw a coloured dot at the bottom, I would have. You could also argue that I should have shown more humps or made the mean and median diverge even more.
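One possible way to draw coloured dots at the bottom, sketched in base graphics. The idea is to read the lower edge of the plot region from par("usr") after plotting and place the points there, with xpd = TRUE so they are not clipped at the border. The positions in marks are illustrative values for the normalized curve; substitute your own.

```r
bimodal <- function(x) 3 * dnorm(x, mean = 0, sd = 1) + dnorm(x, mean = 3, sd = 0.3) / 4

# Illustrative x positions (assumed values, rounded) and their colours.
marks <- c(mean = 0.23, median = 0.10, mode = 0)
cols  <- c(mean = "red", median = "purple", mode = "turquoise")

plot(bimodal, -5, 5, lwd = 3, col = "#333333", yaxt = "n", main = "Some distribution")

# par("usr") holds c(x_min, x_max, y_min, y_max) of the plot region;
# element 3 is the y coordinate of the bottom edge, where the x axis sits.
bottom <- par("usr")[3]

# Solid dots on the x axis; xpd = TRUE allows drawing on the plot border.
points(x = marks, y = rep(bottom, length(marks)),
       pch = 19, cex = 1.2, col = cols, xpd = TRUE)
legend("topright", legend = names(marks), col = cols, pch = 19, bty = "n")
```

Because the three values sit close together on this x range, the dots partly overlap; a narrower x range or a smaller cex would separate them.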

Here’s how I drew the above:

png("some bimodal dist.png")
leg.text <- c("mean", "median", "mode")
leg.col <- c("red", "purple", "turquoise")
par(lwd=3, col="#333333")
plot( bimodal, -5, 5, main = "Some distribution", yaxt="n" )
# Positions refer to the normalized curve: bimodal integrates to 3.25, not 1.
abline(v = 0, col = "turquoise")   # mode: peak of the tall hump
abline(v = .10, col = "purple")    # median: half the area lies to the left, about 0.10
abline(v = .23, col = "red")       # mean: 0.75 / 3.25, about 0.23
legend(x = "topright", legend = leg.text, fill = leg.col, border="white", bty="n", cex = 2, text.col = "#666666")
dev.off()

Lastly, it’s not that hard in the computer era to get an actual distribution drawn from real data. The nlme package includes recorded heights of boys from Oxford:

require(nlme); data(Oxboys); plot( density( Oxboys$height), main = "height of boys from Oxford", yaxt="n", lwd=3, col="#333333")

and boom:

or in histogram form with ggplot2, run library(ggplot2); ggplot(Oxboys, aes(x = height)) + geom_histogram() and get:

The heights look Gaussian-ish, without giving students the mistaken impression that real-world data follow perfect bell-shaped patterns.
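To see how Gaussian-ish the heights are, one option is to overlay a normal curve with the same mean and standard deviation on the density estimate. This is a rough visual check under that assumption, not a formal test; the variable h is introduced here for illustration.

```r
library(nlme)   # provides the Oxboys data set
data(Oxboys)

h <- Oxboys$height

# Kernel density estimate of the recorded heights, styled as in the text.
plot(density(h), main = "height of boys from Oxford",
     yaxt = "n", lwd = 3, col = "#333333")

# Dashed normal curve using the sample mean and standard deviation.
curve(dnorm(x, mean = mean(h), sd = sd(h)),
      add = TRUE, lty = 2, lwd = 2, col = "red")
```

Where the solid and dashed curves part ways is where the data depart from a perfect bell shape.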
