In this post I showcase a nice bar-plot and a balloon-plot listing recommended nutritional supplements, according to how much evidence exists for their benefits. Scroll down to see them (and click here for the data behind them).
* * * *
The gorgeous blog “Information Is Beautiful” recently published an eye-candy post showing a “balloon race” image (see a static version of the image here) illustrating how much evidence exists for the benefits of various nutritional supplements (such as green tea, vitamins, herbs, pills and so on). The higher the bubble's score on the Y axis (e.g. the bubble size) for a supplement, the greater the evidence for its effectiveness (but only for the conditions listed alongside the supplement).
There are two reasons this should be of interest to us:
- This shows a fun plot that R currently doesn’t know how to do (at least I wasn’t able to find an implementation for it). So if anyone thinks of an easy way of making one, please let me know.
- The data for the graph is openly (and freely) provided to all of us on this Google Doc.
The advantage of having the data on a Google Doc is that we can see when the data is updated. But more than that, it means we can easily extract the data into R and have our way with it (thanks to David Smith’s post on the subject).
For example, I was wondering what ALL of the top recommended nutritional supplements are, an answer that is not trivial to get from the plot in the original post.
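As a minimal sketch of how such a list could be pulled out as a plain table, the snippet below filters and sorts a small made-up data frame. The column names (`supplement`, `score`, `condition`) and all values are placeholders I invented for illustration; the real spreadsheet is addressed by column position in the code further down.

```r
# Toy stand-in for the spreadsheet. Names and scores are made up for
# illustration only and say nothing about real supplements.
toy <- data.frame(
  supplement = c("supplement A", "supplement B", "supplement C",
                 "supplement D", "supplement E"),
  score      = c(6, 2, NA, 4, 5),
  condition  = c("condition 1", "condition 2", "condition 3",
                 "condition 4", "condition 5"),
  stringsAsFactors = FALSE
)

# Keep rows with a known score above 2 (the same cut-off the post uses).
keep <- !is.na(toy$score) & toy$score > 2
top  <- toy[keep, ]

# Sort from the strongest to the weakest evidence score.
top <- top[order(top$score, decreasing = TRUE), ]
print(top, row.names = FALSE)
```

Testing for `NA` and the threshold in one step avoids the rows of `NA`s that a bare `score > 2` subset would otherwise leave behind.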
In this post I will supply two plots that present the data: a barplot (which in retrospect didn’t prove to be good enough) and a balloon-plot for a table (which seems to me to be much better).
Here is the R code to produce the barplot of nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition):
# loading the data
supplements.data.0 <- read.csv("http://spreadsheets.google.com/pub?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&output=csv")
supplements.data <- supplements.data.0[supplements.data.0[, 2] > 2, ] # let's only look at "good" supplements
supplements.data <- supplements.data[!is.na(supplements.data[, 2]), ] # and we don't want any missing data

supplement.score <- supplements.data[, 2]
ss <- order(supplement.score, decreasing = FALSE) # sort our data
supplement.score <- supplement.score[ss]
supplement.name <- supplements.data[ss, 1]
supplement.benefits <- supplements.data[ss, 4]
supplement.score.col <- factor(as.character(supplement.score))
levels(supplement.score.col) <- c("red", "orange", "blue", "dark green")
supplement.score.col <- as.character(supplement.score.col)

# mar: c(bottom, left, top, right) The default is c(5, 4, 4, 2) + 0.1.
par(mar = c(5, 9, 4, 13)) # taking care of the plot margins
bar.y <- barplot(supplement.score, names.arg = supplement.name, las = 1, horiz = TRUE,
                 col = supplement.score.col, xlim = c(0, 6.2),
                 main = c("Nutritional supplements efficacy score",
                          "(by evidence for its effectiveness on the listed condition)",
                          "(2010)"))
axis(4, labels = supplement.benefits, at = bar.y, las = 1) # Add right axis
abline(h = bar.y, col = supplement.score.col, lty = 2) # add some lines so to easily follow each bar
Also, the nice thing is that if the guys at Information Is Beautiful update their data, we can easily re-run the code and see the updated list of recommended supplements.
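One caveat when re-running on updated data: the barplot code assigns colours by overwriting factor levels, so it relies on the data containing exactly the set of scores the four colours were written for. A minimal sketch of a more forgiving alternative is a named lookup vector. The scores below are hypothetical, and the mapping of 3, 4, 5 and 6 to the four colours is my assumption about what the original levels were.

```r
# Hypothetical scores; the real ones come from the spreadsheet.
score <- c(3, 6, 4, 4, 5)

# Named lookup: each score always maps to the same colour, regardless of
# which scores happen to be present in the data.
# (Assumed mapping: 3 = red, 4 = orange, 5 = blue, 6 = dark green.)
score.colours <- c("3" = "red", "4" = "orange", "5" = "blue", "6" = "dark green")

# Index by the score converted to a character key; drop the names afterwards.
bar.col <- unname(score.colours[as.character(score)])
print(bar.col)
```

A score with no entry in the lookup comes back as `NA`, which is easier to spot than a silently shifted colour.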
So after some web surfing I came across an implementation of a balloon plot in R (thanks to the R graph gallery).
There were two problems with using the command out of the box. The first was that the colors were non-informative (easily fixed); the second was that the X labels were overlapping one another. Since there is no “las” parameter in the function, I just opened the function up, found where the labels were plotted and changed it manually (a bit messy, but that’s what you have to do sometimes…).
Here is the result (you can click the image for a larger version):
And here is the R code to produce the balloon plot of nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).
(It’s just a copy of the function with a tiny bit of editing in line 146, and then using it.)
require(colorspace)
require(gplots)

# I was able to find the function by using
# methods(balloonplot)
# This command: getAnywhere("balloonplot.default")
# Wouldn't work...
balloonplot2 <- gplots:::balloonplot.default # This one works 🙂
# now run:
fix(balloonplot2)
# search for
# y <- ny + 0.75 + (nlabels.x - i + 0.5) * colmar
# And add beneath it the following line:
# y <- rep(y, dim(xlabs)) - c(0,.5,1)

supplement.benefits <- tolower(supplement.benefits)
supplement.name <- tolower(supplement.name)

balloonplot2(supplement.name, supplement.benefits, supplement.score,
             xlab = "supplement", ylab = "Benefit",
             show.margins = FALSE, dotsize = 15,
             fun = function(x) max(x, na.rm = TRUE),
             rowmar = 7, colmar = 7,
             dotcolor = rev(heat_hcl(max(supplement.score)))[supplement.score - 1],
             main = c("Balloon plot of", "Nutritional supplements efficacy score",
                      "(by evidence for its effectiveness on the listed condition)",
                      "(2010)"),
             sub = c("Published on www.r-statistics.com"))
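The line added inside the function subtracts the offsets 0, 0.5 and 1 from the label height, which, as far as I can tell, is what moves neighbouring X labels to different heights so they stop overlapping. The sketch below only illustrates that recycling idea on its own; `baseline.y` and `n.labels` are hypothetical names, not variables from gplots.

```r
n.labels   <- 7    # hypothetical number of x labels
baseline.y <- 10   # hypothetical height at which labels would normally be drawn

# Cycle the offsets 0, 0.5, 1 along the labels so that neighbouring
# labels end up at different heights.
offsets     <- rep_len(c(0, 0.5, 1), n.labels)
staggered.y <- baseline.y - offsets
print(staggered.y)
```

Using `rep_len()` makes the recycling explicit and avoids the warning R gives when the number of labels is not a multiple of three.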
Got any good ideas of how else to plot the data? Let me know in the comments.