**free range statistics - R**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Possibly you have seen this graphic circulating on social media. At first it doesn’t look too remarkable, but then you notice the vertical axis. Oh. The gridlines are equally spaced on the page, but sometimes the same space represents 30 people, sometimes 10, and sometimes 50. It isn’t even strictly increasing – which might be expected if someone was trying to crudely copy by hand the effect of a logarithmic scale – but seems to be completely arbitrary.

It’s so bad it’s funny. This is clearly incompetence not malevolence. But it’s a serious degree of incompetence.

I actually find this image sad, more so the more I look at it. Like many people I first found it amusing, then I just thought “how did they even *do* that?” I really don’t know, but most likely seems to be that in effect it was made (or heavily edited) by hand in graphic design software. Then I just felt sorry for whomever created it – out of their depth, under time pressure, or whatever their problems are.

Of course, a flip version of that question, “how would you do this if you *wanted* to?”, is an interesting one. I took it on and decided it’s worth writing up as an illustration of the elegant power of the `scales`

package (by Hadley Wickham and Dana Seidel), one of the important complements of `ggplot2`

.

First, let’s draw a conventional version of the chart with the usual approach to axis scales:

This was created with the code below, most of which is about theming the chart to resemble the original look and feel. We’ll be able to use this plot, stored as `p`

, later.

In order to transform the vertical axis to the same, twisting and turning transformation implicitly used in Fox News’ version, we’ll create a new “transformation” object with the `scales`

function `trans_new`

. Each transformation in the `scales`

paradigm goes two ways:

- convert the the original value of y to a transformed version representing the position on the page (in this case, we need to convert any value of y from 30 to 90 to
`y / 30`

so it will align with positions 1 to 3, between 90 and 100 to a number between 3 and 4 so`3 + (y - 90) / 10`

, and so on). We’ll use a bunch of simple linear transformations to do this. - convert the transformed values back to the originals. This is necessary for the labelling of the axis – this is why the label for 10,000 in
`scale_y_log10()`

is “10,000” or “1e4”, not “4”. So we need to tell our plotting functions that “3” on the transformed scale means 90 on the original, “5” means 130, and so on.

Here’s the function `crazyfox_trans()`

which stores both the original transform and its inverse:

… and here’s the result when we add this scale to our original plot, using the `trans =`

argument of `scale_y_continuous`

.

Nice! With the axis breaks set at 100, 200 and 300 we can see how the unusual transformation used by Fox has compressed the scale between 300 and 400 more than other levels. It’s actually possible they *were* trying to mimic a log transformed scale and went wrong in back-transforming the labels. If so, it’s even sadder – like seeing a chimpanzee banging a drum and thinking that makes it part of the brass band.

Now that we’ve transformed the scale and plotted the lines and points appropriately, there’s one detail to add to bring this plot to Fox-level perfection. This is to set the gridlines at locations that are equally distant vertically from eachother.

There we go. This noteworthy chart, re-created reproducibly with the correct transformations needed to make that scale appropriate. The benefit of this approach is that I now have the `crazyfox_trans()`

function available for future use, if I need this particular weird stop-start stepwise transformation for future data.

The more serious point here is the elegance of the transform / inverse structure of `scales`

’ transformations. It’s something we take for granted (for example, in getting those labels right for a logarithmic scale) and will often seem basic, but as seen here gives enormous flexibility for creating new transformations easily and efficiently.

**leave a comment**for the author, please follow the link and comment on their blog:

**free range statistics - R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.