Creating Colorblind-Friendly Figures

[This article was first published on Brian Connelly » R | Brian Connelly, and kindly contributed to R-bloggers.]

Color is often used to display an extra dimension in plots of scientific data. Unfortunately, not everyone decodes color in exactly the same way. This is especially true for those with color vision deficiency, which in its two most common forms affects up to 8 percent of males. As a result, it has been estimated that the probability that at least one reviewer in a group of three males has some form of color vision deficiency is approximately 22%. Hopefully, this number alone is compelling enough to always keep these viewers in mind when we create figures. The truth, however, is that your figures aren’t only seen by reviewers: they are seen by a much wider group that includes readers of your paper, members of the audience when you present your work, viewers of your lab’s website, and potentially many others. As your audience grows, your choices in color become more and more important for effectively communicating your work.
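The 22% figure can be checked with a short calculation. This is a sketch that assumes each of the three reviewers independently has an 8% chance of a color vision deficiency, which is a simplification:

```r
# Probability that at least one of n reviewers has a color vision deficiency,
# assuming the same independent prevalence p for each reviewer.
p <- 0.08  # assumed prevalence among males
n <- 3     # number of reviewers

p_at_least_one <- 1 - (1 - p)^n
round(p_at_least_one, 3)
```

Changing n shows how quickly this probability grows as the audience gets larger.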

Although there are many outstanding tools for creating beautiful plots, practically all of them have default color palettes that can present decoding challenges for individuals with color vision deficiencies. This is an introduction to creating plots and figures using color palettes that are more accessible. For the examples below, I use the excellent ggplot2 library for R. The same ideas and colors can easily be transferred to your particular tool of choice.

Using Color to Represent Categorical Data

When using color to encode categorical data, such as blood type, gender, or bacterial strain, it is important to choose a color palette that has as many easily differentiable colors as there are categories. The figure below shows one palette that can encode up to 8 values, and simulates how each of its colors is seen by someone with protanopia, deuteranopia, or tritanopia.
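For reference, the eight colors of this palette can be stored in a vector and previewed with base R graphics. The hex values are the ones used in the examples later in this article; the variable name cb_palette and the descriptive color names are labels chosen here for illustration.

```r
# The eight-color palette as a named character vector.
# Names are descriptive labels only (an assumption for readability).
cb_palette <- c(
    black          = "#000000",
    orange         = "#E69F00",
    sky_blue       = "#56B4E9",
    bluish_green   = "#009E73",
    yellow         = "#F0E442",
    blue           = "#0072B2",
    vermillion     = "#D55E00",
    reddish_purple = "#CC79A7"
)

# Draw one bar per color to preview the palette
barplot(rep(1, length(cb_palette)), col = cb_palette,
        names.arg = names(cb_palette), las = 2, axes = FALSE)
```

Because the vector is named, wrap it in unname() before passing it to scale_color_manual or scale_fill_manual, which otherwise try to match the names against category levels.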

With ggplot2, the color palette for categorical data can be set using scale_color_manual (for points, lines, and outlines) and scale_fill_manual (for boxes, bars, and ribbons). The values argument to either function takes a vector of colors, each of which can be given as a hex RGB triplet or by name. As an example, let’s take a look at the relationship between the weight and the corresponding price of diamonds in ggplot2’s included diamonds data set. We can use color to indicate the quality of the cut. Note that this data set is quite large, so this scatter plot might not be the most informative way to display these data.

library(ggplot2)  # provides ggplot() and the diamonds data set
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point()

Plot of weight, price, and cut using ggplot2’s default color palette

scale_color_manual sets the color of the first category (chosen alphabetically in R unless an ordering is specified) using the first color given, the second category with the second color, and so on. Using the colors from the colorblind-safe palette shown above:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Plot of diamond price as a function of weight using the colorblind-friendly palette
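The same vector of colors works with scale_fill_manual for filled geoms such as bars. A minimal sketch, in which the variable names cb_palette and p are chosen here for illustration:

```r
library(ggplot2)

# Colorblind-friendly palette used in the scatter plot above
cb_palette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Bar chart of the number of diamonds per cut, filled by cut
p <- ggplot(diamonds, aes(x=cut, fill=cut)) +
    geom_bar() +
    scale_fill_manual(values=cb_palette)
p
```

Only the first five colors are used here, since cut has five categories.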

Alternatively, if you don’t want to have to remember the ordering of your categories, or if you want to apply specific colors to each category, you can assign a color to each category by name:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("Fair"="#E69F00", "Good"="#56B4E9", "Premium"="#009E73", "Ideal"="#F0E442", "Very Good"="#0072B2"))

Plot of diamond price as a function of weight using the colorblind-friendly palette and assigning colors based on category.
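To see which order ggplot2 will use for the categories, inspect the factor levels. In the diamonds data set, cut is an ordered factor, so its colors are assigned in level order rather than alphabetically. The sketch below also builds the named color vector from the levels with setNames, which avoids misspelling a category; cut_levels and cut_colors are names chosen for illustration.

```r
library(ggplot2)

# Category order that scale_color_manual follows for unnamed values
cut_levels <- levels(diamonds$cut)
cut_levels

# Pair each level with a color, in level order
cut_colors <- setNames(
    c("#E69F00", "#56B4E9", "#0072B2", "#009E73", "#F0E442"),
    cut_levels
)

p <- ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=cut_colors)
```

This produces the same category-to-color assignment as the named vector written out by hand.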

Redundant Encodings

When describing a figure, it is common to refer to a specific color. Hopefully, you’re now convinced that not everyone sees color the same way, especially when using a standard red, green, blue color palette. It is also very common for figures to be printed in black and white, or for your printer to be low on magenta ink. To improve legibility when your figures aren’t reproduced exactly as created, consider using redundant encodings. As an example, we can use both shapes and colors to refer to categories:

ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Cut quality displayed using both color and point shape
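In the example above, ggplot2 chooses the point shapes automatically. If a caption needs to refer to a particular shape, the shapes can be fixed with scale_shape_manual. This is a sketch; the shape codes and the pairing of shapes with categories are arbitrary choices made for illustration.

```r
library(ggplot2)

# Explicit color and shape for each cut category
cut_colors <- c("Fair"="#E69F00", "Good"="#56B4E9", "Very Good"="#0072B2",
                "Premium"="#009E73", "Ideal"="#F0E442")
# 16 = filled circle, 17 = filled triangle, 15 = filled square,
# 3 = plus sign, 4 = cross
cut_shapes <- c("Fair"=16, "Good"=17, "Very Good"=15,
                "Premium"=3, "Ideal"=4)

p <- ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) +
    geom_point() +
    scale_color_manual(values=cut_colors) +
    scale_shape_manual(values=cut_shapes)
p
```

Because both scales are mapped to the same variable, ggplot2 merges them into a single legend.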

The use of redundant encoding can also aid in figure captions, where referring to a category as “the blue squares” is helpful both for those with color vision deficiencies and for those with printer troubles (all of us?). However, if the data can be represented with symbols as well as with colors, this raises the question that should always be asked: Are colors absolutely necessary?

Using ColorBrewer Palettes

No discussion of color palettes would be complete without mentioning Cynthia Brewer’s ColorBrewer, an excellent source for color palettes that includes both colorblind-safe and print-friendly palettes.

Many graphics packages allow you to easily make use of the ColorBrewer palettes. In ggplot2, this is done with the scale_color_brewer command.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_brewer(palette="Dark2")

Plot of diamond price as a function of weight using ColorBrewer’s Dark2 palette.
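Not every ColorBrewer palette is colorblind-safe. Assuming the RColorBrewer package is installed, its brewer.pal.info table flags the ones that are, and display.brewer.all can preview only those. The variable name cb_safe is chosen here for illustration.

```r
library(RColorBrewer)

# Keep only the palettes that RColorBrewer marks as colorblind-friendly
cb_safe <- brewer.pal.info[brewer.pal.info$colorblind, ]
rownames(cb_safe)

# Preview the colorblind-friendly palettes
display.brewer.all(colorblindFriendly=TRUE)
```

Any palette name listed there can be passed as the palette argument of scale_color_brewer or scale_fill_brewer.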

Using Color to Represent Continuous Values

When using color to represent continuous values, special care should be taken to ensure not only that colors chosen are differentiable, but also that viewers interpret changes in value of a given magnitude similarly throughout the spectrum. The rainbow color map, which is the default in many graphics packages, does not do this well. Color palettes that use variations, not only in hue, but also in saturation and lightness, can produce more linear changes in perception.


Greyscale and rainbow color palettes. While the greyscale palette shows smooth and even changes, the rainbow palette has areas of differing contrast, such as near the cyan and yellow regions. Image from Subtleties of Color Part 2.

Of course, color gradients can introduce additional problems for viewers with color vision deficiencies when certain areas of the spectrum are included. For these viewers, colors that vary uniformly in lightness, which is how the greyscale palette is made, are most accessible. Again, always ask yourself if the use of color conveys information that could be encoded in another way.
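One rough way to check whether a palette varies uniformly in lightness is to compute an approximate brightness for each color and see whether it changes in one direction only. The sketch below uses base R and a simple weighted sum of the RGB channels (luma). This is only an approximation of perceived lightness, and the helper names luma and is_monotonic are hypothetical.

```r
# Approximate brightness of each color as a weighted sum of R, G, and B
luma <- function(colors) {
    rgb <- col2rgb(colors) / 255          # 3 x n matrix of channel values
    as.numeric(c(0.299, 0.587, 0.114) %*% rgb)
}

# TRUE when values only increase or only decrease
is_monotonic <- function(x) all(diff(x) > 0) || all(diff(x) < 0)

grey_ramp    <- grey(seq(0, 1, length.out=7))
rainbow_ramp <- rainbow(7)

is_monotonic(luma(grey_ramp))     # greyscale brightness rises steadily
is_monotonic(luma(rainbow_ramp))  # rainbow brightness rises and falls
```

A palette that fails this check is not necessarily unusable, but equal steps in the data are unlikely to look like equal steps in the figure.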

ggplot2 includes a number of functions for making color scales, such as scale_color_gradient and scale_color_continuous for continuous values, and scale_color_grey for greyscale on discrete values. To demonstrate, I’ll switch to the mtcars data set, which contains, among other things, fuel economy for 32 cars manufactured in 1973-1974.

# Example borrowed from the geom_tile documentation
ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity")

Distribution of fuel economies as related to engine size among a sampling of cars
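For a continuous greyscale version of this plot, one option is scale_fill_gradient with two grey endpoints. This is a sketch: the endpoint colors are arbitrary choices, and after_stat(density) is the newer spelling of ..density.., which assumes a reasonably recent ggplot2.

```r
library(ggplot2)

# Same density tiles, filled from light grey (low) to black (high)
p <- ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=after_stat(density)), geom="tile", position="identity") +
    scale_fill_gradient(low="grey90", high="black")
p
```

A greyscale gradient varies only in lightness, so it also survives black-and-white printing.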

Fortunately, ggplot2 does a nice job of displaying continuous values with color by default. Alternatively, we can use the RColorBrewer package to fetch palettes from ColorBrewer (the “PuBuGn” palette in this case) and apply them using the scale_fill_gradientn function:

library(RColorBrewer)  # provides brewer.pal

ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity") +
    scale_fill_gradientn(colours=brewer.pal(n=8, name="PuBuGn"))

Distribution of fuel economies as related to engine size among a sampling of cars. In this version, we use the PuBuGn color palette from ColorBrewer.
