Charting the CMV Awareness Gap

[This article was first published on Artful Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Sometimes it’s okay to use a secondary axis


Speaking of viruses, did you know that June is National Cytomegalovirus
(CMV) Awareness Month
Probably not, since most people have never heard of CMV (hence the need
for a national awareness month).

CMV is a common virus that infects 50-80% of people by the time they are
40 years old. In most cases, it’s not a big deal. But if a pregnant
woman becomes infected, she can pass the virus to the unborn child,
which results in a congenital infection about 33% of the time.

Congenital CMV (cCMV) is the number one viral cause of birth defects in
children. According to National CMV
, 1 in 200 children
are born with CMV every year. That’s roughly 30,000 children. About 1 in
5 children born with CMV infection will have moderate to severe health
problems including:

  • Hearing loss
  • Vision loss
  • Feeding issues
  • Intellectual disability
  • Microcephaly (small head or brain)
  • Cerebral Palsy
  • Seizures

Outcomes associated
with congenital CMV are wide-ranging and unpredictable.

Despite how common and potentially damaging CMV is, research shows that
only 9% of women have heard of the condition.

Awareness = prevention

Our son

was born with congenital CMV in 2013. Like most parents, we had never
heard of cCMV until our son was diagnosed.

Because cCMV is a viral infection, it is potentially preventable during
pregnancy if you know to take certain basic
However, knowing to take precations requires having heard of the
condition in the first place
, which brings us back to the need for a
National CMV Awareness Month.

One of the main tactics used in CMV awareness raising efforts is to
highlight the “awareness gap” between how few women have heard of CMV
and how many children are disabled by the condition each year.

In the past, the National CMV Foundation has used the graphic below for
this purpose (Fig. 1). It nicely shows levels of awareness vs incidence
of various congenital conditions in the US, based on data from Doutre
et al. (2016)

Fig. 1

Recently, I was asked by the Foundation to revise this graphic to
enhance its effectiveness (not coincidentally, my
is the Chair of the Scientific
Advisory Committee).

In this post, I describe my approach using
ggplot2, as well as
cowplot and
related pacakges in R.

Mind the gap

Technically speaking, Fig. 1 is what you would call a bi-directional,
mirrored, diverging, or back-to-back bar chart. It is reminiscent of
pyramid style bar
charts often used to visualize population age distributions.

I suspect that when people see Fig. 1 they have a perceptual tendency to
sum the bars together rather than take the difference between each bar.
The former is typically how a bi-directional bar chart would be
interpreted. But since the purpose of the visualization is to highlight
the CMV awareness gap, it might be better to actually plot the gap
(linear distance) between awareness and incidence of long-term health problems for cCMV in comparison
to other conditions.

So my proposed enhancement is to layer the incidence data as a series of
dots on top of an ordered bar chart representing increasing awareness on
the x-axis, and use a secondary x-axis for incidence. Layering in this
way will create a visually salient gap between awareness and incidence
for cCMV at the top of the chart, which I can further highlight with
some text annotations.

Secondary axis (of evil?)

Early versions of {ggplot2} did not include the ability to add a
secondary axis because Hadley
believed (and probably still
believes) that using a separate, secondary axis is a fundamentally


However, more recent versions of the package have included this
functionality with the sec_axis() function described
here. I think
we can assume from the addition of this functionality that Hadley isn’t
completely averse to the use of a secondary axis in some situations when
used with caution.

Again, my rationale for using a secondary x-axis in this case is to
achieve a specific perceptual effect, to higlight the gap between cCMV
awareness and incidence of disability visually so that people viewing the chart will
say “Wow! That’s some big gap.” And I think I can achieve this without
being manipulative or misleading, becuase the gap really is quite big.

Without further ado…

Here’s how the chart looks (Fig. 2). You can download a high
resolution version here.

Fig. 2

And here’s the R code that produces the chart.


# Get data from Doutre et al.

df <- tribble(
  ~condition, ~awareness, ~incidence,
  "Congenital Cytomegalovirus (CMV)", 6.7, 6000,
  "Congenital Toxoplasmosis", 8.53, 400,
  "Congenital Rubella Syndrome", 13.27, 3,
  "Beta Strep (Group B Strep)", 16.91, 380,
  "Parvovirus B19 (Fifth Disease)", 19.63, 1045,
  "Fetal Alcohol Syndrome", 61.04, 1200,
  "Spina Bifida", 64.54, 1500,
  "Sudden Infant Death Syndrome (SIDS)", 78.7, 1500,
  "Down Syndrome", 85.44, 6000,
  "Congenital HIV/AIDS", 86.33, 30

# Get National CMV logo

logo <- image_read("")

# Create chart

p <- df %>%
  ggplot(aes(x = reorder(condition, desc(awareness)), y = awareness)) +
  geom_col(fill = "#28C1DB") +
    aes(x = condition, y = incidence / 70),
    size = 4,
    pch = 21,
    fill = "#FB791A"
  ) +
    sec.axis = sec_axis(
      ~ . * 70, 
      name = "Number of Children Born with the Condition Each Year (Dots)",
      labels = scales::comma_format()
  ) +
  coord_flip() +
    x = "",
    y = "Percentage of Women Who Have Heard of the Condition (Bars)",
    title = "Awareness vs Incidence of Congenital Conditions",
    caption = "Based on US data from Doutré SM *et al.* (2016) Losing Ground: Awareness of Congenital Cytomegalovirus 
    in the United States. *Journal of Early Hearing Detection and Intervention* 1:39-48. Chart by Artful Analytics, 
    LLC (@_sethdobson). 
For more information, visit" ) + theme_bw() + theme( plot.title = element_text(face = "bold", hjust = .5), plot.caption = element_textbox_simple(size = 6, margin = margin(10, 0, 0, 0)), axis.text = element_text(color = "black"), axis.title = element_text(size = 10) ) + background_grid(major = "none") + annotate( geom = "text", label = "Number of children\nborn with CMV", x = 7.8, y = 75, color = "#FB791A", size = 3 ) + annotate( geom = "curve", x = 8.5, y = 75, xend = 10, yend = 84, curvature = -.3, arrow = arrow(length = unit(2, "mm")), color = "#FB791A" ) + annotate( geom = "text", label = "% of women who have\nheard of CMV", x = 7.8, y = 30, color = "#28C1DB", size = 3 ) + annotate( geom = "curve", x = 8.5, y = 30, xend = 10, yend = 7, curvature = .20, arrow = arrow(length = unit(2, "mm")), color = "#28C1DB" ) # Combine chart with logo ggdraw() + draw_plot(p) + draw_image( logo, x = .075, y = .1, scale = .2, hjust = .5, vjust = .5 )

A few things to note about the code above:

  • The secondary x-axis is actually coded as a secondary y-axis since
    you have to use coord_flip() to get the categorical variable on the y-axis
    when using geom_col().
  • The sec_axis() function is used in conjuction with the sec.axis
    option within scale_y_continuous(). In order to align the two
    y-axes, I multiplied the secondary axis by 70 within sec_axis()
    and divided incidence by 70 within the aesthetics of geom_point().
    I arrived at the number 70 by trial and error. Not sure why this works,
    but it does.
  • I used the ColorZilla Google
    Chrome extension to get hex color values from the National CMV logo.
    That way I was able to match the colors in the logo to the chart
    elements without a lot of guesswork.
  • I am using the amazing
    ggtext package by Claus Wilke to render the
    plot.caption theme element in markdown, so I can easily italicize
    selected words with asterisks. The element_textbox_simple() function from
    {ggtext} also does word wrapping automatically.


Hopefully you will agree that my combination bar and dot chart (Fig. 2) is an
improvement on the original graph (Fig. 1) in that it highlights the CMV
awareness gap more effectively for a general audience. I also trust that
Hadley would agree that this is an acceptable use of a secondary axis.
Altough, he might not. So nobody tell him, OK?

To learn more about congenital CMV visit

Questions or comments?

Feel free to reach out to me at any of the social links below.

For more R content, please visit
R-bloggers and

To leave a comment for the author, please follow the link and comment on their blog: Artful Analytics. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)