Split violin plots

[This article was first published on Ecology in silico, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Violin plots are useful for comparing distributions. When data are
grouped by a factor with two levels (e.g. males and females), you can
split the violins in half to see the difference between groups. Consider
a 2 x 2 factorial experiment: treatments A and B are crossed with groups
1 and 2, with N=1000.

1
2
3
4
5
6
7
8
9
<span class="line"><span class="c1"># Simulate data</span>
</span><span class="line">n.each <span class="o"><-</span> <span class="m">1000</span>
</span><span class="line">A1 <span class="o"><-</span> rnorm<span class="p">(</span>n.each<span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
</span><span class="line">A2 <span class="o"><-</span> rnorm<span class="p">(</span>n.each<span class="p">,</span> <span class="m">1.5</span><span class="p">,</span> <span class="m">2</span><span class="p">)</span>
</span><span class="line">B1 <span class="o"><-</span> rnorm<span class="p">(</span>n.each<span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="m">1.5</span><span class="p">)</span>
</span><span class="line">B2 <span class="o"><-</span> rnorm<span class="p">(</span>n.each<span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
</span><span class="line">values <span class="o"><-</span> c<span class="p">(</span>A1<span class="p">,</span> A2<span class="p">,</span> B1<span class="p">,</span> B2<span class="p">)</span>
</span><span class="line">treatment <span class="o"><-</span> rep<span class="p">(</span>c<span class="p">(</span><span class="s">"A"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">),</span> each<span class="o">=</span>n.each<span class="o">*</span><span class="m">2</span><span class="p">)</span>
</span><span class="line">group <span class="o"><-</span> rep<span class="p">(</span>c<span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span> each<span class="o">=</span>n.each<span class="p">)</span>
</span>

Boxplots are often used:

1
2
<span class="line">par<span class="p">(</span>bty<span class="o">=</span><span class="s">"n"</span><span class="p">)</span>
</span><span class="line">boxplot<span class="p">(</span>values <span class="o">~</span> group<span class="o">*</span>treatment<span class="p">,</span> main<span class="o">=</span><span class="s">"Box plot"</span><span class="p">,</span> col<span class="o">=</span>rep<span class="p">(</span>c<span class="p">(</span><span class="s">"purple"</span><span class="p">,</span> <span class="s">"lightblue"</span><span class="p">),</span> <span class="m">2</span><span class="p">))</span>
</span>

This gives us a rough comparison of the distribution in each group,
but sometimes it’s nice to visualize the kernel density estimates instead.

I recently ran into this issue and tweaked the vioplot() function from
the vioplot
package by Daniel Adler to make split violin plots.
With vioplot2(), the side
argument specifies whether to plot the density on “both”, the “left”, or
the “right” side.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
<span class="line">require<span class="p">(</span>vioplot<span class="p">)</span>
</span><span class="line">require<span class="p">(</span>devtools<span class="p">)</span>
</span><span class="line">require<span class="p">(</span>digest<span class="p">)</span>
</span><span class="line">source_gist<span class="p">(</span><span class="s">"https://gist.github.com/mbjoseph/5852613"</span><span class="p">)</span>
</span><span class="line">plot<span class="p">(</span>x<span class="o">=</span><span class="kc">NULL</span><span class="p">,</span> y<span class="o">=</span><span class="kc">NULL</span><span class="p">,</span>
</span><span class="line">     xlim <span class="o">=</span> c<span class="p">(</span><span class="m">0.5</span><span class="p">,</span> <span class="m">2.5</span><span class="p">),</span> ylim<span class="o">=</span>c<span class="p">(</span>min<span class="p">(</span>values<span class="p">),</span> max<span class="p">(</span>values<span class="p">)),</span>
</span><span class="line">     type<span class="o">=</span><span class="s">"n"</span><span class="p">,</span> ann<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> axes<span class="o">=</span><span class="k-Variable">F</span><span class="p">)</span>
</span><span class="line">axis<span class="p">(</span><span class="m">1</span><span class="p">,</span> at<span class="o">=</span>c<span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span>  labels<span class="o">=</span>c<span class="p">(</span><span class="s">"A"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">))</span>
</span><span class="line">axis<span class="p">(</span><span class="m">2</span><span class="p">)</span>
</span><span class="line"><span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> unique<span class="p">(</span>treatment<span class="p">))</span> <span class="p">{</span>
</span><span class="line">  <span class="kr">for</span> <span class="p">(</span>j <span class="kr">in</span> unique<span class="p">(</span>group<span class="p">)){</span>
</span><span class="line">    vioplot2<span class="p">(</span>values<span class="p">[</span>which<span class="p">(</span>treatment <span class="o">==</span> i <span class="o">&</span> group <span class="o">==</span> j<span class="p">)],</span>
</span><span class="line">             at <span class="o">=</span> ifelse<span class="p">(</span>i <span class="o">==</span> <span class="s">"A"</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span>
</span><span class="line">             side <span class="o">=</span> ifelse<span class="p">(</span>j <span class="o">==</span> <span class="m">1</span><span class="p">,</span> <span class="s">"left"</span><span class="p">,</span> <span class="s">"right"</span><span class="p">),</span>
</span><span class="line">             col <span class="o">=</span> ifelse<span class="p">(</span>j <span class="o">==</span> <span class="m">1</span><span class="p">,</span> <span class="s">"purple"</span><span class="p">,</span> <span class="s">"lightblue"</span><span class="p">),</span>
</span><span class="line">             add <span class="o">=</span> <span class="k-Variable">T</span><span class="p">)</span>
</span><span class="line">  <span class="p">}</span>
</span><span class="line"><span class="p">}</span>
</span><span class="line">title<span class="p">(</span><span class="s">"Violin plot"</span><span class="p">,</span> xlab<span class="o">=</span><span class="s">"Treatment"</span><span class="p">)</span>
</span><span class="line">legend<span class="p">(</span><span class="s">"bottomright"</span><span class="p">,</span> fill <span class="o">=</span> c<span class="p">(</span><span class="s">"purple"</span><span class="p">,</span> <span class="s">"lightblue"</span><span class="p">),</span>
</span><span class="line">       legend <span class="o">=</span> c<span class="p">(</span><span class="s">"Group 1"</span><span class="p">,</span> <span class="s">"Group 2"</span><span class="p">),</span> box.lty<span class="o">=</span><span class="m">0</span><span class="p">)</span>
</span>

Last but not least, Peter Kampstra’s
beanplot
package uses beanplot() to make split
density plots, but 1) plots a rug rather
than a quantile box, 2) includes a line for the overall mean or median,
and 3) makes it easier to change the kernel function.

1
2
3
4
5
6
7
8
9
<span class="line">require<span class="p">(</span>beanplot<span class="p">)</span>
</span><span class="line">beanplot<span class="p">(</span>values <span class="o">~</span> group<span class="o">*</span>treatment<span class="p">,</span> ll <span class="o">=</span> <span class="m">0.04</span><span class="p">,</span>
</span><span class="line">         main <span class="o">=</span> <span class="s">"Bean plot"</span><span class="p">,</span> side <span class="o">=</span> <span class="s">"both"</span><span class="p">,</span> xlab<span class="o">=</span><span class="s">"Treatment"</span><span class="p">,</span>
</span><span class="line">         col <span class="o">=</span> list<span class="p">(</span><span class="s">"purple"</span><span class="p">,</span> c<span class="p">(</span><span class="s">"lightblue"</span><span class="p">,</span> <span class="s">"black"</span><span class="p">)),</span>
</span><span class="line">         axes<span class="o">=</span><span class="k-Variable">F</span><span class="p">)</span>
</span><span class="line">axis<span class="p">(</span><span class="m">1</span><span class="p">,</span> at<span class="o">=</span>c<span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span>  labels<span class="o">=</span>c<span class="p">(</span><span class="s">"A"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">))</span>
</span><span class="line">axis<span class="p">(</span><span class="m">2</span><span class="p">)</span>
</span><span class="line">legend<span class="p">(</span><span class="s">"bottomright"</span><span class="p">,</span> fill <span class="o">=</span> c<span class="p">(</span><span class="s">"purple"</span><span class="p">,</span> <span class="s">"lightblue"</span><span class="p">),</span>
</span><span class="line">       legend <span class="o">=</span> c<span class="p">(</span><span class="s">"Group 1"</span><span class="p">,</span> <span class="s">"Group 2"</span><span class="p">),</span> box.lty<span class="o">=</span><span class="m">0</span><span class="p">)</span>
</span>

There are
more
ways
than
one
to
skin
a
cat,
and what one uses will probably come to personal preference.

To leave a comment for the author, please follow the link and comment on their blog: Ecology in silico.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)