gghalves: Make Half Boxplot | Half Dotplot Visualizations with ggplot2
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.
Here are the links to get set up.
What is gghalves?
gghalves is a new R package that makes it easy to compose your own half-plots using ggplot2.
gghalves Video Tutorial
For those that prefer Full YouTube Video Tutorials.
Learn how to use gghalves in our free 8-minute YouTube video.
Watch our full YouTube Tutorial
What are Half Plots?
Combining two plots side-by-side.
Half/Half Plots are a way to showcase two plots side-by-side. Here’s a common example:
- Showing a Boxplot to identify outliers and quantiles
- Showing a Dotplot to identify distribution
We can easily do this with a half-plot thanks to gghalves.
Before we get started, get the R Cheat Sheet
gghalves is great for making customized ggplot2 plots. But, you’ll still need to learn how to wrangle data with dplyr and visualize data with ggplot2. For those topics, I’ll use the Ultimate R Cheat Sheet to refer to dplyr and ggplot2 code in my workflow.
Quick Example:
Download the Ultimate R Cheat Sheet, then click the “CS” next to “ggplot2” to open the Data Visualization with ggplot2 Cheat Sheet.
Now you’re ready to quickly reference ggplot2 functions.
Onto the tutorial.
How gghalves works
The gghalves package extends ggplot2 by adding several new “geoms” (ggplot geometries) that allow us to add half plots. In this tutorial, we’ll cover:
- geom_half_boxplot(): For creating half-boxplots
- geom_half_dotplot(): For creating half-dotplots
Pro Tip:
Simply type “geom_half” in your R console and hit Tab to show all of the half plotting geoms available.
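The same list can also be produced in code rather than with Tab completion. This is a minimal sketch, assuming gghalves is installed; the variable name half_geoms is only illustrative.

```r
library(gghalves)  # attaching the package makes "package:gghalves" searchable

# List every exported object whose name starts with "geom_half"
half_geoms <- ls("package:gghalves", pattern = "^geom_half")
print(half_geoms)
```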
Load the Libraries and Data
First, run this code to:
- Load Libraries: Load gghalves, tidyverse and tidyquant.
- Import Data: We’re using the mpg dataset that comes with ggplot2.
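The original code for this step is not reproduced in the text, so here is a minimal sketch of what the setup might look like. It assumes all three packages are already installed.

```r
# Load libraries (assumes gghalves, tidyverse, and tidyquant are installed)
library(gghalves)   # half-plot geoms
library(tidyverse)  # dplyr for wrangling, ggplot2 for plotting
library(tidyquant)  # plot themes and color scales

# The mpg dataset ships with ggplot2; glimpse() shows its columns and types
glimpse(mpg)
```

The columns used in the rest of this tutorial are cyl (number of cylinders), hwy (highway miles per gallon), and class (vehicle class).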
Make the Half-Boxplot / Half-Dotplot
Next, we can combine a half-boxplot and half-dotplot. This has the advantage of showing:
- Quantiles and Outliers (Boxplot)
- Distribution (Dotplot)
Business Goal
Suppose we have a question:
What effect does Engine Size (number of Cylinders) have on Vehicle Highway Fuel Economy (Highway MPG)?
We can visualize this with gghalves by making half-plots of Cylinder vs Highway.
Half-Plot Visualization Code
Using the Ultimate R Cheat Sheet, we can make a ggplot from the ggplot2 data visualization cheat sheet. We’ll add geom_half_boxplot() and geom_half_dotplot() to make the half-plots of Cylinder vs Highway.
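The plotting code itself is not reproduced in the text. The sketch below shows one way the half-plot of Cylinder vs Highway could be built. The filter on 5-cylinder cars, the dot sizing values, the red outlier color, and the name half_plot are my own assumptions for illustration, not necessarily what the original tutorial used.

```r
library(gghalves)
library(tidyverse)
library(tidyquant)

half_plot <- mpg %>%
    # Assumption: drop the handful of 5-cylinder cars so each group has enough points
    filter(cyl != 5) %>%
    # Treat cylinders as categories so each value gets its own position on the x-axis
    mutate(cyl = factor(cyl)) %>%
    ggplot(aes(x = cyl, y = hwy)) +
    # Half boxplot: quantiles and outliers (drawn on the left side by default)
    geom_half_boxplot(outlier.color = "red") +
    # Half dotplot: shape of the distribution, stacked to the right
    geom_half_dotplot(aes(fill = cyl), dotsize = 0.75, stackratio = 0.5) +
    scale_fill_tq() +
    theme_tq() +
    labs(
        title = "Highway Fuel Economy by Engine Size",
        x     = "Cylinders",
        y     = "Highway MPG"
    )

half_plot
```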
Half-Plot Visualization
Here is the visualization. We can explore to find an interesting relationship between Engine Size and Fuel Economy.
Insights: Bimodal Distribution of 6-Cylinder Engine Class
Generally speaking, fuel economy goes down as engine size increases. But, the 6-Cylinder engine has something unique going on that has been uncovered by the gghalves::geom_half_dotplot().
The 6-Cylinder Engine class of car has a bimodal distribution, which is when there are two peaks. This generally indicates that there are two different populations within the group. We need to investigate with ggplot2.
Exploring the Bimodal Relationship
We can explore the 6 Cylinder Vehicle Class a bit further to identify the cause of the Bimodal Distribution. It looks like:
- SUV and Pickup classes have much lower fuel economy
- Compact, Midsize, Minivan, and Subcompact have much higher fuel economy
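One way to run this exploration is to keep only the 6-cylinder cars and compare highway MPG across vehicle classes. This is a hedged sketch rather than the tutorial's original code; the names six_cyl, class_summary, and class_plot are illustrative.

```r
library(gghalves)
library(tidyverse)

# Keep only 6-cylinder vehicles and order classes by their median highway MPG
six_cyl <- mpg %>%
    filter(cyl == 6) %>%
    mutate(class = fct_reorder(class, hwy))

# Numeric check: count and median highway MPG per vehicle class
class_summary <- six_cyl %>%
    group_by(class) %>%
    summarise(n = n(), median_hwy = median(hwy))
print(class_summary)

# Same half-boxplot / half-dotplot idea, now split by vehicle class
class_plot <- six_cyl %>%
    ggplot(aes(x = class, y = hwy)) +
    geom_half_boxplot() +
    geom_half_dotplot(aes(fill = class), dotsize = 0.75) +
    labs(
        title = "6-Cylinder Vehicles: Highway MPG by Class",
        x     = "Vehicle Class",
        y     = "Highway MPG"
    )

class_plot
```

Ordering the classes by median highway MPG puts the low-economy and high-economy groups at opposite ends of the axis, which makes the two populations easier to spot.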
Why Learning ggplot2 is essential
I wouldn’t be nearly as effective as a data scientist without knowing ggplot2. In fact, data visualization has been one of two skills that have been critical to my career (with the other one being data transformation).
Case Study: This tutorial showcases exactly why visualization is important
Let’s just take this tutorial as a case study. Without being able to visualize with ggplot2:
- We wouldn’t be able to visually identify the Bimodal Distribution. We needed to see that to know to explore the 6-Cylinder Engine Class.
- We wouldn’t have been able to explore the 6-Cylinder Engine Class. This showed us the importance of the Vehicle Class (e.g. SUV, Pickups being lower and Compact, Subcompact being higher in fuel economy).
Career Tip: Learn ggplot2
If I had one piece of advice, it would be to start learning ggplot2. Let me explain.
Learning ggplot2 helped me to:
- Explain complex topics to non-technical people
- Develop good reports that showcased important points visually
- Make persuasive arguments that got the attention of Senior Management and even my CEO
So, yes, learning ggplot2 was absolutely essential to my career. I received many promotions and got the attention of my CEO using ggplot2 effectively.
If you’d like to learn ggplot2 and data science for business, then read on.
My Struggles with Learning Data Science
It took me a long time to learn data science. And I made a lot of mistakes as I fumbled through learning R. I specifically had a tough time navigating the ever-increasing landscape of tools and packages, trying to pick between R and Python, and getting lost along the way.
If you feel like this, you’re not alone.
In fact, that’s the driving reason that I created Business Science and Business Science University (You can read about my personal journey here).
What I found out is that:
- Data Science does not have to be difficult; it just has to be taught smartly.
- Anyone can learn data science fast, provided they are motivated.
How I can help
If you are interested in learning R and the ecosystem of tools at a deeper level, then I have a streamlined program that will get you past your struggles and improve your career in the process.
It’s called the 5-Course R-Track System. It’s an integrated system containing 5 courses that work together on a learning path. Through 5+ projects, you learn everything you need to help your organization: from data science foundations, to advanced machine learning, to web applications and deployment.
The result is that you break through previous struggles, learning from my experience & our community of 2000+ data scientists that are ready to help you succeed.
Ready to take the next step? Then let’s get started.
Top R-Tips Tutorials you might like:
- mmtable2: ggplot2 for tables
- ggside: Plot linear regression with marginal distributions
- DataEditR: Interactive Data Editing in R
- openxlsx: How to Automate Excel in R
- officer: How to Automate PowerPoint in R
- DataExplorer: Fast EDA in R
- esquisse: Interactive ggplot2 builder
- gghalves: Half-plots with ggplot2
- rmarkdown: How to Automate PDF Reporting
- patchwork: How to combine multiple ggplots
Want these tips every week? Join R-Tips Weekly.