Introduction to ggplot2

[This article was first published on Blog on Data Solutions | Dedicated to helping businesses making data-driven decisions, and kindly contributed to R-bloggers.]

TL;DR:

If you are new to ggplot2, welcome! If you are used to base R, it’s probably going to take a while to get the hang of the syntax, but trust me, it’s worth it. ggplot2 is the tidyverse package for making graphics, and it lets you control and customize pretty much every aspect of a plot. So let’s get started.

We are going to be working with the mtcars dataset, an oldie but a goodie. So we’ll take a look at the structure of the dataframe to see what we are working with.

library(ggplot2)
library(dplyr)

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

The Basics

To create any figure with ggplot2, you start with the ggplot() function. In this function, you supply the dataset and the aesthetics, aes(). The aesthetics are where you specify which columns you want as the x and y variables. After that, you add the type of plot you want to create. For example, if I want to create a graph using points, I will add geom_point(). All of the plot types follow this same syntax: geom_ followed by the type of graph you want to make. If you want a line chart, it’s geom_line(); a box-and-whisker plot is geom_boxplot(); and so on.
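As a rough sketch of that pattern, the same data and aesthetics can be paired with different geoms. The object names (base, p_points, p_line, p_box) are just illustrative, and mpg is used for y here only as an example column.

```r
library(ggplot2)

# Same data and aesthetics, reused with different geoms.
base <- ggplot(data = mtcars, aes(x = wt, y = mpg))

p_points <- base + geom_point()  # scatter plot
p_line   <- base + geom_line()   # line chart

# A box plot wants a discrete x, so cyl is wrapped in factor() here.
p_box <- ggplot(data = mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Printing a plot object draws it.
p_points
```

Storing the shared ggplot() call in an object is optional; it just makes it easier to see that only the geom changes between plots.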

Scatter plot

Let’s start off with a simple example, making a scatterplot. I am going to use the weight (wt) as the x-variable and drat (the rear axle ratio) as the y-variable. This is what a standard ggplot will look like.

ggplot(data = mtcars, aes(x = wt, y = drat)) + 
  geom_point()

Color by group

Now, we are going to begin to customize it. Let’s start by color coding the points. We want to be able to quickly identify which points are from cars with 4, 6, or 8 cylinders. We are first going to set the cylinder column as a factor using the mutate function. We are then going to create a plot like we did before, with the weight as the x-variable, but this time with mpg as the y-variable, and we are going to add a third argument, color. We are setting the color to be equal to the cylinder level. One of the benefits of using ggplot2 is that it ties in seamlessly with the rest of the tidyverse functions. This makes it really easy to manipulate your data, get it into the format you need, and then pipe it straight into a plot, which uses the manipulated data.

mtcars %>% 
  mutate(cyl = as.factor(cyl)) %>% 
  ggplot(aes(x = wt, y = mpg, color = cyl)) + 
  geom_point() 
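To see why the factor conversion matters, here is a small sketch comparing the two cases. It wraps cyl in factor() directly inside aes() instead of using mutate(), which is an alternative and not the approach used above; the object names are illustrative.

```r
library(ggplot2)

# Numeric cyl: color is treated as continuous, so the legend is a gradient.
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# factor(cyl): color is discrete, with one legend entry per cylinder count.
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

p_discrete
```

Converting with mutate() first, as in the example above, keeps the legend title as plain "cyl" rather than "factor(cyl)".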

As you can probably tell, the style is very different from base R plots. ggplot2 plots are very customizable, and there are many pre-made themes that you can use to change the style of the plot. You can also create your own custom theme and specify anything from the line size to the font style and size of the labels, the gridlines, etc. I prefer to use the classic theme most of the time because it is the simplest and most minimal. All of the built-in themes follow the naming pattern theme_ followed by the theme name, for example theme_classic().

Classic
mtcars %>% 
  mutate(cyl = as.factor(cyl)) %>% 
  ggplot(aes(x = wt, y = mpg, color = cyl)) + 
  geom_point() +
  theme_classic()

Black and white
mtcars %>% 
  mutate(cyl = as.factor(cyl)) %>% 
  ggplot(aes(x = wt, y = mpg, color = cyl)) + 
  geom_point() + 
  theme_bw()

Dark
mtcars %>% 
  mutate(cyl = as.factor(cyl)) %>% 
  ggplot(aes(x = wt, y = mpg, color = cyl)) + 
  geom_point() + 
  theme_dark()
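Custom tweaks go in theme(), added after a pre-made theme. The following is a minimal sketch, assuming a few arbitrary choices (a base font size of 14, bold axis titles, light horizontal gridlines, legend at the bottom); none of these settings come from the original post.

```r
library(ggplot2)
library(dplyr)

p_custom <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  theme_classic(base_size = 14) +  # start from a pre-made theme
  theme(
    axis.title = element_text(face = "bold"),              # bold axis titles
    panel.grid.major.y = element_line(color = "grey85"),   # add horizontal gridlines back
    legend.position = "bottom"                             # move the legend
  )

p_custom
```

The order matters: theme() settings added after theme_classic() override it, while adding a complete theme afterwards would reset them.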

Multiple Datasets

Now I’ll show you how to create a plot with two different datasets. I’m going to first subset the rows of data for cars with 6 cylinders and name that cars.sub.

cars.sub <- mtcars %>% 
  filter(cyl == 6)

Then I’m going to make a scatterplot of weight and drat for all cars using geom_point(). Then I want to add a line for the data from our cars.sub dataframe. So I’ll add a geom_line() layer with the same x and y names, but with the data argument set to the name of the new dataframe. I also want to change the color of the line, so outside of aes(), I’ll add color = "orange".

mtcars %>% 
  ggplot(aes(x = wt, y = drat)) +
  geom_point() +
  geom_line(aes(x = wt, y = drat), data = cars.sub, color = "orange")

Now we have a scatter plot and a line graph with data from 2 different datasets. If you wanted to use data from the same dataframe for the line part, you wouldn’t need to specify the data argument. Also, notice that the axis names default to the column names of the data you’re plotting. You can easily change these with the labs function: specify which label you want to set (x, y, or title) and the text to use.

mtcars %>% 
  ggplot(aes(x = wt, y = drat)) +
  geom_point() +
  geom_line(aes(x = wt, y = drat), data = cars.sub, color = "orange") +
  labs(x = "Weight", title = "2 Datasets Plot")
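labs() can also set the y-axis label and the legend title. Here is a short sketch along the same lines; the label text is my own wording based on the mtcars documentation (wt is weight in 1000 lbs, drat is the rear axle ratio), and the color mapping is added only to show the legend title.

```r
library(ggplot2)
library(dplyr)

p_labeled <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  ggplot(aes(x = wt, y = drat, color = cyl)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Rear axle ratio",
    color = "Cylinders",             # legend title for the color aesthetic
    title = "Rear axle ratio by weight"
  )

p_labeled
```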

Facets

Lastly, I’m going to show you how to create side-by-side plots by groups in your data. I want to look at how many cars fall into each range of mpg, broken out by how many cylinders the car has. So I want the mpg on the x-axis, and I’m going to use geom_histogram() because a histogram groups the values (in this case the mpg) into bins and counts how many observations fall into each bin. Then I am going to add facet_wrap() with the “~” symbol followed by the column I want it to group the data by.

mtcars %>% 
  ggplot(aes(x = mpg)) +
  geom_histogram() +
  facet_wrap(~cyl, scales = "free")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

And now we have 3 different plots, with the number of cylinders at the top of each. You can see that the ranges of mpg are pretty different between the three groups. Adding the scales = "free" argument to the facet_wrap function gives each plot its own x and y scales.
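The stat_bin() message above suggests choosing a bin width explicitly. A minimal sketch, assuming a bin width of 2 mpg (an arbitrary choice for illustration) and scales = "free_x" so that only the x-axis varies between panels:

```r
library(ggplot2)

p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2) +         # explicit bin width silences the message
  facet_wrap(~cyl, scales = "free_x")    # shared y-axis, independent x-axes

p_hist
```

Keeping the y-axis shared makes the bar heights directly comparable across the three panels; use scales = "free_y" or "free" if that comparison is not needed.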

This is just the beginning of what you can do with ggplot and I’ll be posting more tutorials where I go more in-depth on some features soon!
