Create a scatter plot with ggplot

This article was first published on Quantargo Blog and kindly contributed to R-bloggers.

Take your first steps with the ggplot2 package and create a scatter plot. Use the grammar of graphics to map dataset attributes to your plot and connect different layers using the + operator.

  • Define a dataset for the plot using the ggplot() function
  • Specify a geometric layer using the geom_point() function
  • Map attributes from the dataset to plotting properties using the mapping parameter
  • Connect different ggplot objects using the + operator
library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )

Introduction to scatter plots

Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents the value of the variables on the x- and y-axis. Let’s see an example of a scatter plot to understand the relationship between the speed and the stopping distance of cars:

Each point represents a car. Each car starts to brake at the speed given on the y-axis and travels the distance shown on the x-axis until it comes to a full stop. Looking at all points in the plot, we can clearly see that faster cars need a longer distance to come to a complete stop.
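The visual impression can be cross-checked with a quick look at the numbers. The sketch below assumes that the plot is based on R's built-in cars dataset, which is used later in this article; the variable name speed_dist_cor is only illustrative.

```r
# Assumption: the plot above is based on the built-in cars dataset,
# which has the two numeric columns speed and dist.
str(cars)

# A positive correlation coefficient supports the visual impression
# that faster cars need a longer stopping distance.
speed_dist_cor <- cor(cars$speed, cars$dist)
speed_dist_cor
```

A value close to 1 indicates a strong positive linear relationship between the two variables.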

Quiz: Scatter Plot Facts

Which of the following statements about scatter plots are correct?

  • Scatter plots visualize the relation of two numeric variables
  • In a scatter plot we only interpret single points and never the relationship between the variables in general
  • Scatter plots use points to visualize observations
  • Scatter plots visualize the relation of categorical and numeric variables

Specifying a dataset

library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )

To create plots with ggplot2, you first need to load the package using library(ggplot2).

After the package has been loaded, specify the dataset to be used as an argument of the ggplot() function. For example, to specify a plot using the cars dataset you can use:

library(ggplot2)
ggplot(cars)

Note that this command draws nothing but an empty grey canvas. It only defines the dataset for the plot and creates an empty base on top of which we can add layers.
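One way to see that the base plot has no layers yet is to store it in a variable and inspect it. This is a minimal sketch; the variable name p is illustrative, and accessing the data and layers components with $ relies on how ggplot2 plot objects are commonly inspected rather than on anything stated in this article.

```r
library(ggplot2)

# Store the plot object instead of printing it.
p <- ggplot(cars)

# The object carries the dataset but no layers yet, which is why
# printing it shows only an empty canvas.
identical(p$data, cars)
length(p$layers)
```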

Exercise: Specify the gapminder dataset

To start with a ggplot visualizing the gapminder dataset we need to:

  1. Load the ggplot2 package
  2. Load the gapminder package
  3. Define the gapminder dataset to be used in the plot with the ggplot() function

Specifying a geometric layer

library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )

We can use ggplot’s geometric layers (or geoms) to define how we want to visualize our dataset. Geoms use geometric objects to visualize the variables of a dataset. These objects can take different forms, such as points, lines and bars, and are specified through the corresponding functions geom_point(), geom_line() and geom_col().
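To compare the three geoms, the following sketch draws the same two variables of the cars dataset as points, as a line and as bars. The variable names p_points, p_lines and p_bars are illustrative, and the mapping argument used here is explained in the section on aesthetic mappings.

```r
library(ggplot2)

# Same data and mapping, three different geometric objects.
p_points <- ggplot(cars) + geom_point(mapping = aes(x = speed, y = dist))
p_lines  <- ggplot(cars) + geom_line(mapping = aes(x = speed, y = dist))
# geom_col() draws bars; rows that share the same speed are stacked.
p_bars   <- ggplot(cars) + geom_col(mapping = aes(x = speed, y = dist))

print(p_points)
print(p_lines)
print(p_bars)
```

For two numeric variables such as speed and dist, the point version is usually the easiest to read, which is why scatter plots use geom_point().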

Quiz: Scatter Plot Layers

Which geometric layer should be used to create scatter plots in ggplot2?

  • point_geom()
  • geom()
  • geom_scatter()
  • geom_point()

Creating aesthetic mappings

library(ggplot2)
ggplot(___) + 
  geom_point(
    mapping = aes(x = ___, y = ___)
  )

ggplot2 uses the concept of aesthetics, which map dataset attributes to the visual features of the plot. Each geometric layer requires a different set of aesthetic mappings, e.g. the geom_point() function uses the aesthetics x and y to determine the x- and y-axis coordinates of the points to plot. The aesthetics are mapped within the aes() function to construct the final mappings.
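A mapping can also be created and inspected on its own. The sketch below uses the speed and dist columns of the cars dataset; the variable name point_mapping is illustrative.

```r
library(ggplot2)

# aes() only records which variable goes to which aesthetic.
# speed and dist are not looked up until a dataset is attached.
point_mapping <- aes(x = speed, y = dist)

names(point_mapping)
print(point_mapping)
```

Because the variables are not evaluated at this point, the call works even though no dataset has been specified yet.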

To specify a layer of points which plots the variable speed on the x-axis and the stopping distance dist on the y-axis, we can write:

geom_point(
  mapping = aes(x = speed, y = dist)
)

The expression above constructs a geometric layer. However, this layer is currently not linked to a dataset and does not produce a plot. To link the layer with a ggplot object specifying the cars dataset we need to connect the ggplot(cars) object with the geom_point() layer using the + operator:

ggplot(cars) +
  geom_point(
    mapping = aes(x = speed, y = dist)
  )

Through this link, ggplot() knows that the mapped speed and dist variables are taken from the cars dataset. geom_point() instructs ggplot to plot the mapped variables as points.
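The link between dataset and layer can be made more visible by building both parts separately and connecting them afterwards. This is a minimal sketch; the variable names base_plot, point_layer and scatter_plot are illustrative.

```r
library(ggplot2)

base_plot   <- ggplot(cars)                                    # dataset only
point_layer <- geom_point(mapping = aes(x = speed, y = dist))  # layer only

# + returns a new plot object that contains both parts.
scatter_plot <- base_plot + point_layer
print(scatter_plot)
```

Storing the intermediate objects is optional. The chained form ggplot(cars) + geom_point(...) produces the same plot.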

The required steps to create a scatter plot with ggplot can be summarized as follows:

  1. Load the package ggplot2 using library(ggplot2).
  2. Specify the dataset to be plotted using ggplot().
  3. Use the + operator to add layers to the plot.
  4. Add a geometric layer to define the shapes to be plotted. For scatter plots, use geom_point().
  5. Map variables from the dataset to plotting properties through the mapping parameter in the geometric layer.

Exercise: Visualize the “cars” dataset

Create a scatter plot using ggplot() and visualize the cars dataset with the car’s stopping distance dist on the x-axis and the speed of the car on the y-axis.

The ggplot2 package is already loaded. Follow these steps to create the plot:

  1. Specify the dataset through the ggplot() function
  2. Specify a geometric point layer with the geom_point() function
  3. Map dist to the x-axis and speed to the y-axis with aes()

Exercise: Visualize the Gapminder dataset

Create a scatter plot using ggplot() and visualize the gapminder_2007 dataset with the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp of each country on the y-axis.

The ggplot2 package is already loaded. Follow these steps to create the plot:

  1. Specify the gapminder_2007 dataset through the ggplot() function
  2. Specify a geometric point layer with geom_point()
  3. Map the gdpPercap to the x-axis and the lifeExp to the y-axis with aes()
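The gapminder_2007 dataset is not defined in this article. The sketch below shows one way it could be prepared, under the assumption that it contains only the rows of the gapminder dataset (from the gapminder package) for the year 2007.

```r
library(gapminder)

# Assumption: gapminder_2007 holds only the observations for 2007.
gapminder_2007 <- subset(gapminder, year == 2007)

# One row per country, including the two columns used in the exercise.
nrow(gapminder_2007)
head(gapminder_2007[, c("country", "gdpPercap", "lifeExp")])
```

With this data frame in place, the plot itself follows the three steps listed above.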

Create a scatter plot with ggplot is an excerpt from the course Introduction to R, which is available for free at quantargo.com
