# Create a scatter plot with ggplot

[This article was first published on **Quantargo Blog**, and kindly contributed to R-bloggers.]

Make your first steps with the **ggplot2** package to create a scatter plot. Use the grammar of graphics to map dataset attributes to your plot and connect different layers using the `+` operator.

- Define a dataset for the plot using the `ggplot()` function
- Specify a geometric layer using the `geom_point()` function
- Map attributes from the dataset to plotting properties using the `mapping` parameter
- Connect different `ggplot` objects using the `+` operator

    library(ggplot2)
    ggplot(___) +
      geom_point(
        mapping = aes(x = ___, y = ___)
      )

## Introduction to scatter plots

Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents the value of the variables on the x- and y-axis. Let’s see an example of a scatter plot to understand the relationship between the *speed* and the *stopping distance* of cars:

Each point represents a car. Each car starts to brake at the speed given on the x-axis and travels the distance shown on the y-axis until it comes to a full stop. Looking at all points in the plot, we can clearly see that faster cars need a longer distance to come to a complete stop.
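The relationship described here can also be checked numerically before plotting anything. The following is a minimal sketch that assumes the example is based on the built-in `cars` dataset, which is used throughout the rest of this article; the correlation check is an illustration and not part of the original course material.

```r
# cars ships with base R (datasets package) and holds two numeric
# variables: speed and stopping distance (dist).
str(cars)

# Pearson correlation between the two variables; a value close to 1
# indicates that stopping distance tends to grow with speed.
cor(cars$speed, cars$dist)
```

A strongly positive correlation matches what the scatter plot shows visually.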

## Quiz: Scatter Plot Facts

Which of the following statements about scatter plots are correct?

- Scatter plots visualize the relation of two numeric variables
- In a scatter plot we only interpret single points and never the relationship between the variables in general
- Scatter plots use points to visualize observations
- Scatter plots visualize the relation of categorical and numeric variables

## Specifying a dataset

    library(ggplot2)
    ggplot(___) +
      geom_point(
        mapping = aes(x = ___, y = ___)
      )

To create plots with **ggplot2** you first need to load the package using `library(ggplot2)`.

After the package has been loaded, specify the dataset to be used as an argument of the `ggplot()` function. For example, to specify a plot using the `cars` dataset you can use:

    library(ggplot2)
    ggplot(cars)

Note that this command does not plot anything but a grey canvas yet. It just defines the dataset for the plot and creates an empty base on top of which we can add additional layers.
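To see that `ggplot(cars)` only stores the dataset, it can help to inspect the returned object. This is a minimal sketch: the variable name `p` is chosen for illustration, and the `$data` and `$layers` fields are internals of the plot object that are peeked at here only to make the point.

```r
library(ggplot2)

# Create the empty base plot; no geometric layer has been added yet.
p <- ggplot(cars)

# The plot object carries the dataset (rows and columns of cars) ...
dim(p$data)

# ... but its list of layers is still empty, hence the blank grey canvas.
length(p$layers)
```

Printing `p` would draw the grey canvas described above.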

## Exercise: Specify the gapminder dataset

To start with a ggplot visualizing the `gapminder` dataset we need to:

- Load the **ggplot2** package
- Load the **gapminder** package
- Define the `gapminder` dataset to be used in the plot with the `ggplot()` function

## Specifying a geometric layer

    library(ggplot2)
    ggplot(___) +
      geom_point(
        mapping = aes(x = ___, y = ___)
      )

We can use **ggplot2**’s geometric layers (or *geoms*) to define how we want to visualize our dataset. *Geoms* use geometric objects to visualize the variables of a dataset. The objects can take multiple forms such as points, lines and bars, and are specified through the corresponding functions `geom_point()`, `geom_line()` and `geom_col()`.
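As an illustration, the three geoms can be applied to the same data and mappings. This is a sketch under the assumption that the built-in `cars` dataset is used; the variable names `base`, `points_plot`, `lines_plot` and `bars_plot` are hypothetical.

```r
library(ggplot2)

# Shared base: the dataset only, no layers yet.
base <- ggplot(cars)

# The same variables drawn with three different geometric objects.
points_plot <- base + geom_point(mapping = aes(x = speed, y = dist))
lines_plot  <- base + geom_line(mapping = aes(x = speed, y = dist))
bars_plot   <- base + geom_col(mapping = aes(x = speed, y = dist))
```

Printing any of the three objects, for example `points_plot`, draws the corresponding plot. Only the geom changes between them; the dataset and the mappings stay the same.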

## Quiz: Scatter Plot Layers

Which geometric layer should be used to create scatter plots in **ggplot2**?

- `point_geom()`
- `geom()`
- `geom_scatter()`
- `geom_point()`

## Creating aesthetic mappings

    library(ggplot2)
    ggplot(___) +
      geom_point(
        mapping = aes(x = ___, y = ___)
      )

**ggplot2** uses the concept of *aesthetics*, which *map* dataset attributes to the visual features of the plot. Each geometric layer requires a different set of *aesthetic mappings*, e.g. the `geom_point()` function uses the aesthetics `x` and `y` to determine the x- and y-axis coordinates of the points to plot. The aesthetics are mapped within the `aes()` function to construct the final mappings.

To specify a layer of points which plots the variable `speed` on the x-axis and the distance `dist` on the y-axis we can write:

    geom_point(
      mapping = aes(x = speed, y = dist)
    )

The expression above constructs a geometric layer. However, this layer is currently not linked to a dataset and does not produce a plot. To **link** the layer with a `ggplot` object specifying the `cars` dataset we need to connect the `ggplot(cars)` object with the `geom_point()` layer using the `+` operator:

    ggplot(cars) +
      geom_point(
        mapping = aes(x = speed, y = dist)
      )

Through this link, `ggplot()` knows that the mapped `speed` and `dist` variables are taken from the `cars` dataset. `geom_point()` instructs ggplot to plot the mapped variables as points.
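The effect of linking a layer to a dataset can be made visible by inspecting the objects involved. The following is a rough sketch; the names `point_layer` and `p` are hypothetical, and looking into `$mapping` and `$layers` relies on object internals that are used here for illustration only.

```r
library(ggplot2)

# A layer on its own: it holds the aesthetic mapping but no dataset.
point_layer <- geom_point(mapping = aes(x = speed, y = dist))
names(point_layer$mapping)

# Linking the layer to ggplot(cars) with `+` yields a plot with one layer.
p <- ggplot(cars) + point_layer
length(p$layers)

# layer_data() returns the values the layer will draw; the x and y
# columns are filled from cars$speed and cars$dist.
head(layer_data(p)[, c("x", "y")])
```

The layer itself never mentions `cars`; the variables are resolved against the dataset only once the layer is added to the `ggplot(cars)` object.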

The required steps to create a scatter plot with `ggplot` can be summarized as follows:

- Load the package **ggplot2** using `library(ggplot2)`.
- Specify the dataset to be plotted using `ggplot()`.
- Use the `+` operator to add layers to the plot.
- Add a geometric layer to define the shapes to be plotted. In case of scatter plots, use `geom_point()`.
- Map variables from the dataset to plotting properties through the `mapping` parameter in the geometric layer.

## Exercise: Visualize the “cars” dataset

Create a scatter plot using `ggplot()` and visualize the `cars` dataset with the `speed` of the car on the x-axis and the car’s stopping distance `dist` on the y-axis.

The **ggplot2** package is already loaded. Follow these steps to create the plot:

- Specify the dataset through the `ggplot()` function
- Specify a geometric point layer with the `geom_point()` function
- Map `speed` to the x-axis and `dist` to the y-axis with `aes()`

## Exercise: Visualize the Gapminder dataset

Create a scatter plot using `ggplot()` and visualize the `gapminder_2007` dataset with the GDP per capita `gdpPercap` on the x-axis and the life expectancy `lifeExp` of each country on the y-axis.

The **ggplot2** package is already loaded. Follow these steps to create the plot:

- Specify the `gapminder_2007` dataset through the `ggplot()` function
- Specify a geometric point layer with `geom_point()`
- Map `gdpPercap` to the x-axis and `lifeExp` to the y-axis with `aes()`
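Outside the course environment, `gapminder_2007` is not available by default. One possible way to try the exercise locally is sketched below. It assumes that the **gapminder** package is installed and that `gapminder_2007` corresponds to the rows of `gapminder` for the year 2007; both are assumptions, since the article does not say how the dataset was built.

```r
library(ggplot2)

# Assumption: the gapminder package is installed. The guard skips the
# example instead of failing when it is not available.
if (requireNamespace("gapminder", quietly = TRUE)) {
  # Assumption: gapminder_2007 is the 2007 subset of the gapminder data.
  gapminder_2007 <- subset(gapminder::gapminder, year == 2007)

  # Dataset in ggplot(), point layer added with `+`, variables mapped in aes().
  p <- ggplot(gapminder_2007) +
    geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
}
```

Printing `p` draws the scatter plot, with one point per country.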

Create a scatter plot with ggplot is an excerpt from the course Introduction to R, which is available for free at quantargo.com
