Take your first steps with the ggplot2 package by creating a scatter plot. Use the grammar of graphics to map dataset attributes to your plot and connect different layers using the `+` operator.

• Define a dataset for the plot using the `ggplot()` function
• Specify a geometric layer using the `geom_point()` function
• Map attributes from the dataset to plotting properties using the `mapping` parameter
• Connect different `ggplot` objects using the `+` operator
```r
library(ggplot2)
ggplot(___) +
  geom_point(
    mapping = aes(x = ___, y = ___)
  )
```

## Introduction to scatter plots

Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents its values for both variables: one on the x-axis and one on the y-axis. Let's look at an example of a scatter plot that shows the relationship between the speed and the stopping distance of cars:

Each point represents a car. Each car starts to brake at the speed given on the y-axis and travels the distance shown on the x-axis until it comes to a full stop. Looking at all points together, we can clearly see that faster cars need a longer distance to come to a complete stop.
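The plot described above can be reproduced with the techniques covered in this lesson. Below is a minimal sketch using R's built-in `cars` dataset (50 cars, with `speed` in mph and stopping distance `dist` in feet), placing `dist` on the x-axis and `speed` on the y-axis as described. The correlation check at the end is an illustrative addition, not part of the original lesson; the name `stopping_plot` is chosen here for illustration.

```r
library(ggplot2)

# Scatter plot: stopping distance on the x-axis, speed on the y-axis
stopping_plot <- ggplot(cars) +
  geom_point(mapping = aes(x = dist, y = speed))
print(stopping_plot)

# A positive correlation supports the visual impression
# that faster cars need a longer distance to stop
speed_dist_cor <- cor(cars$speed, cars$dist)
speed_dist_cor
```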

## Quiz: Scatter Plot Facts

Which of the following statements about scatter plots are correct?

• Scatter plots visualize the relation of two numeric variables
• In a scatter plot we only interpret single points and never the relationship between the variables in general
• Scatter plots use points to visualize observations
• Scatter plots visualize the relation of categorical and numeric variables

## Specifying a dataset

To create plots with ggplot2 you first need to load the package using `library(ggplot2)`.

After loading the package, specify the dataset as an argument of the `ggplot()` function. For example, to create a plot based on the `cars` dataset, you can use:

```r
library(ggplot2)
ggplot(cars)
```

Note that this command does not plot anything yet except an empty grey canvas. It only defines the dataset for the plot and creates an empty base on top of which we can add further layers.
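A minimal sketch of this step: the result of `ggplot(cars)` is an ordinary R object, so it can be stored in a variable (the name `base_plot` is chosen here for illustration) and printed later to show the empty canvas.

```r
library(ggplot2)

# Define the dataset only; no layers have been added yet
base_plot <- ggplot(cars)

# Printing the object draws an empty grey canvas
print(base_plot)
```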

## Exercise: Specify the gapminder dataset

To start a ggplot visualization of the `gapminder` dataset, we need to:

1. Define the `gapminder` dataset to be used in the plot with the `ggplot()` function

## Specifying a geometric layer

We can use ggplot2's geometric layers (or geoms) to define how we want to visualize our dataset. Geoms use geometric objects to represent the variables of a dataset. These objects can take different forms, such as points, lines, and bars, which are specified through the corresponding functions `geom_point()`, `geom_line()`, and `geom_col()`.
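To see how the choice of geom changes the output, the sketch below (an illustration, not part of the original lesson) draws the same `speed` and `dist` mapping of the `cars` dataset with each of the three geoms. The plot names are chosen for illustration. Note that `geom_col()` stacks the bars of cars that share the same speed, and `geom_line()` connects the points in order of their x values.

```r
library(ggplot2)

# The same aesthetic mapping, drawn with three different geoms
points_plot <- ggplot(cars) + geom_point(mapping = aes(x = speed, y = dist))
lines_plot  <- ggplot(cars) + geom_line(mapping = aes(x = speed, y = dist))
bars_plot   <- ggplot(cars) + geom_col(mapping = aes(x = speed, y = dist))

print(points_plot)
print(lines_plot)
print(bars_plot)
```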

## Quiz: Scatter Plot Layers

Which geometric layer should be used to create scatter plots in ggplot2?

• `point_geom()`
• `geom()`
• `geom_scatter()`
• `geom_point()`

## Creating aesthetic mappings

ggplot2 uses the concept of aesthetics, which map dataset attributes to the visual features of the plot. Each geometric layer requires its own set of aesthetic mappings; for example, `geom_point()` uses the aesthetics `x` and `y` to determine the x- and y-axis coordinates of the points to plot. The mappings themselves are constructed with the `aes()` function.
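A small sketch of what `aes()` does on its own: it records which variable is mapped to which aesthetic, without looking the variables up. That is why the mapping below can be created even if no objects called `speed` or `dist` exist in the workspace; the variables are resolved later from the plot's dataset. The name `point_mapping` is illustrative.

```r
library(ggplot2)

# Record that `speed` goes to x and `dist` goes to y;
# the variables are not evaluated at this point
point_mapping <- aes(x = speed, y = dist)

# Printing shows the recorded aesthetic mappings
print(point_mapping)
```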

To specify a layer of points that plots the variable `speed` on the x-axis and the stopping distance `dist` on the y-axis, we can write:

```r
library(ggplot2)
geom_point(
  mapping = aes(x = speed, y = dist)
)
```

The expression above constructs a geometric layer. However, this layer is not yet linked to a dataset and does not produce a plot on its own. To link the layer to a `ggplot` object that specifies the `cars` dataset, we connect the `ggplot(cars)` object and the `geom_point()` layer with the `+` operator:

```r
library(ggplot2)
ggplot(cars) +
  geom_point(
    mapping = aes(x = speed, y = dist)
  )
```

Through this link, ggplot2 knows that the mapped `speed` and `dist` variables are taken from the `cars` dataset, while `geom_point()` instructs ggplot2 to draw the mapped variables as points.
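Because a layer is an ordinary R object, it can also be created first and linked to a dataset afterwards. The sketch below (the names `point_layer` and `cars_plot` are chosen for illustration) stores the layer and then connects it to `ggplot(cars)` with `+`, which should produce the same plot as the code above.

```r
library(ggplot2)

# A layer on its own: it knows what to draw and how to map, but not which data
point_layer <- geom_point(mapping = aes(x = speed, y = dist))

# `+` links the layer to the base object, which supplies the `cars` data
cars_plot <- ggplot(cars) + point_layer
print(cars_plot)
```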

The steps required to create a scatter plot with ggplot2 can be summarized as follows:

1. Load the package ggplot2 using `library(ggplot2)`.
2. Specify the dataset to be plotted using `ggplot()`.
3. Use the `+` operator to add layers to the plot.
4. Add a geometric layer to define the shapes to be plotted. In case of scatter plots, use `geom_point()`.
5. Map variables from the dataset to plotting properties through the `mapping` parameter in the geometric layer.
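The five steps map directly onto a short script. A sketch using the `cars` example from above, with each step marked in a comment; the name `scatter_plot` is chosen for illustration.

```r
# Step 1: load ggplot2
library(ggplot2)

scatter_plot <- ggplot(cars) +          # Steps 2 and 3: dataset, then `+`
  geom_point(                           # Step 4: point layer for a scatter plot
    mapping = aes(x = speed, y = dist)  # Step 5: map variables to x and y
  )

print(scatter_plot)
```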

## Exercise: Visualize the “cars” dataset

Create a scatter plot using `ggplot()` and visualize the `cars` dataset with the car’s stopping distance `dist` on the x-axis and the `speed` of the car on the y-axis.

1. Specify the dataset through the `ggplot()` function
2. Specify a geometric point layer with the `geom_point()` function
3. Map `dist` to the x-axis and `speed` to the y-axis with `aes()`

## Exercise: Visualize the Gapminder dataset

Create a scatter plot using `ggplot()` and visualize the `gapminder_2007` dataset with the GDP per capita `gdpPercap` on the x-axis and the life expectancy `lifeExp` of each country on the y-axis.

1. Specify the `gapminder_2007` dataset through the `ggplot()` function
2. Specify a geometric point layer with `geom_point()`.
3. Map the `gdpPercap` to the x-axis and the `lifeExp` to the y-axis with `aes()`

