ggplot2 implements the grammar of graphics to map attributes from a data set to plot features through aesthetics. This framework can be used to adjust the point `size`, `color` and transparency `alpha` of points in a scatter plot.

• Adjust the point size of a scatter plot using the `size` parameter
• Change the point color of a scatter plot using the `color` parameter
• Set a parameter `alpha` to change the transparency of all points
• Differentiate between aesthetic mappings and constant parameters

```
ggplot(___) +
  geom_point(
    mapping = aes(x = ___, y = ___,
                  color = ___,
                  size  = ___),
    alpha  = ___
  )
```

In their most basic form scatter plots can only visualize datasets in two dimensions through the `x` and `y` aesthetics of the `geom_point()` layer. However, most data sets have more than two variables and thus might require additional plotting dimensions. `ggplot()` makes it very easy to map additional variables to different plotting aesthetics like `size`, transparency `alpha` and `color`.

Let’s consider the `gapminder_2007` dataset which contains the variables GDP per capita `gdpPercap` and life expectancy `lifeExp` for 142 countries in the year 2007:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))
```

Mapping the `continent` variable through the point `color` aesthetic and the population `pop` (in millions) through the point `size`, we obtain a much richer plot including 4 different variables from the data set.

## Quiz: geom_point() Aesthetics

Which aesthetics can be specified for `geom_point()`?
• `geom_line`
• `color`
• `point`
• `alpha`
• `size`


Typically, the point color is used to introduce a new dimension to a scatter plot. In ggplot we use the `color` aesthetic to specify the mapping of a variable to the color of the points.

For the `gapminder_2007` dataset we can plot the GDP per capita `gdpPercap` vs. the life expectancy `lifeExp` as follows:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))
```

To color each point based on the `continent` of each country we can use:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent))
```

We see that in the resulting plot each point is colored according to the `continent` of its country. `ggplot` chooses a discrete color scheme because `continent` is a categorical variable.

By contrast, let’s see what the plot looks like if we color the points by the `numeric` variable population `pop`:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = pop))
```

The scale changes to continuous, as can be seen in the legend, and the light-blue points are now the countries with the largest populations (China and India).
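One way to check which kind of scale ggplot2 picked is to inspect the computed layer data with `layer_data()`. The sketch below uses a small made-up data frame, `toy`, as a hypothetical stand-in for `gapminder_2007` (the values are illustrative, not real Gapminder figures), so that it runs without the course data.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_2007; the values are illustrative only.
toy <- data.frame(
  gdpPercap = c(1000, 5000, 20000, 40000),
  lifeExp   = c(55, 65, 75, 80),
  pop       = c(10, 40, 90, 160),
  continent = factor(c("Africa", "Asia", "Europe", "Asia"))
)

# Categorical variable: discrete scale, one colour per factor level.
p_discrete <- ggplot(toy) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent))

# Numeric variable: continuous scale, colours taken from a gradient.
p_continuous <- ggplot(toy) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = pop))

# layer_data() returns the values ggplot2 computed for each point.
colours_discrete   <- layer_data(p_discrete)$colour
colours_continuous <- layer_data(p_continuous)$colour

length(unique(colours_discrete))    # one colour per continent present in `toy`
length(unique(colours_continuous))  # one colour per distinct `pop` value
```

With the discrete scale, the two Asian rows share a colour; with the continuous scale, every distinct population value gets its own shade.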

## Exercise: Reconstruct Gapminder graph

Reconstruct the following graph which shows the relationship between GDP per capita and life expectancy for the year 2007:

1. Use the `ggplot()` function and specify the `gapminder_2007` dataset as input
2. Add a `geom_point` layer to the plot and create a scatter plot showing the GDP per capita `gdpPercap` on the x-axis and the life expectancy `lifeExp` on the y-axis
3. Make the `color` aesthetic of the points unique for each `continent`

## Exercise: Create a colored scatter plot with DavisClean

The `DavisClean` dataset contains the height and weight measurements of 199 people.

1. Use the `ggplot()` function and specify the `DavisClean` dataset as input
2. Add a `geom_point()` layer to the plot and create a scatter plot showing the `weight` on the x- and the `height` on the y-axis
3. Make the `color` aesthetic of the points unique by the `sex` of each individual.


For the `gapminder_2007` dataset we can plot the GDP per capita `gdpPercap` vs. the life expectancy as follows:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))
```

To adjust the point size based on the population (`pop`) of each country we can use:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop))
```

We see that the point sizes in the plot above do not clearly reflect the population differences between countries. If we compare the point size representing a population of 250 million people with the one representing 750 million, we can see that their sizes are not proportional. This is because the default size scale maps the smallest value to a non-zero minimum point size rather than mapping a value of zero to size zero. To make the point area proportional to the population we can use the `scale_size_area()` function instead. The scaling information can be added like any other ggplot object with the `+` operator:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop)) +
  scale_size_area(max_size = 10)
```

Note that we have also adjusted the `max_size` of the points, which results in bigger point sizes.
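To see how the two size scales differ, the computed point sizes can be compared directly. The following is a minimal sketch using a made-up data frame `toy` (hypothetical values standing in for `gapminder_2007`). In ggplot2 the `size` of a point behaves like a diameter, so the squared size is proportional to the point area.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_2007; the values are illustrative only.
toy <- data.frame(
  gdpPercap = c(1000, 5000, 20000, 40000),
  lifeExp   = c(55, 65, 75, 80),
  pop       = c(10, 40, 90, 160)
)

base_plot <- ggplot(toy) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop))

# Default size scale: the smallest value is drawn at the lower end of the
# size range (default range is c(1, 6)), not at size zero.
size_default <- layer_data(base_plot)$size

# scale_size_area(): a value of 0 maps to size 0, so area is proportional to pop.
size_area <- layer_data(base_plot + scale_size_area(max_size = 10))$size

# Relative point areas compared with relative population.
comparison <- data.frame(
  pop_share    = toy$pop / max(toy$pop),
  area_default = size_default^2 / max(size_default^2),
  area_scaled  = size_area^2 / max(size_area^2)
)
comparison
```

In the resulting table, `area_scaled` should match `pop_share` row by row, while `area_default` does not.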

## Exercise: Create a Gapminder scatter plot using size

Create a scatter plot with ggplot2 which shows the relationship between GDP per capita and life expectancy for the year 2007 using the `gapminder_2007` dataset.

1. Use the `ggplot()` function and specify the `gapminder_2007` dataset as input
2. Add a `geom_point()` layer to the plot and create a scatter plot showing the GDP per capita `gdpPercap` on the x-axis and the life expectancy `lifeExp` on the y-axis
3. Use the `size` aesthetic to adjust the point size by the population `pop`
4. Use the `scale_size_area()` function so that the point sizes reflect actual population differences and set the `max_size` of each point to `10`

## Setting global aesthetics: transparency

Plotting many points with similar x- and y-coordinates in one graph can produce dense point clouds. Many points in these clouds are overplotted, and the true number of observations in a given area is no longer visible. As a solution, we can set the transparency of each point using the ggplot parameter `alpha`.

Since we want to set the transparency globally for all points rather than individually for each point, we do not set `alpha` as an aesthetic mapping (within `aes()`) but outside of it.

Here we set the opacity of each point to 50% by passing `alpha` as a constant parameter outside `aes()`:

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop),
             alpha = 0.5)
```

With the opacity of each point set to `0.5`, we can now clearly see where points overlap each other.
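The difference between a constant parameter and an aesthetic mapping also shows up in the computed layer data. As a minimal sketch, again with a made-up `toy` data frame (hypothetical values, not the course data), compare setting `color = "blue"` outside `aes()` with mapping it inside `aes()`:

```r
library(ggplot2)

# Hypothetical toy data; the values are illustrative only.
toy <- data.frame(
  gdpPercap = c(1000, 5000, 20000),
  lifeExp   = c(55, 65, 75)
)

# Constant parameters: every point is drawn blue with 50% opacity, no legend.
p_constant <- ggplot(toy) +
  geom_point(aes(x = gdpPercap, y = lifeExp),
             color = "blue", alpha = 0.5)

# Aesthetic mapping: "blue" is treated as a data value with a single level,
# so ggplot2 assigns it a colour from its default palette and adds a legend.
p_mapped <- ggplot(toy) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = "blue"))

const_data  <- layer_data(p_constant)
mapped_data <- layer_data(p_mapped)

unique(const_data$colour)   # the literal colour that was set
unique(const_data$alpha)    # the constant opacity
unique(mapped_data$colour)  # a palette colour, not "blue"
```

Values placed inside `aes()` are always interpreted as data to be mapped through a scale, which is why constants belong outside of it.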

## Quiz: Gapminder Plot

```
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop,
                 alpha = 0.5,
                 color = "red"))
```

Which statements about the plot above are correct?
• Constant plot parameters should be set outside of an aesthetic mapping `aes()`.
• The legend entries `alpha` and `color` appear because they are set as aesthetic mappings instead of global parameters.
• The parameter `lifeExp` should be set as a global parameter.
• The parameter `gdpPercap` should be set as a global parameter.

## Exercise: Reproduce Gapminder scatter plot

Try to reproduce the following plot:

1. Use the `ggplot()` function and specify the `gapminder_2007` dataset as input
2. Add a `geom_point` layer to the plot and create a scatter plot showing the GDP per capita `gdpPercap` on the x-axis and the life expectancy `lifeExp` on the y-axis
3. Use the `color` aesthetic to indicate each `continent` by a different color
4. Use the `size` aesthetic to adjust the point size by the population `pop`
5. Use `scale_size_area()` so that the point sizes reflect the actual population differences and set the `max_size` of each point to `15`
6. Set the opacity/transparency of each point to 70% using the `alpha` parameter

Specify additional aesthetics for points is an excerpt from the course Introduction to R, which is available for free at quantargo.com
