
ggplot2 implements the grammar of graphics to map attributes from a data set to plot features through aesthetics. This framework can be used to adjust the point size, color and transparency alpha of points in a scatter plot.

• Adjust the point size of a scatter plot using the size aesthetic
• Change the point color of a scatter plot using the color aesthetic
• Set the alpha parameter to change the transparency of all points
• Differentiate between aesthetic mappings and constant parameters
ggplot(___) +
  geom_point(
    mapping = aes(x = ___, y = ___,
                  color = ___,
                  size  = ___),
    alpha = ___
  )

In their most basic form, scatter plots can only visualize data in two dimensions through the x and y aesthetics of the geom_point() layer. However, most data sets have more than two variables and thus might require additional plotting dimensions. ggplot() makes it easy to map these additional variables to other aesthetics such as size, transparency (alpha) and color.

Let’s consider the gapminder_2007 dataset which contains the variables GDP per capita gdpPercap and life expectancy lifeExp for 142 countries in the year 2007:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))
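The course loads gapminder_2007 for you. To follow along elsewhere, a comparable data set can be derived from the gapminder R package. This sketch assumes that package is installed and that the course data is simply the 2007 subset of its `gapminder` data frame; note that in this package pop is a plain head count, not millions.

```r
# Assumption: the gapminder package is installed (install.packages("gapminder"))
library(gapminder)

# Keep only the 2007 observations: one row per country
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

# Should report 142, one row per country
nrow(gapminder_2007)

# The variables used throughout this lesson
head(gapminder_2007[, c("country", "continent", "gdpPercap", "lifeExp", "pop")])
```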

By mapping the continent variable to the point color and the population pop (in millions) to the point size, we obtain a much richer plot that shows 4 different variables from the data set.
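The richer plot itself is not reproduced in this excerpt. A sketch of the code behind it, assuming gapminder_2007 is the 2007 subset of the `gapminder` data frame from the gapminder R package:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the Gapminder data

# Assumption: gapminder_2007 is the 2007 subset of gapminder::gapminder
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

# Four variables in one scatter plot:
# x = gdpPercap, y = lifeExp, color = continent, size = pop
p <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent,  # categorical: one color per continent
                 size  = pop))       # numeric: bigger points for bigger populations
print(p)
```

Each extra variable gets its own legend: one for the continent colors and one for the population sizes.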

## Quiz: geom_point() Aesthetics

Which aesthetics can be specified for geom_point()?
• geom_line
• color
• point
• alpha
• size

## Mapping aesthetics: color

Point color is commonly used to add a new dimension to a scatter plot. In ggplot2 we use the color aesthetic to map a variable to the color of the points.

For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy lifeExp as follows:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))

To color each point by the continent of its country, we can use:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent))

In the resulting plot, each point is colored according to the continent of its country. Because continent is a categorical variable, ggplot2 uses a discrete color scale that assigns a distinct color to each continent.
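One way to confirm that a discrete scale is in use is to inspect the colors ggplot2 computed for each point with layer_data(). A sketch, assuming gapminder_2007 is the 2007 subset of the `gapminder` data frame from the gapminder R package:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the Gapminder data

# Assumption: gapminder_2007 is the 2007 subset of gapminder::gapminder
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

p <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent))

# layer_data() returns the layer's data after scales are applied,
# including the color assigned to every point (rows keep the input order)
d <- layer_data(p)

# Pair each continent with its color: exactly one color per continent
unique(data.frame(continent = gapminder_2007$continent, colour = d$colour))
```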

By contrast, let’s see what the plot looks like if we color the points by the numeric variable population pop:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = pop))

The legend shows that the scale has changed to a continuous one, running from dark blue for small populations to light blue for large ones. The lightest points are therefore the countries with the largest populations, China and India.
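To check which countries receive the lightest shades, we can compare the computed colors with a list of the most populous countries. A sketch, assuming gapminder_2007 is the 2007 subset of the `gapminder` data frame from the gapminder R package:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the Gapminder data

# Assumption: gapminder_2007 is the 2007 subset of gapminder::gapminder
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

p <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = pop))

d <- layer_data(p)

# A continuous scale gives (almost) every country its own shade,
# far more than the 5 colors of the continent plot
length(unique(d$colour))

# The two most populous countries and the shades assigned to them
top2 <- order(gapminder_2007$pop, decreasing = TRUE)[1:2]
data.frame(country = gapminder_2007$country[top2],
           pop     = gapminder_2007$pop[top2],
           colour  = d$colour[top2])
```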

## Exercise: Reconstruct Gapminder graph

Reconstruct the following graph which shows the relationship between GDP per capita and life expectancy for the year 2007:

1. Use the ggplot() function and specify the gapminder_2007 dataset as input
2. Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
3. Map the continent variable to the color aesthetic so that each continent gets its own color

## Exercise: Create a colored scatter plot with DavisClean

The DavisClean dataset contains the height and weight measurements of 199 people.

1. Use the ggplot() function and specify the DavisClean dataset as input
2. Add a geom_point() layer to the plot and create a scatter plot showing the weight on the x- and the height on the y-axis
3. Map the sex of each individual to the color aesthetic of the points.

## Mapping aesthetics: size

For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy lifeExp as follows:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp))

To adjust the point size based on the population (pop) of each country we can use:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop))

We see that the point sizes in the plot above do not clearly reflect the population differences between countries. If we compare the point representing a population of 250 million people with the one representing 750 million, we can see that their sizes are not proportional. This is because the default size scale maps the range of the data, from the smallest to the largest population, onto a fixed range of point sizes, so the area of a point is not proportional to the population it represents. To make the point sizes reflect the actual population differences, we can use the scale_size_area() function instead, which maps a value of zero to a point of size zero. The scaling information can be added like any other ggplot object with the + operator:

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 size = pop)) +
  scale_size_area(max_size = 10)

Note that we have also set max_size = 10, which controls the size of the largest point (the default is 6) and therefore makes all points bigger.
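The difference between the two scales shows up in the size values ggplot2 computes for each point. The sketch below compares the default continuous size scale, which maps the smallest and largest population onto sizes 1 and 6, with scale_size_area(), under which the point area (size squared) is proportional to the population. It assumes gapminder_2007 is the 2007 subset of the `gapminder` data frame from the gapminder R package.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the Gapminder data

# Assumption: gapminder_2007 is the 2007 subset of gapminder::gapminder
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

base <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop))

# Default scale: smallest population -> size 1, largest -> size 6
size_default <- layer_data(base)$size
range(size_default)

# scale_size_area(): zero population -> size 0, largest -> max_size
size_area <- layer_data(base + scale_size_area(max_size = 10))$size
range(size_area)

# Area (size squared) divided by population is constant only for scale_size_area()
ratio_default <- size_default^2 / gapminder_2007$pop
ratio_area    <- size_area^2 / gapminder_2007$pop
isTRUE(all.equal(min(ratio_default), max(ratio_default)))  # FALSE
isTRUE(all.equal(min(ratio_area), max(ratio_area)))        # TRUE
```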

## Exercise: Create a Gapminder scatter plot using size

Create a scatter plot with ggplot2 which shows the relationship between GDP per capita and life expectancy for the year 2007 using the gapminder_2007 dataset.

1. Use the ggplot() function and specify the gapminder_2007 dataset as input
2. Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
3. Use the size aesthetic to adjust the point size by the population pop
4. Use the scale_size_area() function so that the point sizes reflect actual population differences, and set max_size to 10

## Setting global aesthetics: transparency


Plotting many points with similar x- and y-coordinates in one graph can produce dense point clouds. Many points in these clouds are overplotted, so the true number of observations in a given area is no longer visible. As a solution, we can make each point partially transparent using the alpha parameter.

Since we want to set the transparency globally for all points rather than individually for each point, we do not specify alpha as an aesthetic mapping (within aes()) but as a constant parameter outside of it.

Here we set the opacity of every point to 50% by passing alpha = 0.5 to geom_point() outside of aes():

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop),
             alpha = 0.5)

Regions where points overlap now appear darker, so we can clearly see how many points are overlapping, while every single point is drawn with an opacity of 0.5.
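The difference between setting alpha as a constant and mapping it inside aes() is visible in the plot objects themselves. In the sketch below (assuming gapminder_2007 is the 2007 subset of the `gapminder` data frame from the gapminder R package), the constant ends up among the layer's fixed parameters, whereas the mapped version is treated as data to be scaled, which is also why it gets its own legend entry.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the Gapminder data

# Assumption: gapminder_2007 is the 2007 subset of gapminder::gapminder
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

# Constant parameter: alpha is set outside aes()
p_set <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop),
             alpha = 0.5)

# Aesthetic mapping: alpha is placed inside aes()
p_map <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop,
                 alpha = 0.5))

# Where does alpha end up in each layer?
names(p_set$layers[[1]]$aes_params)  # contains "alpha": a fixed parameter
names(p_map$layers[[1]]$mapping)     # contains "alpha": a mapping passed through a scale

# Every point in p_set is drawn with exactly 50% opacity
unique(layer_data(p_set)$alpha)
```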

## Quiz: Gapminder Plot

ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop,
                 alpha = 0.5,
                 color = "red"))
Which statements about the plot above are correct?
• Constant plot parameters should be set outside of an aesthetic mapping aes().
• The legend entries for alpha and color appear because they are set as aesthetic mappings instead of global parameters.
• The parameter lifeExp should be set as a global parameter.
• The parameter gdpPercap should be set as a global parameter.

## Exercise: Reproduce Gapminder scatter plot

Try to reproduce the following plot:

1. Use the ggplot() function and specify the gapminder_2007 dataset as input
2. Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
3. Use the color aesthetic to indicate each continent by a different color
4. Use the size aesthetic to adjust the point size by the population pop
5. Use scale_size_area() so that the point sizes reflect the actual population differences and set the max_size of each point to 15
6. Set the opacity of each point to 70% (alpha = 0.7) using the alpha parameter

“Specify additional aesthetics for points” is an excerpt from the course Introduction to R, which is available for free at quantargo.com.
