ggplot2 implements the grammar of graphics to map attributes from a data set to plot features through aesthetics. This framework can be used to adjust the point size, color and transparency alpha of points in a scatter plot.
- Add additional plotting dimensions through aesthetics
- Adjust the point size of a scatter plot using the size parameter
- Change the point color of a scatter plot using the color parameter
- Set a parameter alpha to change the transparency of all points
- Differentiate between aesthetic mappings and constant parameters
ggplot(___) +
geom_point(
mapping = aes(x = ___, y = ___,
color = ___,
size = ___),
alpha = ___
)
Adding more plot aesthetics
In their most basic form scatter plots can only visualize datasets in two dimensions through the x and y aesthetics of the geom_point() layer. However, most data sets have more than two variables and thus might require additional plotting dimensions. ggplot() makes it very easy to map additional variables to different plotting aesthetics like size, transparency alpha and color.
Let’s consider the gapminder_2007 dataset which contains the variables GDP per capita gdpPercap and life expectancy lifeExp for 142 countries in the year 2007:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
Mapping the continent variable through the point color aesthetic and the population pop (in millions) through the point size we obtain a much richer plot including 4 different variables from the data set:
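A minimal sketch of how such a four-variable plot could be built. The course's gapminder_2007 data set is not included in this excerpt, so the code assumes it can be approximated from the gapminder package by keeping only the 2007 rows and converting pop to millions. That construction is an assumption, not the course's own preparation step.

```r
library(ggplot2)

# Assumption: approximate the course's gapminder_2007 data set from the
# gapminder package (2007 rows only, population converted to millions).
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

# Four variables in one scatter plot: x, y, color and size
p <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp,
                 color = continent,
                 size = pop))
p
```

Both color and size are placed inside aes() here because their values come from columns of the data set.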
Quiz: geom_point() Aesthetics
Which aesthetics can be specified for geom_point()?
- geom_line
- color
- point
- alpha
- size
Adjusting point color
ggplot(___) +
geom_point(
mapping = aes(x = ___, y = ___,
color = ___,
size = ___),
alpha = ___
)
Typically, the point color is used to introduce a new dimension to a scatter plot. In ggplot we use the color aesthetic to specify the mapping of a variable to the color of the points.
For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy lifeExp as follows:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
To color each point based on the continent of each country we can use:
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp,
color = continent))
We see that in the resulting plot each point is colored according to the continent of its country. ggplot chooses a discrete color scheme here because continent is a categorical variable.
By contrast, let’s see what the plot looks like if we color the points by the numeric variable population pop:
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp,
color = pop))
The scale changes to continuous, as can be seen in the legend, and the light-blue points are now the countries with the largest populations (China and India).
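Which kind of color scale is used depends on the type of the mapped column. A quick check of the two columns used above, again assuming a stand-in for gapminder_2007 built from the gapminder package (an assumption, since the course data set is not part of this excerpt):

```r
# Assumption: stand-in for the course's gapminder_2007 data set
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

# continent is a factor, so color = continent gets a discrete scale
is.factor(gapminder_2007$continent)
levels(gapminder_2007$continent)

# pop is numeric, so color = pop gets a continuous color gradient
is.numeric(gapminder_2007$pop)
```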
Exercise: Reconstruct Gapminder graph
Reconstruct the following graph which shows the relationship between GDP per capita and life expectancy for the year 2007:
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Make the color aesthetic of the points unique for each continent
Exercise: Create a colored scatter plot with DavisClean
The DavisClean dataset contains the height and weight measurements of 199 people.
- Use the ggplot() function and specify the DavisClean dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the weight on the x-axis and the height on the y-axis
- Make the color aesthetic of the points unique by the sex of each individual.
Adjusting point size
ggplot(___) +
geom_point(
mapping = aes(x = ___, y = ___,
color = ___,
size = ___),
alpha = ___
)
For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy as follows:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
To adjust the point size based on the population (pop) of each country we can use:
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp,
size = pop))
We see that the point sizes in the plot above do not clearly reflect the population differences between countries. If we compare the point size representing a population of 250 million people with the one representing 750 million, we can see that their sizes are not proportional. This is because the default size scale maps the range of the data onto a fixed range of point sizes, so the smallest value does not get a point of size zero. To make the point area proportional to the population, we can use the scale_size_area() function instead. The scaling information can be added like any other ggplot object with the + operator:
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp,
size = pop)) +
scale_size_area(max_size = 10)
Note that we have set max_size to 10, which results in larger points than the default of 6.
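The idea behind scale_size_area() can be sketched in base R: the area of a point is made proportional to the value, so the size grows with the square root of the value and a value of 0 gets size 0. The helper area_size() below is hypothetical and only mimics that idea. It is not ggplot2's implementation.

```r
# Hypothetical helper: choose the size so that the point AREA is
# proportional to the value (size itself grows with the square root)
area_size <- function(value, max_value, max_size = 10) {
  max_size * sqrt(value / max_value)
}

# Populations in millions, as in the comparison above
pop <- c(250, 750)
sizes <- area_size(pop, max_value = 750)

# Tripling the population triples the area: the squared size ratio is 3
(sizes[2] / sizes[1])^2
```

Under this scheme a point for 750 million people has three times the area of a point for 250 million, but only about 1.73 times the diameter.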
Exercise: Create a Gapminder scatter plot using size
Create a scatter plot with ggplot2 which shows the relationship between GDP per capita and life expectancy for the year 2007 using the gapminder_2007 dataset.
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Use the size aesthetic to adjust the point size by the population pop
- Use the scale_size_area() function so that the point sizes reflect actual population differences and set the max_size of each point to 10
Setting global aesthetics: transparency
ggplot(___) +
geom_point(
mapping = aes(x = ___, y = ___,
color = ___,
size = ___),
alpha = ___
)
Plotting many points with similar x- and y-coordinates in one graph can produce dense point clouds. Many points in these clouds are overplotted, and the true number of observations in a given area is no longer visible. As a solution, we can make each point partly transparent using the alpha parameter.
Since we do not want to set the transparency individually for each point but globally for all points, we do not set alpha as an aesthetic mapping (within aes()) but outside of it.
We set the opacity of each point to 50% by passing alpha = 0.5 as a constant parameter outside aes():
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp, size = pop),
alpha = 0.5)
With the opacity of each point set to 0.5, areas where points overlap appear darker, so we can now see more clearly where points overlap.
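To make the distinction between mappings and constants concrete, the sketch below sets both alpha and a fixed color outside aes(), while size stays a mapping. The data construction is an assumed stand-in for the course's gapminder_2007 data set, and the color "steelblue" is an arbitrary illustrative choice.

```r
library(ggplot2)

# Assumption: stand-in for the course's gapminder_2007 data set
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

p <- ggplot(gapminder_2007) +
  geom_point(
    # Mappings: values come from columns of the data set
    mapping = aes(x = gdpPercap, y = lifeExp, size = pop),
    # Constants: one value applied to every point, no legend entry
    alpha = 0.5,
    color = "steelblue"
  )
p
```

Only size appears in the legend, because it is the only one of the three that varies with the data.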
Quiz: Gapminder Plot
ggplot(gapminder_2007) +
geom_point(aes(x = gdpPercap, y = lifeExp, size = pop,
alpha = 0.5,
color = "red"))
- Constant plot parameters should be set outside of an aesthetic mapping aes().
- The reason for the legend entries alpha and color is that they are set as aesthetic mappings instead of global parameters.
- The parameter lifeExp should be set as a global parameter.
- The parameter gdpPercap should be set as a global parameter.
Exercise: Reproduce Gapminder scatter plot
Try to reproduce the following plot:
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Use the color aesthetic to indicate each continent by a different color
- Use the size aesthetic to adjust the point size by the population pop
- Use scale_size_area() so that the point sizes reflect the actual population differences and set the max_size of each point to 15
- Set the opacity/transparency of each point to 70% using the alpha parameter
Specify additional aesthetics for points is an excerpt from the course Introduction to R, which is available for free at quantargo.com
