Create your first bar chart

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers].
ggplot(___) +
  geom_col(
    mapping = aes(x = ___, y = ___,
                  fill = ___)
  )

Introduction to bar charts

Bar charts visualize numeric values grouped by categories. Each category is represented by one bar whose height is given by the corresponding numeric value.

Bar charts are well suited for comparing values across groups, e.g. the number of votes by party, the number of people in different countries, or GDP per capita in different countries. Because each bar takes up a fair amount of space, bar charts work best when the number of groups to compare is small.

For example, a bar chart can show the population (in millions) of the five most populous countries in 2007.

Creating a simple bar chart

In ggplot2, bar charts are created with the geom_col() geometric layer. The x aesthetic mapping defines which bars are plotted, one per category, and the y aesthetic mapping defines the height of each bar. Both mappings, x and y, are required for geom_col().

Let’s create our first bar chart with the gapminder_top5 dataset. It contains population (in millions) and life expectancy data for the five most populous countries in 2007.

library(ggplot2)
ggplot(gapminder_top5) +
  geom_col(aes(x = country, y = pop))

We see that the resulting bars are sorted by the country names in alphabetical order by default.
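If a different bar order is preferred, one option is to reorder the x variable by the plotted value. The sketch below uses base R's reorder() inside aes(). The data frame top5 is a hypothetical stand-in for gapminder_top5 with approximate, rounded 2007 populations, included only so the example runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_top5
# (approximate 2007 values, pop in millions)
top5 <- data.frame(
  country = c("Brazil", "China", "India", "Indonesia", "United States"),
  pop     = c(190, 1319, 1110, 224, 301)
)

# reorder() turns country into a factor whose levels are sorted by -pop,
# so the bars run from the largest to the smallest population
p_sorted <- ggplot(top5) +
  geom_col(aes(x = reorder(country, -pop), y = pop)) +
  labs(x = "country")  # replace the default axis title "reorder(country, -pop)"

p_sorted
```

Dropping the minus sign in reorder(country, pop) would sort the bars from smallest to largest instead.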

Exercise: Plot life expectancy by country

Create a bar chart showing the life expectancy of the five biggest countries by population in 2007.

  1. Use the ggplot() function and specify the gapminder_top5 dataset as input
  2. Add a geom_col() layer to the plot
  3. Plot one bar for each country (x aesthetic)
  4. Use life expectancy lifeExp as bar height (y aesthetic)

Filling bars with color

Like other geoms, geom_col() allows users to map additional dataset variables to the color of the bars. The fill aesthetic fills the entire bar with color. A common source of confusion is the color aesthetic, which sets the line color of each bar’s border rather than the fill color.
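The difference between fill and color is easiest to see when both are set to fixed values. The following is a minimal sketch: top5 is a hypothetical stand-in for gapminder_top5 with approximate, rounded 2007 populations, and the two color names are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_top5
# (approximate 2007 values, pop in millions)
top5 <- data.frame(
  country = c("Brazil", "China", "India", "Indonesia", "United States"),
  pop     = c(190, 1319, 1110, 224, 301)
)

# Constants set outside aes() apply to every bar:
# fill colors the bar interior, color only draws the outline
p_fill_vs_color <- ggplot(top5) +
  geom_col(aes(x = country, y = pop),
           fill = "steelblue", color = "black")

p_fill_vs_color
```

Because both values sit outside aes(), they are fixed settings rather than mappings, so no legend is drawn for them.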

Based on the gapminder_top5 dataset we plot the population (in millions) of the biggest countries and use the continent variable to color each bar:

ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = continent))

Since continent is a categorical variable, each continent gets its own distinct fill color. Let’s see what happens if we use a numeric variable like life expectancy lifeExp instead:

ggplot(gapminder_top5) + 
  geom_col(aes(x = country, y = pop, fill = lifeExp))

The bar colors now follow the continuous color scale shown in the legend on the right. Numeric variables can therefore also be used to fill bars.
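When the default gradient is hard to read, the end points of the continuous fill scale can be chosen explicitly. The sketch below uses ggplot2's scale_fill_gradient() for this. As before, top5 is a hypothetical stand-in for gapminder_top5 with approximate, rounded 2007 values, and the two colors are arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_top5
# (approximate 2007 values, pop in millions, lifeExp in years)
top5 <- data.frame(
  country = c("Brazil", "China", "India", "Indonesia", "United States"),
  pop     = c(190, 1319, 1110, 224, 301),
  lifeExp = c(72.4, 73.0, 64.7, 70.7, 78.2)
)

# A numeric fill variable is drawn with a continuous gradient;
# low and high set the colors at the two ends of that gradient
p_gradient <- ggplot(top5) +
  geom_col(aes(x = country, y = pop, fill = lifeExp)) +
  scale_fill_gradient(low = "lightblue", high = "darkblue")

p_gradient
```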

Exercise: Plot population size by country

Create a bar chart showing the population (in millions) of the five biggest countries by population in 2007.

  1. Use the ggplot() function and specify the gapminder_top5 dataset as input
  2. Add a geom_col() layer to the plot
  3. Plot one bar for each country (x aesthetic)
  4. Use population pop as bar height (y aesthetic)
  5. Use the GDP per capita gdpPercap as fill aesthetic

Stacked bar charts

In some circumstances it is useful to show several numeric values within each bar. A typical case is a total for one entity (e.g. all customers) that is split among several categories (e.g. customer segments): each category becomes one colored segment of the bar, and the full bar height represents the total.

The plot below shows the number of phones (in thousands) by world region from 1956 to 1961 as a stacked bar chart:

ggplot(world_phones) + 
  geom_col(aes(x = year, y = phones,
               fill = region))
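Stacking happens because several rows share the same x value and geom_col() uses position = "stack" by default. The sketch below makes this explicit and shows position = "dodge" as a side-by-side alternative. It assumes that world_phones is a long-format version of R's built-in WorldPhones matrix restricted to 1956 to 1961; the actual course dataset may be prepared differently.

```r
library(ggplot2)

# Assumption: world_phones resembles the built-in WorldPhones matrix
# (rows = years, columns = regions, values in thousands) in long format
world_phones <- as.data.frame(as.table(WorldPhones), stringsAsFactors = FALSE)
names(world_phones) <- c("year", "region", "phones")
world_phones <- subset(world_phones, year != "1951")  # keep 1956 to 1961

# Default behavior: the regions are stacked on top of each other,
# so the bar height is the total across regions for that year
p_stacked <- ggplot(world_phones) +
  geom_col(aes(x = year, y = phones, fill = region))

# Alternative: place the regions next to each other within each year
p_dodged <- ggplot(world_phones) +
  geom_col(aes(x = year, y = phones, fill = region),
           position = "dodge")

p_stacked
```

The stacked version makes yearly totals easy to compare, while the dodged version makes it easier to compare individual regions.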

Exercise: Plot number of crimes by US state

Create a bar chart showing the number of crimes per 100,000 residents by US state in 1973.

  1. Use the ggplot() function and specify the us_arrests dataset as input
  2. Add a geom_col() layer to the plot
  3. Plot one bar for each state (x aesthetic)
  4. Use the number of cases as bar height (y aesthetic)
  5. Use the crime type as fill aesthetic.

Create your first bar chart is an excerpt from the course Introduction to R, which is available for free at quantargo.com
