# Detailed Guide to the Bar Chart in R with ggplot

May 1, 2019

[This article was first published on Learn R Programming & Build a Data Science Career | Michael Toth, and kindly contributed to R-bloggers.]

When it comes to data visualization, flashy graphs can be fun. Believe me, I'm as big a fan of flashy graphs as anybody. But if you're trying to convey information, especially to a broad audience, flashy isn't always the way to go.

Whether it's the line graph, scatter plot, or bar chart (the subject of this guide!), choosing a well-understood and common graph style is usually the way to go for most audiences, most of the time. And if you're just getting started with your R journey, it's important to master the basics before complicating things further.

So in this guide, I'm going to talk about creating a bar chart in R. Specifically, I'll show you exactly how you can use the `ggplot` `geom_bar` function to create a bar chart.

A bar chart is a graph that is used to show comparisons across discrete categories. One axis (the x-axis throughout this guide) shows the categories being compared, and the other axis (the y-axis in our case) represents a measured value. The heights of the bars are proportional to the measured values.

For example, in this extremely scientific bar chart, we see the level of life-threatening danger for three different actions. All dangerous, to be sure, but I think we can all agree this graph gets things right in showing that Game of Thrones spoilers are the most dangerous of all.

## Introduction to ggplot

Before diving into the `ggplot` code to create a bar chart in R, I first want to briefly explain `ggplot` and why I think it's the best choice for graphing in R.

`ggplot`, from the `ggplot2` package, is a tool for creating graphs in R, but it's also a method of thinking about and decomposing complex graphs into logical subunits.

`ggplot` takes each component of a graph (axes, scales, colors, objects, etc.) and allows you to build graphs up sequentially one component at a time. You can then modify each of those components in a way that's both flexible and user-friendly. When components are unspecified, `ggplot` uses sensible defaults. This makes `ggplot` a powerful and flexible tool for creating all kinds of graphs in R. It's the tool I use to create nearly every graph I make these days, and I think you should use it too!

## Follow Along With the Workbook

To accompany this guide, I've created a free workbook that you can work through to apply what you're learning as you read.

The workbook is an R file that contains all the code shown in this post as well as additional guided questions and exercises to help you understand the topic even deeper.

If you want to really learn how to create a bar chart in R so that you'll still remember weeks or even months from now, you need to practice.

## Investigating our dataset

Throughout this guide, we'll be using the `mpg` dataset that ships with the `ggplot2` package. This dataset contains data on fuel economy for 38 popular car models. Let's take a look:
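One way to take that look from the R console is sketched below. It assumes only that the `ggplot2` package is installed; `dim()`, `names()`, `unique()`, and `head()` are base R functions.

```r
library(ggplot2)  # provides the mpg dataset

# Number of rows and columns
dim(mpg)

# Column names
names(mpg)

# How many distinct car models are recorded?
length(unique(mpg$model))

# First few rows
head(mpg)
```

Each row is one model/year/engine/transmission combination, which is why there are many more rows than car models.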

The mpg dataset contains 11 columns:

• `manufacturer`: Car Manufacturer Name
• `model`: Car Model Name
• `displ`: Engine Displacement (liters)
• `year`: Year of Manufacture
• `cyl`: Number of Cylinders
• `trans`: Type of Transmission
• `drv`: f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
• `cty`: City Miles per Gallon
• `hwy`: Highway Miles per Gallon
• `fl`: Fuel Type
• `class`: Type of Car

## How to create a simple bar chart in R using `geom_bar`

`ggplot` uses geoms, or geometric objects, to form the basis of different types of graphs. Previously I have talked about `geom_line` for line graphs and `geom_point` for scatter plots. Today I'll be focusing on `geom_bar`, which is used to create bar charts in R.

```r
library(tidyverse)

ggplot(mpg) +
  geom_bar(aes(x = class))
```

Here we are starting with the simplest possible `ggplot` bar chart we can create using `geom_bar`. Let's review this in more detail:

First, we call `ggplot`, which creates a new `ggplot` graph. Basically, this creates a blank canvas on which we'll add our data and graphics. Here we pass `mpg` to `ggplot` to indicate that we'll be using the `mpg` data for this particular `ggplot` bar chart.

Next, we add the `geom_bar` call to the base `ggplot` graph in order to create this bar chart. In `ggplot`, you use the `+` symbol to add new layers to an existing graph. In this second layer, I told `ggplot` to use `class` as the x-axis variable for the bar chart.

You'll note that we don't specify a y-axis variable here. Later on, I'll tell you how we can modify the y-axis for a bar chart in R. But for now, just know that if you don't specify anything, `ggplot` will automatically count the occurrences of each x-axis category in the dataset, and will display the `count` on the y-axis.

And that's it, we have our bar chart! We see that SUVs are the most prevalent in our data, followed by compact and midsize cars.
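If you'd like to see the numbers behind those bars, here is a minimal sketch that reproduces the counting step by hand. It assumes the `dplyr` package is available (it is loaded as part of `tidyverse`); `class_counts` is just an illustrative name.

```r
library(ggplot2)  # for the mpg dataset
library(dplyr)

# Count the rows in each class, largest first.
# These counts are the bar heights geom_bar draws by default.
class_counts <- mpg %>%
  count(class, sort = TRUE)

class_counts
```

The `n` column here should line up with the bar heights in the chart above.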

## Changing bar color in a `ggplot` bar chart

Expanding on this example, let's change the colors of our bar chart!

```r
ggplot(mpg) +
  geom_bar(aes(x = class), fill = 'blue')
```

You'll note that this `geom_bar` call is identical to the one before, except that we've added the modifier `fill = 'blue'` to the end of the line. Experiment a bit with different colors to see how this works on your machine. You can use most color names you can think of, or you can use specific hex color codes to get more granular.

If you're familiar with line graphs and scatter plots in ggplot, you've seen that in those cases we changed the color by specifying `color = 'blue'`, while in this case we're using `fill = 'blue'`.

In ggplot, `color` is used to change the outline of an object, while `fill` is used to fill the inside of an object. For objects like points and lines, there is no inside to fill, so we use `color` to change the color of those objects. With bar charts, the bars can be filled, so we use `fill` to change the color with `geom_bar`.

This distinction between `color` and `fill` gets a bit more complex, so stick with me to hear more about how these work with bar charts in ggplot!

## Mapping bar color to a variable in a `ggplot` bar chart

Now, let's try something a little different. Compare the `ggplot` code below to the code we just executed above. There are 2 differences. See if you can find them and guess what will happen, then scroll down to take a look at the result. If you've read my previous `ggplot` guides, this bit should look familiar!

```r
ggplot(mpg) +
  geom_bar(aes(x = class, fill = drv))
```

This graph shows the same data as before, but instead of solid-colored bars, we now see that the bars are stacked with 3 different colors! The red portion corresponds to 4-wheel drive cars, the green to front-wheel drive cars, and the blue to rear-wheel drive cars. Did you catch the 2 changes we used to change the graph? They were:

1. Instead of specifying `fill = 'blue'`, we specified `fill = drv`
2. We moved the fill parameter inside of the `aes()` parentheses

Before, we told `ggplot` to change the color of the bars to blue by adding `fill = 'blue'` to our `geom_bar()` call.

What we're doing here is a bit more complex. Instead of specifying a single color for our bars, we're telling `ggplot` to map the data in the `drv` column to the `fill` aesthetic.

This means we are telling `ggplot` to use a different color for each value of `drv` in our data! This mapping also tells `ggplot` that it needs to create a legend to identify the drive types, and it places it there automatically!
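To make the mapping a little more concrete, here is a rough sketch of the table that sits behind the colored segments. It assumes `dplyr` is loaded; `segment_counts` is a name I made up for illustration.

```r
library(ggplot2)  # for the mpg dataset
library(dplyr)

# One row per (class, drv) combination that occurs in the data.
# n is the height of that colored segment within the class's bar.
segment_counts <- mpg %>%
  count(class, drv)

segment_counts
```

Each distinct value of `drv` gets its own color and its own legend entry, and the segments for a class add up to that class's total bar height.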

### More Details on Stacked Bar Charts in `ggplot`

As we saw above, when we map a variable to the `fill` aesthetic in `ggplot`, it creates what's called a stacked bar chart. A stacked bar chart is a variation on the typical bar chart where a bar is divided among a number of different segments.

In this case, we're dividing the bar chart into segments based on the levels of the `drv` variable, corresponding to the front-wheel, rear-wheel, and four-wheel drive cars.

For a given `class` of car, our stacked bar chart makes it easy to see how many of those cars fall into each of the 3 `drv` categories.

The main flaw of stacked bar charts is that they become harder to read the more segments each bar has, especially when trying to make comparisons across the x-axis (in our case, across car `class`). To illustrate, let's take a look at this next example:

```r
# Note we convert the cyl variable to a factor to fill properly
ggplot(mpg) +
  geom_bar(aes(x = class, fill = factor(cyl)))
```

As you can see, even with four segments it starts to become difficult to make comparisons between the different categories on the x-axis. For example, are there more 6-cylinder minivans or 6-cylinder pickups in our dataset? What about 5-cylinder compacts vs. 5-cylinder subcompacts? With stacked bars, these types of comparisons become challenging. My recommendation is to generally avoid stacked bar charts with more than 3 segments.

### Dodged Bars in ggplot

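Instead of stacked bars, we can use side-by-side (dodged) bar charts. In ggplot, this is accomplished by using the `position = position_dodge()` argument as follows. Passing `preserve = 'single'` keeps every bar the same width, even for classes that don't have cars at every cylinder count: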

```r
# Note we convert the cyl variable to a factor here in order to fill by cylinder
ggplot(mpg) +
  geom_bar(aes(x = class, fill = factor(cyl)),
           position = position_dodge(preserve = 'single'))
```

Now, the different segments for each class are placed side-by-side instead of stacked on top of each other.

Revisiting the comparisons from before, we can quickly see that there are an equal number of 6-cylinder minivans and 6-cylinder pickups. There are also an equal number of 5-cylinder compacts and subcompacts.
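If you want to double check those two comparisons against the data rather than by eye, a small sketch like the following should do it. It assumes `dplyr` is loaded; `cyl_counts`, `six_cyl`, and `five_cyl` are illustrative names.

```r
library(ggplot2)  # for the mpg dataset
library(dplyr)

# Count cars for each class and cylinder combination
cyl_counts <- mpg %>%
  count(class, cyl)

# 6-cylinder minivans vs. 6-cylinder pickups
six_cyl <- cyl_counts %>%
  filter(cyl == 6, class %in% c("minivan", "pickup"))
six_cyl

# 5-cylinder compacts vs. 5-cylinder subcompacts
five_cyl <- cyl_counts %>%
  filter(cyl == 5, class %in% c("compact", "subcompact"))
five_cyl
```

In each of the two small tables, the `n` values for the two classes should match.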

While these comparisons are easier with a dodged bar graph, comparing the total count of cars in each class is far more difficult.

Which brings us to a general point: different graphs serve different purposes! You shouldn't try to accomplish too much in a single graph. If you're trying to cram too much information into a single graph, you'll likely confuse your audience, and they'll take away exactly none of the information.

## Scaling bar size to a variable in your data

Up to now, all of the bar charts we've reviewed have scaled the height of the bars based on the count of a variable in the dataset. First we counted the number of vehicles in each `class`, and then we counted the number of vehicles in each `class` with each `drv` type.

What if we don't want the height of our bars to be based on count? What if we already have a column in our dataset that we want to be used as the y-axis height? Let's say we wanted to graph the average highway miles per gallon by `class` of car, for example. How can we do that in ggplot?

There are two ways we can do this, and I'll be reviewing them both. To start, I'll introduce `stat = 'identity'`:

```r
# Use dplyr to calculate the average hwy_mpg by class
by_hwy_mpg <- mpg %>%
  group_by(class) %>%
  summarise(hwy_mpg = mean(hwy))

ggplot(by_hwy_mpg) +
  geom_bar(aes(x = class, y = hwy_mpg), stat = 'identity')
```

Now we see a graph by `class` of car where the y-axis represents the average highway miles per gallon of each `class`. How does this work, and how is it different from what we had before?

Before, we did not specify a y-axis variable and instead let `ggplot` automatically populate the y-axis with a count of our data. Now, we're explicitly telling `ggplot` to use `hwy_mpg` as our y-axis variable. And there's something else here also: `stat = 'identity'`. What does that mean?

We saw earlier that if we omit the y-variable, `ggplot` will automatically scale the heights of the bars to a count of cases in each group on the x-axis. If we instead want the values to come from a column in our data frame, we need to change two things in our `geom_bar` call:

1. Add a y-variable mapping inside `aes()`
2. Add `stat = 'identity'` to `geom_bar()`

Adding a y-variable mapping alone, without adding `stat = 'identity'`, leads to an error message.

Why the error? If you don't specify `stat = 'identity'`, then under the hood, `ggplot` is automatically passing a default value of `stat = 'count'`, which graphs the counts by group. A y-variable is not compatible with this, so you get the error message.
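To see the failure for yourself without halting a script, you could wrap the plot in `tryCatch()`, as in this sketch. It assumes `dplyr` is loaded; `msg` is an illustrative name. I'm deliberately not quoting the error text, since the exact wording depends on the `ggplot2` version you have installed.

```r
library(ggplot2)
library(dplyr)

by_hwy_mpg <- mpg %>%
  group_by(class) %>%
  summarise(hwy_mpg = mean(hwy))

# A y mapping but no stat = 'identity', so geom_bar still uses stat = 'count'
p <- ggplot(by_hwy_mpg) +
  geom_bar(aes(x = class, y = hwy_mpg))

# The stat is only computed when the plot is built (or printed),
# so that is where the error surfaces. tryCatch() captures the
# message instead of stopping the script.
msg <- tryCatch(
  {
    ggplot_build(p)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

msg
```

Adding `stat = 'identity'` back into the `geom_bar()` call makes the same code build cleanly.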

If this is confusing, that's okay. For now, all you need to remember is that if you want to use `geom_bar` to map the heights of the bars to a column in your dataset, you need to add BOTH a y-variable mapping AND `stat = 'identity'`.

I'll be honest, this was highly confusing for me for a long time. I hope this guidance helps to clear things up for you, so you don't have to suffer the same confusion that I did. But if you have a hard time remembering this distinction, `ggplot` also has a handy function that does this work for you. Instead of using `geom_bar` with `stat = 'identity'`, you can simply use the `geom_col` function to get the same result. Let's see:

```r
# Use dplyr to calculate the average hwy_mpg by class
by_hwy_mpg <- mpg %>%
  group_by(class) %>%
  summarise(hwy_mpg = mean(hwy))

ggplot(by_hwy_mpg) +
  geom_col(aes(x = class, y = hwy_mpg))
```

You'll notice the result is the same as the graph we made above, but we've replaced `geom_bar` with `geom_col` and removed `stat = 'identity'`. `geom_col` is the same as `geom_bar` with `stat = 'identity'`, so you can use whichever you prefer or find easier to understand. For me, I've gotten used to `geom_bar`, so I prefer to use that, but you can do whichever you like!
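If you'd rather not take my word for it, here is a quick sketch that compares the bar heights the two approaches compute. It uses `layer_data()` from `ggplot2`, which returns the data a layer is drawn from; `p_bar` and `p_col` are illustrative names.

```r
library(ggplot2)
library(dplyr)

by_hwy_mpg <- mpg %>%
  group_by(class) %>%
  summarise(hwy_mpg = mean(hwy))

p_bar <- ggplot(by_hwy_mpg) +
  geom_bar(aes(x = class, y = hwy_mpg), stat = 'identity')

p_col <- ggplot(by_hwy_mpg) +
  geom_col(aes(x = class, y = hwy_mpg))

# Compare the bar heights computed for each plot
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```

If the two layers really are equivalent, the comparison should come back `TRUE`.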

## Revisiting `color` in `geom_bar`

Above, we showed how you could change the color of bars in `ggplot` using the `fill` option. I mentioned that `color` is used for line graphs and scatter plots, but that we use `fill` for bars because we are filling the inside of the bar with color. That said, `color` does still work here, though it affects only the outline of the bars. Take a look:

```r
ggplot(mpg) +
  geom_bar(aes(x = class), color = 'blue')
```

This creates a graph with bars filled with the standard gray, but outlined in blue. That outline is what `color` affects for bar charts in ggplot!

I personally only use `color` for one specific thing: modifying the outline of a bar chart where I'm already using `fill`, to create a better-looking graph with a little extra pop. The standard `fill` is fine for most purposes, but you can step things up a bit with a carefully selected `color` outline:

```r
ggplot(mpg) +
  geom_bar(aes(x = class), fill = '#003366', color = '#add8e6')
```

It's subtle, but this graph uses a darker navy blue for the fill of the bars and a lighter blue for the outline that makes the bars pop a little bit.

This is the only time when I use `color` for bar charts in R. Do you have a use case for this? I'd love to hear it, so let me know in the comments!

## A deeper review of `aes()` (aesthetic) mappings in ggplot

We saw above how we can create graphs in `ggplot` that use the `fill` argument to map the `cyl` variable or the `drv` variable to the color of bars in a bar chart. `ggplot` refers to these mappings as aesthetic mappings, and they include everything you see within the `aes()` in `ggplot`.

Aesthetic mappings are a way of mapping variables in your data to particular visual properties (aesthetics) of a graph.

I know this can sound a bit theoretical, so let's review the specific aesthetic mappings you've already seen as well as the other mappings available within `geom_bar`.

### Reviewing the list of geom_bar aesthetic mappings

The main aesthetic mappings for a ggplot bar graph include:

• `x`: Map a variable to a position on the x-axis
• `y`: Map a variable to a position on the y-axis
• `fill`: Map a variable to a bar color
• `color`: Map a variable to a bar outline color
• `linetype`: Map a variable to a bar outline linetype
• `alpha`: Map a variable to a bar transparency

From the list above, we've already seen the `x`, `y`, and `fill` aesthetic mappings. We've also seen `color` applied as a parameter to change the outline of the bars in the prior example.

I'm not going to review the additional aesthetics in this post, but if you'd like more details, check out the free workbook which includes some examples of these aesthetics in more detail!

## Aesthetic mappings vs. parameters in ggplot

I often hear from my R training clients that they are confused by the distinction between aesthetic mappings and parameters in ggplot. Personally, I was quite confused by this when I was first learning about graphing in ggplot as well. Let me try to clear up some of the confusion!

Above, we saw that we could use `fill` in two different ways with `geom_bar`. First, we were able to set the color of our bars to blue by specifying `fill = 'blue'` outside of our `aes()` mappings. Then, we were able to map the variable `drv` to the color of our bars by specifying `fill = drv` inside of our `aes()` mappings.

What is the difference between these two ways of working with `fill` and other aesthetic mappings?

When you include `fill`, `color`, or another aesthetic inside the `aes()` of your `ggplot` code, you're telling `ggplot` to map a variable to that aesthetic in your graph. This is what we did when we said `fill = drv` above to fill different drive types with different colors.

Each of the aesthetic mappings you've seen can also be used as a parameter, that is, a fixed value defined outside of the `aes()` aesthetic mappings. You saw how to do this with `fill` when we made the bar chart bars blue with `fill = 'blue'`. You also saw how we could outline the bars with a specific color when we used `color = '#add8e6'`.

Whenever you're trying to map a variable in your data to an aesthetic in your graph, you want to specify that inside the `aes()` function. And whenever you're trying to hardcode a specific parameter in your graph (making the bars blue, for example), you want to specify that outside the `aes()` function. I hope this helps to clear up any confusion you have on the distinction between aesthetic mappings and parameters!
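Here is a small side-by-side sketch of the two usages. It uses `layer_data()` from `ggplot2` to peek at the fill colors each plot ends up with; `p_fixed` and `p_mapped` are names I've made up for this example.

```r
library(ggplot2)

# Parameter: a fixed value, set outside aes()
p_fixed <- ggplot(mpg) +
  geom_bar(aes(x = class), fill = 'blue')

# Mapping: a column from the data, set inside aes()
p_mapped <- ggplot(mpg) +
  geom_bar(aes(x = class, fill = drv))

# Distinct fill values used by each plot
unique(layer_data(p_fixed)$fill)   # a single fixed color
unique(layer_data(p_mapped)$fill)  # one color per value of drv
```

The parameter version carries one color for every bar, while the mapped version carries one color per drive type (and gets a legend).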

## Common errors with aesthetic mappings and parameters in ggplot

When I was first learning R and ggplot, this difference between aesthetic mappings (the values included inside your `aes()`), and parameters (the ones outside your `aes()`) was constantly confusing me. Luckily, over time, you'll find that this becomes second nature. But in the meantime, I can help you speed along this process with a few common errors that you can keep an eye out for.

### Trying to include aesthetic mappings outside your `aes()` call

If you're trying to map the `drv` variable to `fill`, you should include `fill = drv` within the `aes()` of your `geom_bar` call. What happens if you include it outside accidentally, and instead run `ggplot(mpg) + geom_bar(aes(x = class), fill = drv)`? You'll get an error message saying that the object `drv` was not found.

Whenever you see this error about object not found, be sure to check that you're including your aesthetic mappings inside the `aes()` call!
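Here is a sketch that triggers this error safely and captures the message, assuming there is no object called `drv` in your workspace. `msg` is an illustrative name.

```r
library(ggplot2)

# fill = drv sits outside aes(), so R looks for a variable named drv
# in the calling environment rather than a column in mpg.
msg <- tryCatch(
  {
    ggplot(mpg) + geom_bar(aes(x = class), fill = drv)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

msg
```

Moving `fill = drv` inside `aes()` tells `ggplot` to look for `drv` as a column in `mpg` instead.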

### Trying to specify parameters inside your `aes()` call

On the other hand, if we try including a specific parameter value (for example, `fill = 'blue'`) inside of the `aes()` mapping, the error is a bit less obvious. Take a look:

```r
ggplot(mpg) +
  geom_bar(aes(x = class, fill = 'blue'))
```

In this case, `ggplot` actually does produce a bar chart, but it's not what we intended.

For starters, the bars in our bar chart are all red instead of the blue we were hoping for! Also, there's a legend to the side of our bar graph that simply says 'blue'.

What's going on here? Under the hood, `ggplot` has taken the string 'blue' and created a new hidden column of data where every value simply says 'blue'. Then, it's mapped that column to the `fill` aesthetic, like we saw before when we specified `fill = drv`. This results in the legend label and the color of all the bars being set, not to blue, but to the default color in `ggplot`.
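You can check this behavior with a short sketch like the one below, which uses `layer_data()` from `ggplot2` to look at the fill colors that actually get drawn. `p` and `built` are illustrative names.

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_bar(aes(x = class, fill = 'blue'))

built <- layer_data(p)

# Every bar gets the same fill, taken from the default discrete palette
unique(built$fill)

# Is the literal color 'blue' among the fills actually drawn?
any(built$fill == 'blue')
```

The string 'blue' ends up as a category label in the legend, not as a color.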

If this is confusing, that's okay for now. Just remember: when you run into issues like this, double check to make sure you're including the parameters of your graph outside your `aes()` call!

You should now have a solid understanding of how to create a bar chart in R using the `ggplot` bar chart function, `geom_bar`!

I've found that working through code on my own is the best way for me to learn new topics so that I'll actually remember them when I need to do things on my own in the future.
