# A Detailed Guide to Plotting Line Graphs in R using ggplot geom_line

April 16, 2019

[This article was first published on Learn R Programming & Build a Data Science Career | Michael Toth, and kindly contributed to R-bloggers].

When it comes to data visualization, it can be fun to think of all the flashy and exciting ways to display a dataset. But if you're trying to convey information, flashy isn't always the way to go.

In fact, one of the most powerful ways to communicate the relationship between two variables is the simple line graph.

A line graph is a type of graph that displays information as a series of data points connected by straight line segments.

##### Line graph of average monthly temperatures for four major cities

There are many different ways to use R to plot line graphs, but the one I prefer is `ggplot` with its `geom_line` function.

## Introduction to ggplot

Before we dig into creating line graphs with `ggplot` and `geom_line`, I want to briefly touch on `ggplot` and why I think it's the best choice for plotting graphs in R.

`ggplot` (provided by the `ggplot2` package) is a tool for creating graphs in R, but it's also a method of thinking about and decomposing complex graphs into logical subunits.

`ggplot` takes each component of a graph (axes, scales, colors, objects, etc.) and allows you to build graphs up sequentially, one component at a time. You can then modify each of those components in a way that's both flexible and user-friendly. When components are unspecified, `ggplot` uses sensible defaults. This makes `ggplot` a powerful and flexible tool for creating all kinds of graphs in R. It's the tool I use to create nearly every graph I make these days, and I think you should use it too!

## Investigating our dataset

Throughout this post, we'll be using the Orange dataset that's built into R. This dataset contains information on the age and circumference of 5 different orange trees, letting us see how these trees grow over time. Let's take a look at what this dataset contains.

The dataset contains 3 columns: Tree, age, and circumference. There are 7 observations for each Tree, and there are 5 Trees, for a total of 35 observations in all.
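If you'd like to check those numbers yourself, here is a minimal sketch using only base R. The variable names (`column_names`, `n_obs`, `obs_per_tree`) are my own choices for illustration.

```r
# Orange ships with R in the datasets package, so nothing needs installing
head(Orange)  # print the first six rows

# Column names, total observations, and observations per tree
column_names <- names(Orange)
n_obs <- nrow(Orange)
obs_per_tree <- table(Orange$Tree)

column_names
n_obs
obs_per_tree
```

Running this should confirm the 3 columns and the 7 observations for each of the 5 trees described above.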

## Simple example of ggplot + geom_line()

```r
library(tidyverse)

# Filter the data we need
tree_1 <- filter(Orange, Tree == 1)

# Graph the data
ggplot(tree_1) +
  geom_line(aes(x = age, y = circumference))
```

Here we are starting with the simplest possible line graph using geom_line. For this simple graph, I chose to only graph the size of the first tree. I used `dplyr` to filter the dataset to only that first tree.
If you're not familiar with `dplyr`'s `filter` function, it's my preferred way of subsetting a dataset in R, and I recently wrote an in-depth guide to dplyr filter if you'd like to learn more!

Once I had filtered the dataset down to the tree I was interested in, I then used `ggplot + geom_line()` to create the graph. Let's review this in more detail:

First, I call `ggplot`, which creates a new `ggplot` graph. It's essentially a blank canvas on which we'll add our data and graphics. In this case, I passed tree_1 to `ggplot`, indicating that we'll be using the tree_1 data for this particular `ggplot` graph.

Next, I added my `geom_line` call to the base `ggplot` graph in order to create this line. In `ggplot`, you use the `+` symbol to add new layers to an existing graph. In this second layer, I told `ggplot` to use age as the x-axis variable and circumference as the y-axis variable.

And that's it, we have our line graph!
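To make the layering idea concrete, here is a small sketch that builds the same graph one step at a time. A few things here are my own additions rather than part of the example above: the object names, the use of base R's `subset()` in place of `dplyr`'s `filter`, and the extra `geom_point` layer, which is only there to show that a second layer is added in the same way.

```r
library(ggplot2)

# Base R stand-in for filter(Orange, Tree == 1)
tree_1 <- subset(Orange, Tree == 1)

# Step 1: the blank canvas, with data attached but nothing drawn yet
canvas <- ggplot(tree_1)

# Step 2: add the line layer with +
line_plot <- canvas + geom_line(aes(x = age, y = circumference))

# Step 3 (optional): add a second layer that marks each observation
line_and_points <- line_plot + geom_point(aes(x = age, y = circumference))

print(line_and_points)
```

Printing `canvas` on its own gives you an empty gray panel, which is a handy way to convince yourself that the drawing really happens in the layers.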

## Changing line color in `ggplot + geom_line`

Expanding on this example, let's now experiment a bit with colors.

```r
# Filter the data we need
tree_1 <- filter(Orange, Tree == 1)

# Graph the data
ggplot(tree_1) +
  geom_line(aes(x = age, y = circumference), color = 'red')
```

You'll note that this geom_line call is identical to the one before, except that we've added the modifier `color = 'red'` to the end of the call. Experiment a bit with different colors to see how this works on your machine. You can use most color names you can think of, or you can use specific hex color codes to get more granular.
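As a quick illustration of the hex option, here is a sketch that swaps the color name for a hex code. The particular code `#D55E00` is an arbitrary choice of mine, and I'm using base R's `subset()` so the snippet runs with only `ggplot2` loaded.

```r
library(ggplot2)

# Base R stand-in for filter(Orange, Tree == 1)
tree_1 <- subset(Orange, Tree == 1)

# Same graph as before, but with a hex code instead of a color name
hex_plot <- ggplot(tree_1) +
  geom_line(aes(x = age, y = circumference), color = "#D55E00")

print(hex_plot)

# colors() lists every color name that base R recognizes
head(colors())
```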

Now, let's try something a little different. Compare the `ggplot` code below to the code we just executed above. There are 3 differences. See if you can find them and guess what will happen, then scroll down to take a look at the result.

```r
# Graph different data
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, color = Tree))
```

This line graph is quite different from the one we produced above, but we only made a few minor modifications to the code! Did you catch the 3 changes? They were:

1. The dataset changed from tree_1 (our filtered dataset) to the complete Orange dataset
2. Instead of specifying `color = 'red'`, we specified `color = Tree`
3. We moved the color parameter inside of the `aes()` parentheses

Let's review each of these changes:

##### Moving from tree_1 to Orange

This change is relatively straightforward. Instead of only graphing the data for a single tree, we wanted to graph the data for all 5 trees. We accomplish this by changing our input dataset in the `ggplot()` call.

##### Specifying `color = Tree` and moving it within the `aes()` parentheses

I'm combining these because these two changes work together.

Before, we told `ggplot` to change the color of the line to red by adding `color = 'red'` to our `geom_line()` call.

What we're doing here is a bit more complex. Instead of specifying a single color for our line, we're telling `ggplot` to map the data in the `Tree` column to the `color` aesthetic.

Effectively, we're telling `ggplot` to use a different color for each tree in our data! This mapping also lets `ggplot` know that it needs to create a legend to identify the trees, and it places it there automatically!
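If you want to see the mapping at work rather than take my word for it, here is a small sketch. It relies on `layer_data()`, a `ggplot2` helper that returns the data computed for a layer; the object names are my own.

```r
library(ggplot2)

mapped_plot <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, color = Tree))

# layer_data() returns the data ggplot2 computed for a layer,
# including the color assigned to each row
computed <- layer_data(mapped_plot, 1)

# Count the distinct colors ggplot2 picked: one per tree
n_colors <- length(unique(computed$colour))
n_colors
```

Note that the computed column is spelled `colour`, because `ggplot2` uses the British spelling internally and treats `color` as an alias.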

## Changing linetype in `ggplot + geom_line`

Let's look at a related example. This time, instead of changing the color of the line graph, we will change the linetype:

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linetype = Tree))
```

This `ggplot + geom_line()` call is identical to the one we just reviewed, except we've substituted `linetype` for `color`. The graph produced is quite similar, but it uses different linetypes instead of different colors in the graph. You might consider using something like this when printing in black and white, for example.

## A deeper review of `aes()` (aesthetic) mappings in ggplot

We just saw how we can create graphs in `ggplot` that map the Tree variable to color or linetype in a line graph. `ggplot` refers to these mappings as aesthetic mappings, and they encompass everything you see within the `aes()` in ggplot.

Aesthetic mappings are a way of mapping variables in your data to particular visual properties (aesthetics) of a graph.

This might all sound a bit theoretical, so let's review the specific aesthetic mappings you've already seen as well as the other mappings available within geom_line.

##### Reviewing the list of geom_line aesthetic mappings

The main aesthetic mappings for `ggplot + geom_line()` include:

• `x`: Map a variable to a position on the x-axis
• `y`: Map a variable to a position on the y-axis
• `color`: Map a variable to a line color
• `linetype`: Map a variable to a linetype
• `group`: Map a variable to a group (each group on a separate line)
• `size`: Map a variable to a line size (`ggplot2` 3.4.0 and later call this `linewidth` for lines)
• `alpha`: Map a variable to a line transparency

From the list above, we've already seen the `x`, `y`, `color`, and `linetype` aesthetic mappings.

`x` and `y` are what we used in our first `ggplot + geom_line()` function call to map the variables age and circumference to x-axis and y-axis values. Then, we experimented with using `color` and `linetype` to map the Tree variable to different colored lines or linetypes.

In addition to those, there are 3 other main aesthetic mappings often used with `geom_line`.

The `group` mapping allows us to map a variable to different groups. Within `geom_line`, that means mapping a variable to different lines. Think of it as a pared down version of the `color` and `linetype` aesthetic mappings you already saw. While the `color` aesthetic mapped each Tree to a different line with a different color, the `group` aesthetic maps each Tree to a different line, but does not differentiate the lines by color or anything else. Let's take a look:

##### Changing the `group` aesthetic mapping in `ggplot + geom_line`

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree))
```

You'll note that the 5 lines are separated as before, but the lines are all black and there is no legend differentiating them. Depending on the data you're working with, this may or may not be appropriate. It's up to you as the person familiar with the data to determine how best to represent it in graph form!
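To see what `group` is actually doing, it can help to compare the graph with and without it. The sketch below is my own illustration: it uses the `ggplot2` helper `layer_data()` to count how many separate lines each version draws.

```r
library(ggplot2)

# Without group: every observation is joined into one zig-zagging line
ungrouped <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference))

# With group = Tree: one line per tree
grouped <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree))

# Each distinct value in the computed group column is drawn as its own line
n_lines_ungrouped <- length(unique(layer_data(ungrouped, 1)$group))
n_lines_grouped <- length(unique(layer_data(grouped, 1)$group))

n_lines_ungrouped
n_lines_grouped
```

Print `ungrouped` and you'll see why the `group` mapping matters: the single line jumps up and down as it connects measurements from different trees at the same age.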

In our Orange tree dataset, if you're interested in investigating how specific orange trees grew over time, you'd want to use the `color` or `linetype` aesthetics to make sure you can track the progress for specific trees. If, instead, you're interested in only how orange trees in general grow, then using the `group` aesthetic is appropriate, simplifying your graph and discarding unnecessary detail.

`ggplot` is both flexible and powerful, but it's up to you to design a graph that communicates what you want to show. Just because you can do something doesn't mean you should. You should always think about what message you're trying to convey with a graph, then design from those principles.

Keep this in mind as we review the next two aesthetics. While these aesthetics absolutely have a place in data visualization, in the case of the particular dataset we're working with, they don't make very much sense. But this is a guide to using `geom_line` in `ggplot`, not graphing the growth of Orange trees, so I'm still going to cover them for the sake of completeness!

##### Changing transparency in `ggplot + geom_line` with the `alpha` aesthetic

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, alpha = Tree))
```

Here we map the `Tree` variable to the `alpha` aesthetic, which controls the transparency of the line. As you can see, certain lines are more transparent than others. In this case, transparency does not add to our understanding of the graph, so I would not use this to illustrate this dataset.

##### Changing the `size` aesthetic mapping in `ggplot + geom_line`

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, size = Tree))
```

Finally, we turn to the size aesthetic, which controls the size of lines. Again, I would say this does not add to our understanding of our data in this context. That said, it does slightly resemble Charles Joseph Minard's famous graph of the death tolls of Napoleon's disastrous 1812 Russia Campaign, so that's kind of cool.
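One version note: starting with `ggplot2` 3.4.0, line thickness is controlled by the `linewidth` aesthetic, and using `size` for lines produces a deprecation warning. Assuming you are on 3.4.0 or later, the same graph can be sketched like this (the object name `width_plot` is my own):

```r
library(ggplot2)

# Assumes ggplot2 >= 3.4.0, where line thickness is the linewidth aesthetic
width_plot <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linewidth = Tree))

print(width_plot)
```

If you're not sure which version you have, `packageVersion("ggplot2")` will tell you.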

## Aesthetic mappings vs. parameters in ggplot

Before, we saw that we are able to use `color` in two different ways with geom_line. First, we were able to set the color of a line to red by specifying `color = 'red'` outside of our `aes()` mappings. Then, we were able to map the variable `Tree` to color by specifying `color = Tree` inside of our `aes()` mappings. How does this work with all of the other aesthetics you just learned about?

Essentially, they all work the same as color! That's the beautiful thing about graphing in `ggplot`: once you understand the syntax, it's very easy to expand your capabilities.

Each of the aesthetic mappings you've seen can also be used as a parameter, that is, a fixed value defined outside of the `aes()` aesthetic mappings. You saw how to do this with color when we set the line to red with `color = 'red'` before. Now let's look at an example of how to do this with linetype in the same manner:

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, group = Tree), linetype = 'dotted')
```

To review what values `linetype`, `size`, and `alpha` accept, just run `?linetype`, `?size`, or `?alpha` from your console window!
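Parameters can also be combined freely. Here is a sketch that sets several fixed values at once while still using `group` as a mapping; the specific values (`"steelblue"`, `"dashed"`, `0.6`) are arbitrary choices of mine for illustration.

```r
library(ggplot2)

styled <- ggplot(Orange) +
  geom_line(
    aes(x = age, y = circumference, group = Tree),  # mappings: vary with the data
    color = "steelblue",  # parameter: same color for every line
    linetype = "dashed",  # parameter: same linetype for every line
    alpha = 0.6           # parameter: 0 is fully transparent, 1 is opaque
  )

print(styled)
```

Everything inside `aes()` varies with the data, and everything outside it applies uniformly to the whole layer.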

## Common errors with aesthetic mappings and parameters in ggplot

When I was getting started with R and ggplot, the distinction between aesthetic mappings (the values included inside your `aes()`) and parameters (the ones outside your `aes()`) was the concept that tripped me up the most. You'll learn how to deal with these issues over time, but I can help you speed along this process with a few common errors that you can keep an eye out for.

##### Trying to include aesthetic mappings outside your `aes()` call

If you're trying to map the Tree variable to linetype, you should include `linetype = Tree` within the `aes()` of your `geom_line` call. What happens if you accidentally include it outside, and instead run `ggplot(Orange) + geom_line(aes(x = age, y = circumference), linetype = Tree)`? You'll get an error message telling you that the object 'Tree' was not found.

Whenever you see this error about an object not being found, check that you're including your aesthetic mappings inside the `aes()` call!
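Here is a sketch that reproduces the problem safely and then shows the fix. I'm wrapping the faulty call in base R's `tryCatch()` so the script keeps running and we can inspect the message; the object names are my own, and the exact wording of the message may vary with your R version and language settings.

```r
library(ggplot2)

# Wrong: Tree is a column of Orange, not an object R can find on its own,
# so referring to it outside aes() fails
error_message <- tryCatch(
  {
    ggplot(Orange) +
      geom_line(aes(x = age, y = circumference), linetype = Tree)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
error_message

# Right: inside aes(), Tree is looked up in the Orange data
fixed_plot <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, linetype = Tree))

print(fixed_plot)
```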

##### Trying to specify parameters inside your `aes()` call

Alternatively, if we try to specify a specific parameter value (for example, `color = 'red'`) inside of the `aes()` mapping, we get a less intuitive issue:

```r
ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, color = 'red'))
```

In this case, `ggplot` actually does produce a line graph (success!), but it doesn't have the result we intended. The graph it produces looks odd, because it is putting the values for all 5 trees on a single line, rather than on 5 separate lines like we had before. The line does come out reddish, but only because `ggplot` treats `'red'` as a one-value variable and assigns it the first color of its default palette, which happens to be a reddish hue. It also adds a legend that simply says 'red'. When you run into issues like this, double check to make sure you're including the parameters of your graph outside your `aes()` call!
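You can confirm what happened by looking at the values `ggplot` computed for the layer. This is a sketch using the `ggplot2` helper `layer_data()`, with object names of my own choosing:

```r
library(ggplot2)

quoted_plot <- ggplot(Orange) +
  geom_line(aes(x = age, y = circumference, color = 'red'))

computed <- layer_data(quoted_plot, 1)

# The color actually drawn: a palette color chosen by ggplot, not "red"
drawn_colors <- unique(computed$colour)
drawn_colors

# Only one group, which is why all 5 trees end up on a single line
n_groups <- length(unique(computed$group))
n_groups
```

Try replacing `'red'` with `'blue'` inside the `aes()`: the line should keep the same color and only the legend label changes, which shows that the string is being treated as data rather than as a color.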

You should now have a solid understanding of how to use R to plot line graphs using `ggplot` and `geom_line`! Experiment with the things you've learned to solidify your understanding. As an exercise, try producing a line graph of your own using a different dataset and at least one of the aesthetic mappings you learned about. Leave your graph in the comments or email it to me at [email protected]. I'd love to take a look at what you produce!

Did you find this post useful? I frequently write tutorials like this one to help you learn new skills and improve your data science. If you want to be notified of new tutorials, sign up here!
