# Coloring Under the Lines in ggplot

[This article was first published on **Home on educatorsRlearners: A blog about education and R**, and kindly contributed to R-bloggers.]


I am tasked with explaining incredibly complex things to people who do not have a lot of time. Consequently, using visuals has been a lifesaver.

One day I was visiting a school to explain the Common European Framework of Reference for Languages (CEFR), which, in a nutshell, describes what language learners can do at different levels of proficiency **AND** the number of hours it takes them to progress to each level.

During the presentation I used the following table in a slide:

Image Courtesy of *Keep Calm and Teach English*

While that image is informative, it is, in my humble opinion, a little hard to comprehend in comparison to this one:

So how do you make the plot above? Glad you asked 😄

# Step 1: Create the data frame

As the table above shows, there are seven levels that we want to represent (A0 to C2) and a range of hours from 0 to 1200.

```r
library(tidyverse)
library(knitr) # To make the table look pretty on HTML

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

kable(cefr_hours)
```

| cefr | hours |
|---|---|
| A0 | 0 |
| A1 | 100 |
| A2 | 200 |
| B1 | 400 |
| B2 | 600 |
| C1 | 800 |
| C2 | 1200 |

# Step 2: Expand the data frame

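In order to color the sections between the levels, we need to create groups so that `ggplot()` divides the plot at the correct levels. Each interior level has to serve as the end of one section and the start of the next, so it needs to appear twice. To get there, we'll simply double the data frame.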

```r
cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) # Stack the data frame on top of itself

kable(cefr_hours)
```

| cefr | hours |
|---|---|
| A0 | 0 |
| A1 | 100 |
| A2 | 200 |
| B1 | 400 |
| B2 | 600 |
| C1 | 800 |
| C2 | 1200 |
| A0 | 0 |
| A1 | 100 |
| A2 | 200 |
| B1 | 400 |
| B2 | 600 |
| C1 | 800 |
| C2 | 1200 |
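To see why each level needs to appear twice, it can help to list the spans the plot has to color. The following is only an illustrative sketch, not the method used in this post: it pairs every level with the one after it using `dplyr::lead()`. The names `spans`, `next_cefr`, and `next_hours` are made up for this example.

```r
library(tidyverse)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

# Pair each level with the following one; the last level has no successor
spans <- cefr_hours %>%
  mutate(next_cefr = lead(cefr),
         next_hours = lead(hours)) %>%
  filter(!is.na(next_cefr))

# Six spans: A0 to A1, A1 to A2, ..., C1 to C2
print(spans)
```

Every level except A0 and C2 shows up once as a start and once as an end, which is the duplication that doubling the data frame provides.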

# Step 3: Create groups

Next, we rearrange the data frame by CEFR level (more on that later) and create a group for each level. To do so, we create a new column called `group` using `dplyr::mutate()`.

```r
cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>% # Put the duplicated levels next to each other
  mutate(group = ceiling((row_number() - 1) / 2))

kable(cefr_hours)
```

| cefr | hours | group |
|---|---|---|
| A0 | 0 | 0 |
| A0 | 0 | 1 |
| A1 | 100 | 1 |
| A1 | 100 | 2 |
| A2 | 200 | 2 |
| A2 | 200 | 3 |
| B1 | 400 | 3 |
| B1 | 400 | 4 |
| B2 | 600 | 4 |
| B2 | 600 | 5 |
| C1 | 800 | 5 |
| C1 | 800 | 6 |
| C2 | 1200 | 6 |
| C2 | 1200 | 7 |

If we don’t use `arrange()`, we get the following mess.

```r
cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) # No arrange() this time

kable(cefr_hours)
```

| cefr | hours | group |
|---|---|---|
| A0 | 0 | 0 |
| A1 | 100 | 1 |
| A2 | 200 | 1 |
| B1 | 400 | 2 |
| B2 | 600 | 2 |
| C1 | 800 | 3 |
| C2 | 1200 | 3 |
| A0 | 0 | 4 |
| A1 | 100 | 4 |
| A2 | 200 | 5 |
| B1 | 400 | 5 |
| B2 | 600 | 6 |
| C1 | 800 | 6 |
| C2 | 1200 | 7 |

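*“What about `ceiling()`?”*

Good question!

We use `ceiling()` in order to create the groups. Since we want, for example, “A1 to A2” to be one group, both of its rows need to share the same whole number. Dividing `row_number() - 1` by 2 gives halves (0, 0.5, 1, 1.5, ...), and `ceiling()` rounds each half up so that adjacent rows land in the same group. For more on how to use `ceiling()`, see its help page (`?ceiling`).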
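As a quick check of that arithmetic, here is the same calculation on plain row numbers, outside of the data frame. The variable names `rows` and `groups` are just for this sketch.

```r
# Row numbers of the doubled, arranged data frame (7 levels x 2 = 14 rows)
rows <- 1:14

# (row - 1) / 2 gives 0, 0.5, 1, 1.5, ...; ceiling() rounds the halves up
groups <- ceiling((rows - 1) / 2)

print(groups)
#> [1] 0 1 1 2 2 3 3 4 4 5 5 6 6 7
```

Groups 1 through 6 each contain two rows, while groups 0 and 7 contain only one.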

# Step 4: Remove Unnecessary Groups

Since we don’t want the first or last level to be a group unto itself, we use `dplyr::filter()` to remove the first and the last group, keeping only the rows whose `group` is neither the `min()` nor the `max()` (i.e., the first and the last).

```r
cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group)) # Drop the one-row groups

kable(cefr_hours)
```

| cefr | hours | group |
|---|---|---|
| A0 | 0 | 1 |
| A1 | 100 | 1 |
| A1 | 100 | 2 |
| A2 | 200 | 2 |
| A2 | 200 | 3 |
| B1 | 400 | 3 |
| B1 | 400 | 4 |
| B2 | 600 | 4 |
| B2 | 600 | 5 |
| C1 | 800 | 5 |
| C1 | 800 | 6 |
| C2 | 1200 | 6 |
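Before plotting, it may be worth confirming that every remaining group spans exactly two adjacent levels. The following is a minimal sketch of such a check. The names `segments` and `segment_check` are introduced only for this example and are not part of the original code.

```r
library(tidyverse)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

# Same pipeline as in Steps 2 to 4
segments <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# One summary row per group: how many rows it has and which levels it joins
segment_check <- segments %>%
  group_by(group) %>%
  summarise(n_rows = n(),
            from = as.character(first(cefr)),
            to = as.character(last(cefr)))

print(segment_check)
```

If the grouping worked, `n_rows` is 2 for every group and each `from`/`to` pair names two neighboring levels.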

# Step 5: Make the plot

From here, it is simply a matter of plugging the data into `ggplot()`.

```r
ggplot(data = cefr_hours,
       mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours))
```

But, of course, when we’re talking about `ggplot()`, that means we have no end of options at our disposal.

```r
ggplot(data = cefr_hours,
       mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  # Blues palette for the continuous fill
  # (scale_color_brewer() would not affect the fill aesthetic)
  scale_fill_distiller(palette = "Blues") +
  theme_minimal() +                                    # Set the theme
  labs(title = "Hours of Guided Learning Per Level",   # Give the plot a title
       subtitle = "Source: Cambridge English Assessment", # Give it a subtitle
       x = "",                                         # Remove the title on the x axis
       y = "") +                                       # Remove the title on the y axis
  theme(legend.position = "none",                      # Delete the legend
        axis.text.x = element_text(size = 20),         # Set the size to 20
        axis.text.y = element_text(size = 20),         # Set the size to 20
        plot.title = element_text(size = 25))          # Set the size to 25
```
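Because `group` is numeric, mapping it to `fill` gives a continuous gradient. If you would rather have one distinct shade per section, one option is to convert `group` to a factor and use a discrete Brewer scale. This is a sketch of that variant, not the code behind the plot in this post. The object name `p` is just for this example.

```r
library(tidyverse)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# factor(group) makes the fill discrete, so scale_fill_brewer() applies
p <- ggplot(data = cefr_hours,
            mapping = aes(x = cefr, group = group, fill = factor(group))) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  scale_fill_brewer(palette = "Blues") +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```

The "Blues" palette offers up to nine shades, which is enough for the six sections here.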

Finally, a special thanks to Jordo82 whose answer to my question enabled me to make this plot.

Happy Coding!
