
I am tasked with explaining incredibly complex things to people who do not have a lot of time. Consequently, using visuals has been a life saver.

One day I was visiting a school to explain the Common European Framework of Reference for Languages (CEFR), which, in a nutshell, describes what language learners can do at different levels of proficiency and the number of hours it takes them to progress to each level.

During the presentation I used the following table in a slide:

(Image courtesy of Keep Calm and Teach English)

While that image is informative, it is, in my humble opinion, a little hard to comprehend in comparison to this one:

(Plot: Hours of Guided Learning Per Level)

So how do you make the plot above? Glad you asked 🙂

# Step 1: Create the data frame

As the table above shows, there are seven levels we want to represent (A0 to C2) and a range of hours from 0 – 1200.

library(tidyverse)
library(knitr) # To make the table look pretty in HTML

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

kable(cefr_hours)

| cefr | hours |
|:-----|------:|
| A0   |     0 |
| A1   |   100 |
| A2   |   200 |
| B1   |   400 |
| B2   |   600 |
| C1   |   800 |
| C2   |  1200 |

# Step 2: Expand the data frame

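In order to color the sections between the levels, we need to create groups so that ggplot() divides the plot at the correct levels. Each level has to close one section and open the next, so every level needs to appear twice. To do that, we’ll simply double the data frame by binding it to itself with bind_rows().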

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours)

kable(cefr_hours)

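| cefr | hours |
|:-----|------:|
| A0   |     0 |
| A1   |   100 |
| A2   |   200 |
| B1   |   400 |
| B2   |   600 |
| C1   |   800 |
| C2   |  1200 |
| A0   |     0 |
| A1   |   100 |
| A2   |   200 |
| B1   |   400 |
| B2   |   600 |
| C1   |   800 |
| C2   |  1200 |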

# Step 3: Create groups

Next, we sort the data frame by CEFR level with arrange() (more on that later), so that the two copies of each level sit next to each other, and create a group for each pair of adjacent levels. To do so, we create a new column called group using dplyr::mutate().

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2))

kable(cefr_hours)

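| cefr | hours | group |
|:-----|------:|------:|
| A0   |     0 |     0 |
| A0   |     0 |     1 |
| A1   |   100 |     1 |
| A1   |   100 |     2 |
| A2   |   200 |     2 |
| A2   |   200 |     3 |
| B1   |   400 |     3 |
| B1   |   400 |     4 |
| B2   |   600 |     4 |
| B2   |   600 |     5 |
| C1   |   800 |     5 |
| C1   |   800 |     6 |
| C2   |  1200 |     6 |
| C2   |  1200 |     7 |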

If we don’t use arrange() we get the following mess.

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  mutate(group = ceiling((row_number() - 1) / 2))

kable(cefr_hours)

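| cefr | hours | group |
|:-----|------:|------:|
| A0   |     0 |     0 |
| A1   |   100 |     1 |
| A2   |   200 |     1 |
| B1   |   400 |     2 |
| B2   |   600 |     2 |
| C1   |   800 |     3 |
| C2   |  1200 |     3 |
| A0   |     0 |     4 |
| A1   |   100 |     4 |
| A2   |   200 |     5 |
| B1   |   400 |     5 |
| B2   |   600 |     6 |
| C1   |   800 |     6 |
| C2   |  1200 |     7 |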

## “What about ceiling()?”

Good question!

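We use ceiling() in order to create the groups. The expression (row_number() - 1) / 2 produces 0, 0.5, 1, 1.5, 2 and so on, and ceiling() rounds each of those up to the next whole number. Since we want “A1 to A2” to be one group, the second copy of one level and the first copy of the next level need to end up with the same whole number, and rounding up does exactly that. For more on how to use ceiling(), see its help page (?ceiling).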
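To see the grouping at work, here is a minimal sketch in base R. It assumes the 14 rows we have after doubling the data frame and uses seq_len(14) as a stand-in for row_number(); the names rows, raw and group are only for illustration.

```r
# Stand-in for row_number() on the 14-row data frame from Step 3
rows <- seq_len(14)

# Half-steps: 0, 0.5, 1, 1.5, ...
raw <- (rows - 1) / 2

# Rounding up pairs row 2 with row 3, row 4 with row 5, and so on
group <- ceiling(raw)

print(data.frame(row = rows, raw = raw, group = group))
# group is 0 1 1 2 2 3 3 4 4 5 5 6 6 7
```

Rows 1 and 14 end up alone in groups 0 and 7, which is why those two groups get removed in the next step.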

# Step 4: Remove Unnecessary Groups

After Step 3, the first row (A0) and the last row (C2) each sit alone in a group. Since we don’t want the first or last level to be a group unto itself, we use dplyr::filter() to keep every row whose group is neither the min() nor the max() of group (i.e., we drop the first and the last group).

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

kable(cefr_hours)

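| cefr | hours | group |
|:-----|------:|------:|
| A0   |     0 |     1 |
| A1   |   100 |     1 |
| A1   |   100 |     2 |
| A2   |   200 |     2 |
| A2   |   200 |     3 |
| B1   |   400 |     3 |
| B1   |   400 |     4 |
| B2   |   600 |     4 |
| B2   |   600 |     5 |
| C1   |   800 |     5 |
| C1   |   800 |     6 |
| C2   |  1200 |     6 |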
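If you want to double-check the result before plotting, here is a quick sanity check. It is only a sketch: it loads dplyr and forcats directly instead of the whole tidyverse, and the names segments and segment_check are made up for this example. The idea is that every remaining group should hold exactly two rows and run from one level to the next.

```r
library(dplyr)
library(forcats)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

# Same pipeline as in Step 4, stored under a hypothetical name
segments <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# For each group: where it starts, where it ends, and how many rows it has
segment_check <- segments %>%
  group_by(group) %>%
  summarise(from = first(cefr), to = last(cefr), n_rows = n())

print(segment_check)
```

Each of the six groups should show two rows, starting at one level and ending at the next (A0 to A1, A1 to A2, and so on up to C1 to C2).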

# Step 5: Make the plot

From here, it is simply a matter of plugging the data into ggplot().

ggplot(data = cefr_hours, mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours))

But, of course, when we’re talking about ggplot(), that means we have no end of options at our disposal.

ggplot(data = cefr_hours, mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  # fill (not colour) is mapped and group is numeric, so use the continuous Brewer scale
  scale_fill_distiller(palette = "Blues") +
  theme_minimal() +                                     # Set the theme
  labs(title = "Hours of Guided Learning Per Level",    # Give the plot a title
       subtitle = "Source: Cambridge English Assessment", # Give it a subtitle
       x = "",                                          # Remove the title on the x axis
       y = "") +                                        # Remove the title on the y axis
  theme(legend.position = "none",                       # Delete the legend
        axis.text.x = element_text(size = 20),          # Set the x axis text size to 20
        axis.text.y = element_text(size = 20),          # Set the y axis text size to 20
        plot.title = element_text(size = 25))           # Set the title size to 25

Finally, a special thanks to Jordo82 whose answer to my question enabled me to make this plot.

# Complete code
library(tidyverse)
library(knitr)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  # create a group for A0 to A1, then A1 to A2, etc.
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  # exclude the first and last points
  filter(group != min(group), group != max(group))

ggplot(data = cefr_hours, mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  # fill (not colour) is mapped and group is numeric, so use the continuous Brewer scale
  scale_fill_distiller(palette = "Blues") +
  theme_minimal() +
  labs(title = "Hours of Guided Learning Per Level",
       subtitle = "Source: Cambridge English Assessment",
       x = "",
       y = "") +
  theme(legend.position = "none",
        axis.text.x = element_text(size = 20),
        axis.text.y = element_text(size = 20),
        plot.title = element_text(size = 25))
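One variation you might try: because group is a number, ggplot() treats the fill as a continuous gradient. If you would rather have one distinct Brewer color per section, a possible approach is to map fill to factor(group) and use scale_fill_brewer(). This is a sketch rather than the way the plot above was made; the object name p is hypothetical, and only dplyr, forcats and ggplot2 are loaded.

```r
library(dplyr)
library(forcats)
library(ggplot2)

cefr_hours <- tibble(cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
                     hours = c(0, 100, 200, 400, 600, 800, 1200))

cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# factor(group) makes the fill discrete, so each section gets its own color
p <- ggplot(cefr_hours, aes(x = cefr, y = hours, group = group, fill = factor(group))) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  scale_fill_brewer(palette = "Blues") +
  theme_minimal() +
  theme(legend.position = "none")

p
```

The "Blues" palette has enough shades for the six sections here; with many more groups you would need a different palette or the continuous scale.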


Happy Coding!