Making bar plots using ggplot in R

[This article was first published on JourneyR Blog, and kindly contributed to R-bloggers.]

Though ggplot() makes beautiful graphics, I often find myself going back to old projects to find a template for how to set up the code to make ggplot() graphs. Here I will show you how to make a barplot in ggplot. Then we will look at some variants of the barplot that are useful when visualizing different types of data.

Data Preparation

As with any data project, the data isn’t usually ready for plotting right out of the box. Here I use the Berkeley Earth climate change data, which records the temperature on different dates in various cities across the globe from 1743 through 2013. If you have data that is already formatted for plotting and you’re not interested in these steps, skip down to the “Plotting” section to dive right into the plots.

First, load dplyr, read the data, store it as df, and filter out rows with a missing temperature (NAs).

library(dplyr) # provides %>%, filter(), group_by(), summarise(), first(), n()
df <- read.csv("Data/GlobalLandTemperaturesByCity.csv")
df <- df %>% filter(!is.na(AverageTemperature))

Then format the dates. Take the dt column and format it as a date. Then we will make a new column, Year, that contains only the year so that we can filter by year.

df$dt <- as.Date(df$dt)
df$Year <- format(df$dt, format="%Y")
df$Year <- as.numeric(df$Year)
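
As a quick check of the date handling above, here is a minimal sketch on a few made-up date strings. It assumes the dt column uses the "YYYY-MM-DD" layout that as.Date() parses by default; the values themselves are hypothetical.

```r
# Hypothetical date strings, assumed to be in the same "YYYY-MM-DD" layout as dt
dt <- as.Date(c("1913-01-01", "1913-07-01", "2013-01-01"))

# format() returns character strings, so convert to numeric
# before comparing or filtering by year
year <- as.numeric(format(dt, format = "%Y"))
year
```

Keeping Year numeric is what lets a later comparison such as Year == 1913 behave as expected.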

Next we will summarize the data to get the average temperature in each city by year. To accomplish this we use the dplyr package to group the data by Year and City. Then we use the summarise() function (also from dplyr) to take the average temperature within each group. So that we don’t lose the data on country, latitude, and longitude we take the first instance of each.

df_yearly <- df %>% 
  group_by(Year, City) %>% 
  dplyr::summarise(AverageTemp = mean(AverageTemperature, na.rm = TRUE),
         Country = first(Country),
         Latitude = first(Latitude),
         Longitude = first(Longitude))
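
To see what the grouping step does, here is a small sketch on invented data. The toy data frame and its values are hypothetical, and .groups = "drop" is my own addition (not in the code above) that returns an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data: two readings per city in a single year
toy <- data.frame(
  Year = c(1913, 1913, 1913, 1913),
  City = c("Lima", "Lima", "Oslo", "Oslo"),
  Country = c("Peru", "Peru", "Norway", "Norway"),
  AverageTemperature = c(18, 20, 2, 6)
)

# One row per Year/City pair: average the readings and
# carry along the first Country value seen in each group
toy_yearly <- toy %>%
  group_by(Year, City) %>%
  summarise(AverageTemp = mean(AverageTemperature, na.rm = TRUE),
            Country = first(Country),
            .groups = "drop")
toy_yearly
```

Taking first(Country) only makes sense when every row in a group shares the same country, which is worth checking if two cities in different countries share a name.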

To summarize the data by continent we need to find out which continent each country is on. The package countrycode can accomplish this task with ease. First convert the df_yearly tibble that we created in the step above into a dataframe. The differences between a tibble and a dataframe are nuanced, but the one that matters here is that df_yearly[, "Country"] on a tibble returns a one-column tibble rather than a plain vector, and countrycode() expects a vector. The next step feeds the Country column from df_yearly into countrycode() and extracts the continent for each country.

library(countrycode)
df_yearly <- data.frame(df_yearly) #countrycode package doesn't handle tibbles well
df_yearly$continent <- countrycode(sourcevar = df_yearly[, "Country"],
                            origin = "country.name",
                            destination = "continent")
df_yearly$continent <- as.factor(df_yearly$continent)
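
Here is a minimal sketch of the lookup on a handful of country names, so you can see what the continent labels look like before running it on the full data. The country list is just an illustration and is not taken from the dataset.

```r
library(countrycode)

# A few illustrative country names (not drawn from the temperature data)
countries <- c("Denmark", "Brazil", "Japan", "Kenya")

# Match each English country name to its continent label
continents <- countrycode(sourcevar = countries,
                          origin = "country.name",
                          destination = "continent")
continents
```

As far as I know, this continent coding puts North and South America together under a single "Americas" label, so expect five continents rather than six or seven in the plots.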

The last step in data processing is to summarize by continent. Here we use the dplyr functions filter(), group_by(), and summarise() to take our yearly average data and keep only two years, 1913 and 2013. Then, for each continent in each of those years, we compute the mean and standard deviation of temperature as well as the number of cities on that continent that went into the average.

df_continent <- df_yearly %>% 
  filter(Year == 1913 | Year == 2013) %>% 
  group_by(continent, Year) %>% 
  dplyr::summarise(mean_temp = mean(AverageTemp),
                   std_temp = sd(AverageTemp),
                   n_cities = n())

Plotting

It’s finally time to start plotting. We will start with a basic barplot in ggplot and then move on to some useful variants.

The structure for any ggplot graph is similar: ggplot(data, aes(x, y, fill)) + geometry. Here we fill in the dataframe, x variable (continent), y variable (average temperature), and fill (year). The critical part of code to make a barplot as opposed to other kinds of plots is the geom_bar() function. Here we add the argument position = position_dodge() which puts the grouped bars side by side (instead of stacked). If we had data that did not have groups then we would simply use geom_bar(stat = "identity").

library(ggplot2)

temp_ct_vert <- ggplot(data = df_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert
Basic grouped barchart in ggplot showing average temperature on each continent for 1913 compared to 2013.
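
For the ungrouped case mentioned above, a minimal sketch might look like the following. The data frame and its values are invented for illustration, and I use geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical ungrouped data: one invented value per category
toy <- data.frame(continent = c("Africa", "Asia", "Europe"),
                  mean_temp = c(24, 19, 9))

# With no fill grouping there is nothing to dodge,
# so no position argument is needed
toy_plot <- ggplot(data = toy, aes(x = continent, y = mean_temp)) +
  geom_col(color = "black") +
  labs(x = "Continent", y = "Mean Yearly Temperature") +
  theme_minimal()
toy_plot
```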

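Another option is to arrange the groups on the x-axis by something other than alphabetical order. Here we sort by the average temperature from highest to lowest using reorder(continent, -mean_temp). Because each continent has two rows (one per year), reorder() sorts on the mean of the two values, which is its default summary function.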

temp_ct_vert_sort <- ggplot(data = df_continent, aes(x = reorder(continent, -mean_temp), y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_sort
Basic grouped barchart amended so that bars are in descending order by average temperature.
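
To check what reorder() does to the factor levels before plotting, here is a small sketch with made-up groups and values (the names a, b, c and the numbers are hypothetical).

```r
# Hypothetical groups with two values each, mimicking one row per year
group <- factor(c("a", "b", "c", "a", "b", "c"))
value <- c(10, 25, 18, 12, 27, 20)

# reorder() averages the values within each group by default and sorts
# the levels in increasing order; negating the values gives descending order
sorted <- reorder(group, -value)
levels(sorted)
```

Here the group means are 11 for a, 26 for b, and 19 for c, so the levels come out with b first and a last, which is the left-to-right order the bars would take on the x-axis.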

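We can add numbers to the plots. Here we show the number of cities on each continent in each year. We do this using geom_text() with the argument position = position_dodge(0.9). Text has no width of its own, so we pass 0.9 explicitly to match the default width of the bars, which lines each number up with its own bar in the side-by-side groups.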

temp_ct_vert_num <- ggplot(data = df_continent, 
                           aes(x = continent, 
                               y = mean_temp, 
                               fill = as.factor(Year))) +
  geom_bar(stat = "identity", 
           color = "black", 
           position = position_dodge()) +
  geom_text(aes(label = n_cities), 
            vjust = 1.6, 
            color = "black",
            position = position_dodge(0.9), 
            size = 3.5) +
  labs(x = "Continent", 
       y = "Mean Yearly Temperature", 
       fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_num
Plot with the number of cities in each group at the top of the bar.

Another useful variant is to add error bars. Here the error bars show the standard deviation of the yearly average temperature across the cities on each continent in that year. We add the geom_errorbar() function and, inside aes(), set the lower and upper extent of the error bars (ymin = mean_temp - std_temp and ymax = mean_temp + std_temp).

temp_ct_vert_eb <- ggplot(data = df_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  geom_errorbar(aes(ymin = mean_temp - std_temp, ymax = mean_temp + std_temp), width = 0.2,
                position = position_dodge(0.9)) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_eb
Ggplot barchart with error bars that represent the standard deviation of average temperature.
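
If you want to try the error bar layer without the climate data, here is a self-contained sketch. The toy data frame and all of its values are invented, and storing the dodge in a variable is just one way (my own choice here) to make sure the bars and error bars use the same offset.

```r
library(ggplot2)

# Hypothetical summary data: a mean and standard deviation per group and year
toy <- data.frame(
  group = rep(c("a", "b"), each = 2),
  year = factor(rep(c(1913, 2013), times = 2)),
  mean_val = c(10, 11, 20, 22),
  sd_val = c(1, 1.5, 2, 2.5)
)

# Use one dodge width for both layers so each error bar sits on its own bar
dodge <- position_dodge(width = 0.9)

toy_eb <- ggplot(data = toy, aes(x = group, y = mean_val, fill = year)) +
  geom_col(color = "black", position = dodge) +
  geom_errorbar(aes(ymin = mean_val - sd_val, ymax = mean_val + sd_val),
                width = 0.2, position = dodge) +
  labs(x = "Group", y = "Mean value", fill = "Year") +
  theme_minimal()
toy_eb
```

If the error bars end up centered between the bars instead of on them, a mismatch between the two dodge widths is the first thing I would check.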

Flipping the bars to a horizontal position can make it easier to read the labels or may look clearer for some data. We do this by adding coord_flip(). In this example we don’t re-type all of the code from temp_ct_vert but instead just use the saved object and add coord_flip() to it. This is a shortcut that can make it faster to make variations of plots and also make it clearer what features changed between different versions in the code.

temp_ct_horiz <- temp_ct_vert + coord_flip()
temp_ct_horiz
Horizontal barchart for average temperature by continent in 1913 vs 2013.
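
The same build-on-a-saved-object pattern works for any plot. Below is a minimal sketch on invented data (the names base_plot and flipped are hypothetical) showing that adding coord_flip() creates a new plot and leaves the saved one unchanged.

```r
library(ggplot2)

# Hypothetical data for a simple bar chart
toy <- data.frame(group = c("a", "b", "c"), value = c(3, 7, 5))

# Save the vertical version once
base_plot <- ggplot(data = toy, aes(x = group, y = value)) +
  geom_col(color = "black") +
  theme_minimal()

# Adding a layer returns a new object; base_plot itself is not modified
flipped <- base_plot + coord_flip()

base_plot  # still vertical
flipped    # horizontal
```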

In the final example we make a stacked barchart. The key difference is that geom_bar() is called without position = position_dodge(), which leaves the bars at their default position, stacked.

temp_ct_vert_stack <- ggplot(data = df_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black") +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_stack
Stacked barchart in ggplot.

Other Resources

There are many great resources for working with ggplot() and the geom_bar() function.

  • The blog post from sthda.com walks through variants
  • These two posts from the graph gallery on basic barplots and grouped barplots (one of my favorite places to get inspiration for R visualizations with beautiful graphics and easy to follow instructions)
  • Examples for customizations from the Cookbook for R
  • Hadley Wickham’s ggplot book

There you have it! Barplots with ggplot. I hope that you found this post helpful or at least interesting. Please let me know if you have an R question that you would like explained on here. And thanks for following along with my R journey.
