# Creating Population Pyramid Plots in R with ggplot2

**Steve's Data Tips and Tricks**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Introduction

Are you interested in visualizing demographic data in a unique and insightful way? Population pyramids are a fantastic tool for this purpose! They allow you to compare the distribution of populations across age groups for different genders or time periods. In this blog post, we’ll explore how to create population pyramid plots in R using the powerful ggplot2 library. Don’t worry if you’re new to R or ggplot2; we’ll walk you through the process step by step.

# Getting Started

Before we dive into creating population pyramids, make sure you have R and ggplot2 installed. You can install ggplot2 by running the following command if you haven’t already:

install.packages("ggplot2")

Now that you’re set up let’s start by loading the necessary libraries and preparing our data.

# Loading Libraries

# Load the required libraries library(ggplot2)

# Preparing Data

For this example, we’ll use a hypothetical dataset that represents the population distribution by age and gender. You can replace this dataset with your own data, but for now, let’s create a sample dataset:

# Creating a sample dataset data <- data.frame( Age = c(0:9, 0:9), Gender = c(rep("Male", 10), rep("Female", 10)), Population = c(200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 190, 240, 290, 330, 380, 430, 480, 530, 580, 630) )

# Constructing the Population Pyramid

Now that we have our data, let’s create the population pyramid plot step by step using ggplot2.

## Step 1: Create a Basic Bar Chart

Start by creating a basic bar chart representing the population distribution for one gender. We’ll use the `geom_bar`

function to do this.

# Create a basic bar chart for one gender basic_plot <- ggplot( data, aes( x = Age, fill = Gender, y = ifelse( test = Gender == "Male", yes = -Population, no = Population ) ) ) + geom_bar(stat = "identity")

In this code:

- We filter the data to include only one gender (Male) using
`subset`

. - We use
`aes`

to specify the aesthetic mappings. We map`Age`

to the x-axis,`-Population`

to the y-axis (note the negative sign to flip the bars), and`Age`

to the fill color. `geom_bar`

is used to create the bar chart, and`stat = "identity"`

ensures that the heights of the bars are determined by the`Population`

variable.- Finally,
`coord_flip()`

is applied to flip the chart horizontally, making it look like a pyramid.

## Step 2: Combine Both Genders

To create a population pyramid, we need to combine both male and female data. We’ll create two separate plots for each gender and then combine them using the `+`

operator.

# Create population pyramids for both genders and combine them population_pyramid <- basic_plot + scale_y_continuous( labels = abs, limits = max(data$Population) * c(-1,1) ) + coord_flip() + theme_minimal() + labs( x = "Age", y = "Population", fill = "Age", title = "Population Pyramid" )

In this step:

- scale_y_continuous(labels = abs, limits = max(data$Population) * c(-1,1)):
- This part adjusts the y-axis (vertical axis) of the plot.
- labels = abs means that the labels on the y-axis will show the absolute values (positive numbers) rather than negative values.
- limits = max(data$Population) * c(-1,1) sets the limits of the y-axis. It ensures that the y-axis extends from the maximum population value (positive) to the minimum (negative) value, creating a symmetrical pyramid shape.
- coord_flip(): This function flips the coordinate system of the plot. By default, the x-axis (horizontal) represents age, and the y-axis (vertical) represents population. coord_flip() swaps them so that the x-axis represents population and the y-axis represents age, creating the pyramid effect.
- theme_minimal(): This sets the overall visual theme of the plot to a minimalistic style. It adjusts the background, gridlines, and other visual elements to a simple and clean appearance.
- labs(x = “Age”, y = “Population”, fill = “Age”, title = “Population Pyramid”): This part labels various elements of the plot:
- x = “Age” labels the x-axis as “Age.”
- y = “Population” labels the y-axis as “Population.”
- fill = “Age” specifies that the “Age” variable will be used to fill the bars in the plot.
- title = “Population Pyramid” sets the title of the plot as “Population Pyramid.”

## Step 3: Customize Your Plot

Feel free to customize your plot further by adding labels, adjusting colors, or modifying other aesthetics to match your preferences. The `ggplot2`

library provides extensive customization options.

## Step 4: Visualize Your Population Pyramid

To visualize your population pyramid, simply print the `population_pyramid`

object:

population_pyramid

This will display the population pyramid plot in your R graphics window.

# Conclusion

Creating population pyramid plots in R using ggplot2 can be a powerful way to visualize demographic data. In this blog post, we walked through the process step by step, from loading libraries and preparing data to constructing and customizing the pyramid plot. Now it’s your turn to give it a try with your own data or explore additional features and customization options in ggplot2. Happy plotting!

**leave a comment**for the author, please follow the link and comment on their blog:

**Steve's Data Tips and Tricks**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.