Simplifying Data Transformation with pivot_longer() in R’s tidyr Library

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

In data analysis, tidying and reshaping data are often essential steps. R’s tidyr package provides tools for reshaping data efficiently, and one of the most useful is pivot_longer(). In this blog post, we’ll explore how pivot_longer() works and demonstrate its usage through several examples. By the end, you’ll know how to use this function to get wide data into a shape that is easier to analyze.

pivot_longer() lives in the tidyr package, so we need to load it first.

library(tidyr)

Understanding pivot_longer()

The pivot_longer() function reshapes data from a wider format to a longer format. It takes a set of columns and stacks them into two new columns: one holding the original column names and one holding the corresponding values. The result has more rows and fewer columns, which is usually easier to analyze and visualize.

Syntax: The basic syntax of pivot_longer() is as follows:

pivot_longer(data, cols, names_to, values_to)
  • data: The data frame or tibble to be reshaped.
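  • cols: The columns to pivot into longer format, chosen with tidyselect syntax (for example -name or starts_with("product")).
  • names_to: The name of the new column that will hold the original column names. Defaults to "name".
  • values_to: The name of the new column that will hold the corresponding values. Defaults to "value".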
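
As a quick illustration of these arguments, the sketch below pivots a tiny made-up data frame. The name toy and its columns are illustrative only and do not come from the examples in this post. It supplies just data and cols, so the new columns take tidyr's default names, "name" and "value".

```r
library(tidyr)

# Hypothetical toy data: one id column and two measurement columns
toy <- data.frame(id = c(1, 2), x = c(10, 20), y = c(30, 40))

# Only `cols` is given, so the new columns get the default names
# "name" (former column names) and "value" (cell values)
toy_long <- pivot_longer(toy, cols = c(x, y))

names(toy_long)  # "id" "name" "value"
nrow(toy_long)   # 4: two input rows times two pivoted columns
```

In practice it is usually clearer to set names_to and values_to explicitly, as the examples in this post do.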

Example 1: Reshaping Wide Data to Long Data

Let’s start with a simple example to demonstrate the usage of pivot_longer(). Suppose we have a data frame called students with columns representing subjects and their respective scores:

students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  math = c(90, 85, 92),
  science = c(95, 88, 91),
  history = c(87, 92, 78)
)

To reshape this data from a wider format to a longer format, we can use pivot_longer() as follows:

students_long <- pivot_longer(
  students, 
  cols = -name, 
  names_to = "subject", 
  values_to = "score"
  )

students_long
# A tibble: 9 × 3
  name    subject score
  <chr>   <chr>   <dbl>
1 Alice   math       90
2 Alice   science    95
3 Alice   history    87
4 Bob     math       85
5 Bob     science    88
6 Bob     history    92
7 Charlie math       92
8 Charlie science    91
9 Charlie history    78

The resulting students_long tibble has three columns: name, subject, and score. Each of the 3 students contributes one row per subject, giving 9 rows, and each row represents a student’s score in a specific subject.
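
To show why the longer shape is convenient, here is a minimal sketch that computes the average score per subject. It uses base R's aggregate(), which is one possible choice among several, and the name subject_means is made up for this illustration.

```r
library(tidyr)

# Same data as in Example 1
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  math = c(90, 85, 92),
  science = c(95, 88, 91),
  history = c(87, 92, 78)
)

students_long <- pivot_longer(
  students,
  cols = -name,
  names_to = "subject",
  values_to = "score"
)

# Because subject is now a column, a single formula covers every subject
subject_means <- aggregate(score ~ subject, data = students_long, FUN = mean)
subject_means
```

With the wide data, the same summary would need a separate call per subject column or a helper such as colMeans().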

Example 2: Handling Multiple Variables

In many cases, data frames contain multiple variables that need to be pivoted simultaneously. Consider a data frame called sales with columns representing sales figures for different products in different regions:

sales <- data.frame(
  region = c("North", "South", "East"),
  product_A = c(100, 120, 150),
  product_B = c(80, 90, 110),
  product_C = c(60, 70, 80)
)

To reshape this data, we can select every column whose name begins with "product" using the starts_with() helper inside pivot_longer():

sales_long <- pivot_longer(
  sales, 
  cols = starts_with("product"), 
  names_to = "product", 
  values_to = "sales"
  )

sales_long
# A tibble: 9 × 3
  region product   sales
  <chr>  <chr>     <dbl>
1 North  product_A   100
2 North  product_B    80
3 North  product_C    60
4 South  product_A   120
5 South  product_B    90
6 South  product_C    70
7 East   product_A   150
8 East   product_B   110
9 East   product_C    80

The resulting sales_long data frame will have three columns: region, product, and sales, where each row represents the sales figure of a specific product in a particular region.
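
The product column above still carries the "product_" prefix from the original column names. As a possible refinement, the sketch below uses pivot_longer()'s names_prefix argument, which removes matching text from the start of each name. The name sales_clean is made up for this illustration.

```r
library(tidyr)

# Same data as in Example 2
sales <- data.frame(
  region = c("North", "South", "East"),
  product_A = c(100, 120, 150),
  product_B = c(80, 90, 110),
  product_C = c(60, 70, 80)
)

sales_clean <- pivot_longer(
  sales,
  cols = starts_with("product"),
  names_to = "product",
  names_prefix = "product_",  # pattern stripped from the start of each name
  values_to = "sales"
)

unique(sales_clean$product)  # "A" "B" "C"
```

The result has the same 9 rows as sales_long, with shorter labels in the product column.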

Example 3: Handling Irregular Data

Real-world data frames often contain missing values. By default, pivot_longer() carries them into the long format as NA, and it can drop them on request. Consider a data frame called measurements with columns representing different measurement types and their respective values:

measurements <- data.frame(
  timestamp = c("2022-01-01", "2022-01-02", "2022-01-03"),
  temperature = c(25.3, 27.1, 24.8),
  humidity = c(65.2, NA, 68.5),
  pressure = c(1013, 1012, NA)
)

To reshape this data, we can use pivot_longer() and handle the missing values:

measurements_long <- pivot_longer(
  measurements, 
  cols = -timestamp, 
  names_to = "measurement", 
  values_to = "value", 
  values_drop_na = TRUE
  )

measurements_long
# A tibble: 7 × 3
  timestamp  measurement  value
  <chr>      <chr>        <dbl>
1 2022-01-01 temperature   25.3
2 2022-01-01 humidity      65.2
3 2022-01-01 pressure    1013  
4 2022-01-02 temperature   27.1
5 2022-01-02 pressure    1012  
6 2022-01-03 temperature   24.8
7 2022-01-03 humidity      68.5

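The resulting measurements_long data frame will have three columns: timestamp, measurement, and value, where each row represents a specific measurement at a particular timestamp. Setting values_drop_na = TRUE drops every row of the long result whose value is NA, which is why there are 7 rows here rather than 9: the humidity reading for 2022-01-02 and the pressure reading for 2022-01-03 are gone.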
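
For comparison, here is a short sketch of the same call without values_drop_na, which defaults to FALSE. The name measurements_all is made up for this illustration.

```r
library(tidyr)

# Same data as in Example 3
measurements <- data.frame(
  timestamp = c("2022-01-01", "2022-01-02", "2022-01-03"),
  temperature = c(25.3, 27.1, 24.8),
  humidity = c(65.2, NA, 68.5),
  pressure = c(1013, 1012, NA)
)

# values_drop_na is left at its default (FALSE), so NA rows are kept
measurements_all <- pivot_longer(
  measurements,
  cols = -timestamp,
  names_to = "measurement",
  values_to = "value"
)

nrow(measurements_all)              # 9
sum(is.na(measurements_all$value))  # 2
```

Keeping the NA rows is the safer choice when a missing reading is itself meaningful. Dropping them suits cases where the gaps only exist because of how the wide table was laid out.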

Conclusion

In this blog post, we explored the pivot_longer() function from the tidyr package, which reshapes data from a wider format to a longer format. We covered the syntax and worked through three examples: a basic reshape, selecting columns with starts_with(), and dropping missing values with values_drop_na. With pivot_longer() in your toolkit, you’ll be equipped to tidy wide data for analysis and visualization.
