Lists to Data.Frames with imap

This article was first published on rstats-tips.net and kindly contributed to R-bloggers.

When working with data that results from converting JSON into a list of lists of lists of lists … (you know what I mean ;-)), I often want to convert it to a data.frame.

Unfortunately, the source data often contains an unnamed list, or the list in one row is longer than the one in another row. So a straightforward conversion into a data.frame or tibble fails with the error message "Tibble columns must have compatible sizes".

So what to do? Just leave the lists as values in the cells of the data.frame, i.e. as list columns.

Let’s have a look at some sample data:

Sample data

options(tidyverse.quiet = TRUE)
library(tidyverse)

row_1 <- list(
  a = 42, 
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

row_2 <- list(
  a = 3.14159, 
  b = list("A", "B"),
  c = list("Montana", "Ohio", "California")
)

source <- list(row_1, row_2)

So we have a list, source, which contains two entries. Each entry, row_1 and row_2, is itself a list.
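
To see why a direct conversion does not work here, the following minimal sketch passes row_1 straight to as_tibble(). The name direct is only illustrative, and the exact wording of the error may differ between tibble versions.

```r
library(tibble)

row_1 <- list(
  a = 42,
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

# as_tibble() treats each element as one column, so it sees columns
# of size 1, 4 and 2, which cannot be recycled to a common size.
direct <- tryCatch(as_tibble(row_1), error = function(e) e)

# TRUE if the conversion failed
inherits(direct, "error")

# The message of the captured error
conditionMessage(direct)
```

Wrapping each nested list in another list, as done further down, gives every column exactly one cell per row and avoids the size conflict.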

Goal

As a result we want to get a data.frame (or tibble):

target <- tribble(
  ~a, ~b, ~c,
  42, list("one", "two", "three", "four"), list("R", "python"),
  3.14159, list("A", "B"), list("Montana", "Ohio", "California")
)

target
## # A tibble: 2 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>

purrr::imap

Let’s start with a single row.

The idea is to iterate over each element of row_1, so the purrr::map* family of functions seems the natural choice. But these functions pass only the value of each element to the processing function, not its name.

So we need purrr::imap. It calls the processing function with two arguments, the value (.x) and the name (.y) of each element:

row_1 %>% 
  purrr::imap_dfc(~ tibble({{.y}} := list(.x)))
## # A tibble: 1 × 3
##   a         b          c         
##   <list>    <list>     <list>    
## 1 <dbl [1]> <list [4]> <list [2]>
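
As an aside, the two arguments that imap hands to the function can be inspected on their own. This is a small sketch using imap_chr; the name described is only illustrative. For a list without names, .y would be the position of the element instead.

```r
library(purrr)

row_1 <- list(
  a = 42,
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

# .x is the value of the element, .y is its name
described <- imap_chr(row_1, ~ paste0(.y, ": ", length(.x), " value(s)"))
described
```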

Okay, the tibble from imap_dfc looks pretty good. But the first column, a, shouldn't be a list column: for a single value we want a normal (atomic) column.

row_1 %>% 
  purrr::imap_dfc(~ tibble({{.y}} := ifelse(length(.x) > 1, list(.x), .x)))
## # A tibble: 1 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1    42 <list [4]> <list [2]>
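
One caveat about the length(.x) > 1 test is sketched below, under the assumption that a record may contain a list with exactly one element. The value x is hypothetical and not part of the sample data. Such a list is not wrapped, so its cell would end up holding the bare value rather than a list. Testing the type with is.list() is one possible alternative.

```r
# Hypothetical value: a list with a single element
x <- list("A")

# Test used in the article: the length is 1, so x is returned unwrapped
by_length <- ifelse(length(x) > 1, list(x), x)

# Possible alternative: wrap whenever the value is a list
by_type <- if (is.list(x)) list(x) else x

str(by_length)  # a list holding "A"
str(by_type)    # a list holding list("A")
```

For the sample data in this article both tests give the same result, because every nested list has more than one element.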

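That is the shape we want for a single row. So how do we process the whole list source? We wrap the row conversion in another purrr::map* call, here map_dfr, which binds the per-row tibbles together by row.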

result <- source %>% 
  purrr::map_dfr(
    ~.x %>% purrr::imap_dfc(~ tibble({{.y}} := ifelse(length(.x) > 1, list(.x), .x)))
  )
result
## # A tibble: 2 × 3
##       a b          c         
##   <dbl> <list>     <list>    
## 1 42    <list [4]> <list [2]>
## 2  3.14 <list [2]> <list [3]>
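
In purrr 1.0.0 and later, map_dfr() and imap_dfc() are marked as superseded in favour of list_rbind() and list_cbind(). The following is a sketch of the same conversion in that style, assuming purrr >= 1.0.0 is installed. The names row_to_tibble and records are hypothetical, and the is.list() test is a swapped-in alternative to the length check used above, not the original method.

```r
library(purrr)
library(tibble)

row_1 <- list(
  a = 42,
  b = list("one", "two", "three", "four"),
  c = list("R", "python")
)

row_2 <- list(
  a = 3.14159,
  b = list("A", "B"),
  c = list("Montana", "Ohio", "California")
)

# `records` plays the role of `source` in the article
records <- list(row_1, row_2)

# Hypothetical helper: turn one record into a one-row tibble
row_to_tibble <- function(row) {
  cols <- imap(row, function(value, name) {
    # Wrap nested lists so they occupy a single cell of a list column
    cell <- if (is.list(value)) list(value) else value
    tibble(!!name := cell)
  })
  # Drop the outer names so the columns are bound side by side as they are
  list_cbind(unname(cols))
}

# One tibble per record, then bind them by row
result <- list_rbind(map(records, row_to_tibble))
result
```

Keeping the row conversion in a named function also makes it easier to test a single record before running the whole list.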
