Take your data frames to the next level.

March 30, 2017

(This article was first published on R – Real Data, and kindly contributed to R-bloggers)

 


While finishing up R-rockstar Hadley Wickham’s book (Free Book – R for Data Science), I found that the section on model building elaborates on something pretty cool that I had no idea about: list columns.

Most of us have probably seen the following data frame column format:

df <- data.frame("col_uno" = c(1,2,3),"col_dos" = c('a','b','c'), "col_tres" = factor(c("google", "apple", "amazon")))

And the output:

df
##   col_uno col_dos col_tres
## 1       1       a   google
## 2       2       b    apple
## 3       3       c   amazon

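This is an awesome way to organize data and one of R’s strong points. However, a data frame column doesn’t have to be an atomic vector. It can also be a list, which lets each cell hold something as complex as another data frame. Check this out: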

library(tidyverse)
library(datasets)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
nested <- iris %>%
  group_by(Species) %>%
  nest()
nested
## # A tibble: 3 × 2
##      Species              data
##       <fctr>            <list>
## 1     setosa <tibble [50 × 4]>
## 2 versicolor <tibble [50 × 4]>
## 3  virginica <tibble [50 × 4]>
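
To see what nest actually produced, it can help to poke at the new column directly. The sketch below rebuilds the nested object from above so it runs on its own, and uses only base R inspection functions. The name first_group is just something I made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the nested tibble so this block is self-contained
nested <- iris %>%
  group_by(Species) %>%
  nest()

# The data column is an ordinary list with one element per species
is.list(nested$data)   # TRUE
length(nested$data)    # 3

# Each element is itself a data frame holding that species' rows;
# Species is not repeated inside it because it lives in the outer column
first_group <- nested$data[[1]]
dim(first_group)       # 50 4
names(first_group)
```

Because the column is a regular list, anything that works on lists (`[[`, lapply, the purrr map family) works on it too.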

Using nest, we can compartmentalize our data frame so that each group sits in a single row, which is easier to read and makes it simple to apply the same function to every group. Here we use map from the purrr package to compute the mean of each column in our nested data.

means <- map(nested$data, colMeans)
means
## [[1]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        5.006        3.428        1.462        0.246 
## 
## [[2]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        5.936        2.770        4.260        1.326 
## 
## [[3]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        6.588        2.974        5.552        2.026
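
Since the book introduces list columns in the context of model building, a natural next step is to keep the results next to the data they came from instead of in a separate list. Here is a minimal sketch of that idea using mutate with map. The helper fit_model and the formula Sepal.Length ~ Sepal.Width are my own picks for illustration, not something taken from the book.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the nested tibble so this block is self-contained
nested <- iris %>%
  group_by(Species) %>%
  nest()

# Hypothetical helper: fit a simple linear model to one species' data
fit_model <- function(df) {
  lm(Sepal.Length ~ Sepal.Width, data = df)
}

models <- nested %>%
  mutate(
    # a new list column holding one lm object per species
    model = map(data, fit_model),
    # map_dbl returns a plain numeric column: one slope per species
    slope = map_dbl(model, ~ coef(.x)[["Sepal.Width"]])
  )

models
```

The nice part of this layout is that the species label, its data, and its fitted model stay in the same row, so nothing can get out of sync when you filter or reorder.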

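Once you’re done messing around with data-ception, use unnest to flatten your data back into one row per observation. Note that the result is a tibble with Species as the first column, not an exact copy of the original data frame.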

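# Naming the list column explicitly also avoids a warning in newer versions of tidyr
head(unnest(nested, data))
## # A tibble: 6 × 5
##   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <fctr>        <dbl>       <dbl>        <dbl>       <dbl>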
## 1  setosa          5.1         3.5          1.4         0.2
## 2  setosa          4.9         3.0          1.4         0.2
## 3  setosa          4.7         3.2          1.3         0.2
## 4  setosa          4.6         3.1          1.5         0.2
## 5  setosa          5.0         3.6          1.4         0.2
## 6  setosa          5.4         3.9          1.7         0.4
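
If you want to convince yourself that nothing was lost along the way, a quick round-trip check like the one below should do it. This is only a sketch: round_trip is a name I made up, and the comparison assumes the rows come back in their original order, which holds for iris because it is already sorted by species.

```r
library(dplyr)
library(tidyr)

# Rebuild the nested tibble so this block is self-contained
nested <- iris %>%
  group_by(Species) %>%
  nest()

# Flatten the list column, then drop grouping and tibble classes for comparison
round_trip <- nested %>%
  unnest(data) %>%
  ungroup() %>%
  as.data.frame()

# Same number of rows and the same set of columns as iris
nrow(round_trip) == nrow(iris)
setequal(names(round_trip), names(iris))

# After restoring the original column order, the contents should match too
all.equal(round_trip[names(iris)], iris)
```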

I was pretty excited to learn about this property of data frames and will definitely make use of it in the future. If you have any neat examples of nested data frames in use, please feel free to share in the comments. As always, I’m happy to answer questions or talk data!

Kiefer Smith
