tidyr 0.2.0 (and reshape2 1.4.1)
[This article was first published on RStudio Blog, and kindly contributed to R-bloggers.]
tidyr 0.2.0 is now available on CRAN. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has variables in columns and observations in rows, and is described in more detail in the tidy data vignette. Install tidyr with:
install.packages("tidyr")
There are three important additions to tidyr 0.2.0:
- expand() is a wrapper around expand.grid() that allows you to generate all possible combinations of two or more variables. In conjunction with dplyr::left_join(), this makes it easy to fill in missing rows of data.
library(tidyr)

sales <- dplyr::data_frame(
  year = rep(c(2012, 2013), c(4, 2)),
  quarter = c(1, 2, 3, 4, 2, 3),
  sales = sample(6) * 100
)

# Missing sales data for 2013 Q1 & Q4
sales
#> Source: local data frame [6 x 3]
#>
#>   year quarter sales
#> 1 2012       1   400
#> 2 2012       2   200
#> 3 2012       3   500
#> 4 2012       4   600
#> 5 2013       2   300
#> 6 2013       3   100

# Missing values are now explicit
sales %>%
  expand(year, quarter) %>%
  dplyr::left_join(sales)
#> Joining by: c("year", "quarter")
#> Source: local data frame [8 x 3]
#>
#>   year quarter sales
#> 1 2012       1   400
#> 2 2012       2   200
#> 3 2012       3   500
#> 4 2012       4   600
#> 5 2013       1    NA
#> 6 2013       2   300
#> 7 2013       3   100
#> 8 2013       4    NA
- In the process of data tidying, it’s sometimes useful to have a column of a data frame that is a list of vectors. unnest() lets you simplify that column back down to an atomic vector, duplicating the original rows as needed. (NB: If you’re working with data frames containing lists, I highly recommend using dplyr’s tbl_df, which will display list-columns in a way that makes their structure more clear. Use dplyr::data_frame() to create a data frame wrapped with the tbl_df class.)
raw <- dplyr::data_frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h")
)

# y is a character vector containing comma-separated strings
raw
#> Source: local data frame [3 x 2]
#>
#>   x     y
#> 1 1     a
#> 2 2 d,e,f
#> 3 3   g,h

# y is a list of character vectors
as_list <- raw %>% dplyr::mutate(y = strsplit(y, ","))
as_list
#> Source: local data frame [3 x 2]
#>
#>   x        y
#> 1 1 <chr[1]>
#> 2 2 <chr[3]>
#> 3 3 <chr[2]>

# y is a character vector; rows are duplicated as needed
as_list %>% unnest(y)
#> Source: local data frame [6 x 2]
#>
#>   x y
#> 1 1 a
#> 2 2 d
#> 3 2 e
#> 4 2 f
#> 5 3 g
#> 6 3 h
- separate() has a new extra argument that allows you to control what happens if a column doesn’t always split into the same number of pieces.
raw %>% separate(y, c("trt", "B"), ",")
#> Error: Values not split into 2 pieces at 1, 2

raw %>% separate(y, c("trt", "B"), ",", extra = "drop")
#> Source: local data frame [3 x 3]
#>
#>   x trt  B
#> 1 1   a NA
#> 2 2   d  e
#> 3 3   g  h

raw %>% separate(y, c("trt", "B"), ",", extra = "merge")
#> Source: local data frame [3 x 3]
#>
#>   x trt   B
#> 1 1   a  NA
#> 2 2   d e,f
#> 3 3   g   h
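If you’re curious what expand() plus a left join, and unnest(), are doing to the data, the same results can be roughly reproduced in base R. This is only a sketch for intuition, not how tidyr implements these functions. The object names (combos, complete, nested, flat) are made up for illustration, the sales figures are fixed rather than drawn with sample(), and lengths() assumes R 3.2.0 or later.

```r
# Base R only; an illustrative sketch, not tidyr's implementation.

# --- expand() + left_join(): make missing rows explicit ---
sales <- data.frame(
  year    = rep(c(2012, 2013), c(4, 2)),
  quarter = c(1, 2, 3, 4, 2, 3),
  sales   = c(400, 200, 500, 600, 300, 100)  # fixed values for reproducibility
)

# Every combination of the observed years and quarters
combos <- expand.grid(
  year    = unique(sales$year),
  quarter = unique(sales$quarter)
)

# all.x = TRUE keeps every combination; unmatched rows get NA for sales
complete <- merge(combos, sales, by = c("year", "quarter"), all.x = TRUE)
complete <- complete[order(complete$year, complete$quarter), ]

# --- unnest(): flatten a list-column, repeating the other columns ---
nested <- data.frame(x = 1:3)
nested$y <- strsplit(c("a", "d,e,f", "g,h"), ",")

n <- lengths(nested$y)  # number of elements in each list entry
flat <- data.frame(
  x = rep(nested$x, times = n),  # repeat each x once per element of y
  y = unlist(nested$y),
  stringsAsFactors = FALSE
)
```

Here complete has one row per year and quarter combination, with NA sales for the two missing 2013 quarters, and flat has one row per element of the original list-column. The tidyr versions are shorter and slot straight into a %>% pipeline.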
To read about the other minor changes and bug fixes, please consult the release notes.
reshape2 1.4.1
There’s also a new version of reshape2, 1.4.1. It includes three bug fixes for melt.data.frame() contributed by Kevin Ushey. Read all about them in the release notes and install it with:
install.packages("reshape2")