tidyr::spread() and dplyr::rename_at() in action

(This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers)



I was recently confronted to a situation that required going from a long dataset to a wide dataset, but with a small twist: there were two datasets, which I had to merge into one. You might wonder what kinda crappy twist that is, right? Well, let’s take a look at the data:

data1; data2
## # A tibble: 20 x 4
##    country       date       variable_1 value
##                         
##  1      lu 01/01/2005            maybe    22
##  2      lu 01/07/2005            maybe    13
##  3      lu 01/01/2006            maybe    40
##  4      lu 01/07/2006            maybe    25
##  5      lu 01/01/2005    totally_agree    42
##  6      lu 01/07/2005    totally_agree    17
##  7      lu 01/01/2006    totally_agree    25
##  8      lu 01/07/2006    totally_agree    16
##  9      lu 01/01/2005 totally_disagree    39
## 10      lu 01/07/2005 totally_disagree    17
## 11      lu 01/01/2006 totally_disagree    23
## 12      lu 01/07/2006 totally_disagree    21
## 13      lu 01/01/2005   kinda_disagree    69
## 14      lu 01/07/2005   kinda_disagree    12
## 15      lu 01/01/2006   kinda_disagree    10
## 16      lu 01/07/2006   kinda_disagree     9
## 17      lu 01/01/2005      kinda_agree    38
## 18      lu 01/07/2005      kinda_agree    31
## 19      lu 01/01/2006      kinda_agree    19
## 20      lu 01/07/2006      kinda_agree    12
## # A tibble: 20 x 4
##    country       date       variable_2 value
##                         
##  1      lu 01/01/2005      kinda_agree    22
##  2      lu 01/07/2005      kinda_agree    13
##  3      lu 01/01/2006      kinda_agree    40
##  4      lu 01/07/2006      kinda_agree    25
##  5      lu 01/01/2005    totally_agree    42
##  6      lu 01/07/2005    totally_agree    17
##  7      lu 01/01/2006    totally_agree    25
##  8      lu 01/07/2006    totally_agree    16
##  9      lu 01/01/2005 totally_disagree    39
## 10      lu 01/07/2005 totally_disagree    17
## 11      lu 01/01/2006 totally_disagree    23
## 12      lu 01/07/2006 totally_disagree    21
## 13      lu 01/01/2005            maybe    69
## 14      lu 01/07/2005            maybe    12
## 15      lu 01/01/2006            maybe    10
## 16      lu 01/07/2006            maybe     9
## 17      lu 01/01/2005   kinda_disagree    38
## 18      lu 01/07/2005   kinda_disagree    31
## 19      lu 01/01/2006   kinda_disagree    19
## 20      lu 01/07/2006   kinda_disagree    12

As explained in Hadley (2014), this is how you should keep your data… But for a particular purpose, I had to transform these datasets. What I was asked to do was to merge these into a single wide data frame. Doing this for one dataset is easy:

data1 %>%
  spread(variable_1, value)
## # A tibble: 4 x 7
##   country       date kinda_agree kinda_disagree maybe totally_agree
## *                                    
## 1      lu 01/01/2005          38             69    22            42
## 2      lu 01/01/2006          19             10    40            25
## 3      lu 01/07/2005          31             12    13            17
## 4      lu 01/07/2006          12              9    25            16
## # ... with 1 more variables: totally_disagree 

But because data1 and data2 have the same levels for variable_1 and variable_2, this would not work. So the solution I found online, in this SO thread was to use tidyr::spread() with dplyr::rename_at() like this:

data1 <- data1 %>%
  spread(variable_1, value) %>%
  rename_at(vars(-country, -date), funs(paste0("variable1:", .)))

glimpse(data1)
## Observations: 4
## Variables: 7
## $ country                     "lu", "lu", "lu", "lu"
## $ date                        "01/01/2005", "01/01/2006", "01/07/...
## $ variable1:kinda_agree       38, 19, 31, 12
## $ variable1:kinda_disagree    69, 10, 12, 9
## $ variable1:maybe             22, 40, 13, 25
## $ variable1:totally_agree     42, 25, 17, 16
## $ variable1:totally_disagree  39, 23, 17, 21
data2 <- data2 %>%
  spread(variable_2, value) %>%
  rename_at(vars(-country, -date), funs(paste0("variable2:", .)))

glimpse(data2)
## Observations: 4
## Variables: 7
## $ country                     "lu", "lu", "lu", "lu"
## $ date                        "01/01/2005", "01/01/2006", "01/07/...
## $ variable2:kinda_agree       22, 40, 13, 25
## $ variable2:kinda_disagree    38, 19, 31, 12
## $ variable2:maybe             69, 10, 12, 9
## $ variable2:totally_agree     42, 25, 17, 16
## $ variable2:totally_disagree  39, 23, 17, 21

rename_at() needs variables which you pass to vars(), a helper function to select variables, and a function that will do the renaming, passed to funs(). The function I use is simply paste0(), which pastes a string, for example “variable1:” with the name of the columns, given by the single ‘.’, a dummy argument. Now these datasets can be merged:

data1 %>%
  full_join(data2) %>%
  glimpse()
## Joining, by = c("country", "date")
## Observations: 4
## Variables: 12
## $ country                     "lu", "lu", "lu", "lu"
## $ date                        "01/01/2005", "01/01/2006", "01/07/...
## $ variable1:kinda_agree       38, 19, 31, 12
## $ variable1:kinda_disagree    69, 10, 12, 9
## $ variable1:maybe             22, 40, 13, 25
## $ variable1:totally_agree     42, 25, 17, 16
## $ variable1:totally_disagree  39, 23, 17, 21
## $ variable2:kinda_agree       22, 40, 13, 25
## $ variable2:kinda_disagree    38, 19, 31, 12
## $ variable2:maybe             69, 10, 12, 9
## $ variable2:totally_agree     42, 25, 17, 16
## $ variable2:totally_disagree  39, 23, 17, 21

Hope this post helps you understand the difference between long and wide datasets better, as well as dplyr::rename_at()!

To leave a comment for the author, please follow the link and comment on their blog: Econometrics and Free Software.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)