3 tidyverse tricks for most commonly used Excel Features

August 16, 2019
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post, We’re simply going to see 5 tricks that could help improve your tooling using {`tidyverse`}.

Create a difference variable between the current value and the next value

This is also known as `lead` and `lag` – especially in a time series dataset this varaible becomes very important in feature engineering. In Excel, This is simply done by creating a new formula field and subtracting the next cell with the current cell or the current cell with the previous cell and dragging the cell formula to the last cell.

Let’s take our {`fakir`} `fake_visits` dataset and if we are trying to find a day when there was a huge peak/drop of `home` visits, we can identify only by first creating a column which must be the difference between the next value and the current value.

We can do with the {`dplyr`} function `lag()` and `lead()` for respective purposes.

``````library(tidyverse)

df <- fakir::fake_visits()

df %>% # filter(year %in% '2017') %>%
mutate(day_lag = lag(home, default = home[1]) - home) %>% head()``````
``````## # A tibble: 6 x 9
##   timestamp   year month   day  home about  blog contact day_lag
##
## 1 2017-01-01  2017     1     1   352   176   521      NA       0
## 2 2017-01-02  2017     1     2   203   115   492      89     149
## 3 2017-01-03  2017     1     3   103    59   549      NA     100
## 4 2017-01-04  2017     1     4   484   113   633     331    -381
## 5 2017-01-05  2017     1     5   438   138   423     227      46
## 6 2017-01-06  2017     1     6    NA    75   478     289      NA``````

Combining Multiple Columns into one Column

One of the things that we often do in Excel is combining multiple columns into one column by concatenating the cell values. Like in the above example, we can see three columns `year`, `month`, `date` but all of them combined together can give us a `date-format` date (assuming the `timestamp` varible isn’t present) and that’s where the function `unite()` comes handy.

``````df %>% select(-timestamp) %>% head() %>% unite("date-format", c("year", "month",
"day"), sep = "-")``````
``````## # A tibble: 6 x 5
##   `date-format`  home about  blog contact
##
## 1 2017-1-1        352   176   521      NA
## 2 2017-1-2        203   115   492      89
## 3 2017-1-3        103    59   549      NA
## 4 2017-1-4        484   113   633     331
## 5 2017-1-5        438   138   423     227
## 6 2017-1-6         NA    75   478     289``````

Splitting One Column into Multiple Columns

This is the inverse of what we did above and another very frequently used excel feature Text to Columns.

In the `fakir_visits()`, let’s assume we don’t have `year`,`month` and `day` separately and now we’ve got to create those three columns from `timestamp`. This is quite simple with `separate()` function.

``````df %>% select(-c("year", "month", "day")) %>% head() %>% separate(col = timestamp,
into = c("year", "month", "day"), sep = "-")``````
``````## # A tibble: 6 x 7
##   year  month day    home about  blog contact
##
## 1 2017  01    01      352   176   521      NA
## 2 2017  01    02      203   115   492      89
## 3 2017  01    03      103    59   549      NA
## 4 2017  01    04      484   113   633     331
## 5 2017  01    05      438   138   423     227
## 6 2017  01    06       NA    75   478     289``````

Summary

The idea of this post was to introduce those four functions:

• lag()
• unite()
• separate()

and show case how super-useful they are for many commonly used Excel features in Data Analysis.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.