How to create multiple variables with a single line of code in R

Posted on August 28, 2019 by Anisa Dhana in R bloggers | 0 Comments

[This article was first published on R Programming – DataScience+, and kindly contributed to R-bloggers.]

Library and data

library(tidyverse)          # loads dplyr, which provides mutate_at() and ntile()
dat = as.data.frame(esoph)  # esoph ships with R in the datasets package
dat
##    agegp     alcgp    tobgp ncases ncontrols
## 1  25-34 0-39g/day 0-9g/day      0        40
## 2  25-34 0-39g/day    10-19      0        10
## 3  25-34 0-39g/day    20-29      0         6
## 4  25-34 0-39g/day      30+      0         5
## 5  25-34     40-79 0-9g/day      0        27
## 6  25-34     40-79    10-19      0         7
## 7  25-34     40-79    20-29      0         4
## 8  25-34     40-79      30+      0         7
## 9  25-34    80-119 0-9g/day      0         2
## 10 25-34    80-119    10-19      0         1
## 11 25-34    80-119      30+      0         2
## 12 25-34      120+ 0-9g/day      0         1
## 13 25-34      120+    10-19      1         1
## 14 25-34      120+    20-29      0         1
## 15 25-34      120+      30+      0         2
...
...

The problem

I want to create tertiles for each grouping variable in the dataset (agegp, alcgp and tobgp) while excluding ncases and ncontrols from the computation.

The solution

Here is the code (the explanation follows the output):

dat %>% 
  mutate_at(list(tertile = ~ntile(., 3)), .vars = vars(ends_with("gp"), -starts_with("nc")))
##    agegp     alcgp    tobgp ncases ncontrols agegp_tertile alcgp_tertile
## 1  25-34 0-39g/day 0-9g/day      0        40             1             1
## 2  25-34 0-39g/day    10-19      0        10             1             1
## 3  25-34 0-39g/day    20-29      0         6             1             1
## 4  25-34 0-39g/day      30+      0         5             1             1
## 5  25-34     40-79 0-9g/day      0        27             1             1
## 6  25-34     40-79    10-19      0         7             1             1
## 7  25-34     40-79    20-29      0         4             1             1
## 8  25-34     40-79      30+      0         7             1             1
## 9  25-34    80-119 0-9g/day      0         2             1             2
## 10 25-34    80-119    10-19      0         1             1             2
## 11 25-34    80-119      30+      0         2             1             2
## 12 25-34      120+ 0-9g/day      0         1             1             3
...
...

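The ntile function creates the tertiles: it ranks each variable and splits it into three groups of roughly equal size. The ends_with helper selects the variables of interest; since they all end with “gp”, that suffix is enough to pick them out. The -starts_with("nc") term excludes ncases and ncontrols. The name given in the list, tertile, becomes the suffix of the new columns (agegp_tertile, alcgp_tertile, and so on).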
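
In recent versions of dplyr, mutate_at is superseded by across. The sketch below first shows what ntile does on a small made-up vector, then repeats the tertile step with across. It assumes dplyr 1.0.0 or later; the object name dat_tertiles is only illustrative and is not part of the original post.

```r
library(dplyr)

dat <- as.data.frame(esoph)

# ntile() ranks the values and cuts them into n buckets of near-equal size
# (the input vector here is made up for illustration)
ntile(c(10, 20, 30, 40, 50, 60), 3)
# 1 1 2 2 3 3

# across() picks the columns ending in "gp";
# .names sets the pattern used to name the new columns
dat_tertiles <- dat %>%
  mutate(across(ends_with("gp"), ~ ntile(.x, 3), .names = "{.col}_tertile"))

names(dat_tertiles)
```

Because ends_with("gp") already leaves out ncases and ncontrols, no explicit exclusion is needed in this version.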

Another example

If you want to do other computations, such as standardizing the variables (here, dividing each one by its standard deviation), you can use the code below. In this example I am using ncases and ncontrols because they are numeric.

dat %>% 
  mutate_at(list(sd = ~./sd(.)), .vars = vars(-ends_with("gp"), starts_with("nc")))
##    agegp     alcgp    tobgp ncases ncontrols ncases_sd ncontrols_sd
## 1  25-34 0-39g/day 0-9g/day      0        40 0.0000000   3.14398607
## 2  25-34 0-39g/day    10-19      0        10 0.0000000   0.78599652
## 3  25-34 0-39g/day    20-29      0         6 0.0000000   0.47159791
## 4  25-34 0-39g/day      30+      0         5 0.0000000   0.39299826
## 5  25-34     40-79 0-9g/day      0        27 0.0000000   2.12219060
## 6  25-34     40-79    10-19      0         7 0.0000000   0.55019756
## 7  25-34     40-79    20-29      0         4 0.0000000   0.31439861
## 8  25-34     40-79      30+      0         7 0.0000000   0.55019756
## 9  25-34    80-119 0-9g/day      0         2 0.0000000   0.15719930
## 10 25-34    80-119    10-19      0         1 0.0000000   0.07859965
## 11 25-34    80-119      30+      0         2 0.0000000   0.15719930
## 12 25-34      120+ 0-9g/day      0         1 0.0000000   0.07859965
## 13 25-34      120+    10-19      1         1 0.3632179   0.07859965
## 14 25-34      120+    20-29      0         1 0.0000000   0.07859965
## 15 25-34      120+      30+      0         2 0.0000000   0.15719930
...
...
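
The same scaling can be sketched with across, again assuming dplyr 1.0.0 or later. The second pipeline is a variation that is not in the original code: it also subtracts the mean, giving a z-score. The names dat_scaled, dat_z and the _z suffix are illustrative choices.

```r
library(dplyr)

dat <- as.data.frame(esoph)

# Divide each "nc" column by its standard deviation, as in the example above
dat_scaled <- dat %>%
  mutate(across(starts_with("nc"), ~ .x / sd(.x), .names = "{.col}_sd"))

# Variation (assumption, not in the original post): a z-score,
# which centers on the mean before scaling
dat_z <- dat %>%
  mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x), .names = "{.col}_z"))

# After either transformation the new columns have a standard deviation of 1
sd(dat_scaled$ncases_sd)
sd(dat_z$ncases_z)
```

Selecting with where(is.numeric) avoids listing column names at all, which can help when a dataset has many numeric variables.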

That's all.

I hope you find these tips and tricks useful for your data analysis.
