Create new variables from existing variables in R

This article was first published on Data Science Tutorials and kindly contributed to R-bloggers.

The post Create new variables from existing variables in R appeared first on Data Science Tutorials

To create new variables from existing variables in R, use the case_when() function from the dplyr package.

The following is the fundamental syntax for this function.

library(dplyr)
df %>%
  mutate(new_var = case_when(var1 < 25 ~ 'low',
                             var2 < 35 ~ 'med',
                             TRUE ~ 'high'))

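Note that the conditions are checked in order and the first one that is TRUE determines the value, so the final TRUE acts like an “else” branch that catches every row not matched earlier.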
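To see the first-match rule and the TRUE fallback in isolation, here is a minimal sketch that applies case_when() to a plain vector. The scores vector is a hypothetical input made up for illustration; it is not part of the original example.

```r
library(dplyr)

# Hypothetical scores, used only for illustration
scores <- c(10, 30, 50, NA)

labels <- case_when(
  scores < 25 ~ "low",   # checked first
  scores < 35 ~ "med",   # only reached when the first test is not TRUE
  TRUE        ~ "high"   # fallback for everything else, including NA
)

print(labels)
```

The missing value ends up in the fallback branch because a comparison with NA is never TRUE. In dplyr 1.1.0 and later, the fallback can also be written with the .default argument instead of TRUE ~.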

The following examples demonstrate how to use this function in practice with the data frame below.

Let’s create a data frame:

df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 position = c('R1', 'R2', 'R3', 'R4', 'R5', NA),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

Now we can view the data frame:

df
  player position points assists
1      A       R1    102     405
2      B       R2    105     407
3      C       R3    219     527
4      D       R4    322     412
5      E       R5    232     211
6      F     <NA>     NA      NA

Example 1: Create New Variable from One Existing Variable

The following code demonstrates how to make a new variable named quality with values generated from the points column.

df %>%
  mutate(quality = case_when(points > 215 ~ 'high',
                             points > 120 ~ 'med',
                             TRUE ~ 'low'))
  player position points assists quality
1      A       R1    102     405     low
2      B       R2    105     407     low
3      C       R3    219     527    high
4      D       R4    322     412    high
5      E       R5    232     211    high
6      F     <NA>     NA      NA     low

The case_when() function created the values for the new column in the following way.

If the value in the points column is greater than 215, the quality column value is “high.”

Otherwise, if the value in the points column is greater than 120, the quality column value is “med.” No player in this data frame falls in that range, so “med” does not appear in the output.

Otherwise, if the points column value is less than or equal to 120 (or a missing value like NA), the quality column value is “low.”
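Because case_when() uses the first condition that matches, the order of the conditions matters. The sketch below compares two orderings of the same thresholds; the pts vector is a hypothetical input chosen so that one value (150) falls between the two cut-offs.

```r
library(dplyr)

# Hypothetical points values, chosen so that 150 lies between the thresholds
pts <- c(102, 150, 219, NA)

# Broadest condition first: every value above 215 is also above 120,
# so the 'med' branch can never be reached
broad_first <- case_when(pts > 120 ~ "high",
                         pts > 215 ~ "med",
                         TRUE ~ "low")

# Narrowest condition first: each branch can be reached
narrow_first <- case_when(pts > 215 ~ "high",
                          pts > 120 ~ "med",
                          TRUE ~ "low")

print(broad_first)
print(narrow_first)
```

With the broad condition first, 150 is labeled "high"; with the narrow condition first, it is labeled "med". Listing the most restrictive condition first is usually the safer choice.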

Example 2: Create New Variable from Multiple Variables

The following code demonstrates how to make a new variable named quality with values drawn from both the points and assists columns.

df %>%
  mutate(quality = case_when(points > 215 & assists > 10 ~ 'great',
                             points > 215 & assists > 5 ~ 'good',
                             TRUE ~ 'average' ))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA average

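Player F is labeled “average” because a comparison with NA is never TRUE, so the row falls through to the final TRUE branch. To label missing values explicitly, test for them first with the is.na() function.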

df %>%
  mutate(quality = case_when(is.na(points) ~ 'missing',
                             points > 215 & assists > 150 ~ 'great',
                             points > 215 & assists > 100 ~ 'good',
                             TRUE ~ 'average'))
  player position points assists quality
1      A       R1    102     405 average
2      B       R2    105     407 average
3      C       R3    219     527   great
4      D       R4    322     412   great
5      E       R5    232     211   great
6      F     <NA>     NA      NA missing
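If missing inputs should stay missing in the new column rather than receive a label, one possible variation is to return NA_character_ from the first branch. This is a sketch of that alternative, not part of the original example; it also checks the assists column, which is an added assumption.

```r
library(dplyr)

df <- data.frame(player = c('A', 'B', 'C', 'D', 'E', 'F'),
                 points = c(102, 105, 219, 322, 232, NA),
                 assists = c(405, 407, 527, 412, 211, NA))

result <- df %>%
  # NA_character_ keeps the new column a character vector
  # while leaving rows with missing inputs as NA
  mutate(quality = case_when(is.na(points) | is.na(assists) ~ NA_character_,
                             points > 215 & assists > 150 ~ 'great',
                             points > 215 & assists > 100 ~ 'good',
                             TRUE ~ 'average'))

print(result)
```

Keeping the value as NA lets later steps such as is.na() or na.omit() treat the row as missing instead of as a regular category.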
