Tidyverse Tips

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

I have found the following commands quite useful during the EDA part of any Data Science project. We will work with the tidyverse package, of which we only need dplyr and ggplot2, and with the iris dataset.

select_if | rename_if

The select_if() function belongs to dplyr and is very useful when we want to choose columns based on a condition. We can also pass a function that is applied to the names of the selected columns.

Example: Let’s say that I want to choose only the numeric variables and to add the prefix “numeric_” to their column names.

library(tidyverse)

iris %>% select_if(is.numeric, list(~ paste0("numeric_", .))) %>% head()
 


Notice that we can use rename_if() in the same way. An important note is that rename_if(), rename_at(), and rename_all() have been superseded by rename_with(). The matching select statements have been superseded by the combination of select() and rename_with(), and any predicate function passed to select() or rename_with() must be wrapped in where().

These functions were superseded because mutate_if() and friends were superseded by across(). select_if() and rename_if() already use tidy selection so they can’t be replaced by across() and instead we need a new function.
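
As a minimal sketch of the note above, assuming dplyr 1.0.0 or later, the select_if() example can be rewritten with select(), where(), and rename_with(). The names numeric_iris and renamed_iris are only illustrative and do not appear in the original post.

```r
library(dplyr)

# Keep only the numeric columns, then add the prefix to their names.
# where() wraps the predicate so that select() can use it.
numeric_iris <- iris %>%
  select(where(is.numeric)) %>%
  rename_with(~ paste0("numeric_", .x))

names(numeric_iris)

# Counterpart of rename_if(): keep every column, rename only the numeric ones.
renamed_iris <- iris %>%
  rename_with(~ paste0("numeric_", .x), where(is.numeric))

names(renamed_iris)
```

The first pipeline drops Species, while the second keeps it unchanged, which mirrors the difference between select_if() and rename_if().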


everything

In many Data Science projects, we want one particular column (usually the dependent variable y) to appear first or last in the dataset. We can achieve this with the everything() selection helper, which is available once dplyr is loaded.

Example: Let’s say that I want the column Species to appear first in my dataset.

mydataset <- iris %>% select(Species, everything())
mydataset %>% head()
 

Example: Let’s say that I want the column Species to appear last in my dataset.

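This is a little trickier. We will start from mydataset, where the Species column appears first, and move it to the last position. Because the selection begins with a negative term, select() starts from all columns and drops Species; everything() then adds back whatever is missing, which places Species at the end.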

mydataset %>% select(-Species, everything()) %>% head()
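
A quick way to confirm both orderings is to inspect the column names rather than the printed rows. This is only a small check; species_first and species_last are illustrative names that are not used in the original post.

```r
library(dplyr)

# Species moved to the front, as in the first example.
species_first <- iris %>% select(Species, everything())

# Species moved to the end, starting from the reordered data.
species_last <- species_first %>% select(-Species, everything())

names(species_first)
names(species_last)
```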
 

relocate

The relocate() function is a new addition in dplyr 1.0.0. You can specify exactly where to put the columns with the .before or .after argument.

Example: Let’s say that I want the Petal.Width column to appear right after Sepal.Width.

iris %>% relocate(Petal.Width, .after = Sepal.Width) %>% head()

Notice that we can also place a column after the last column by using last_col().

Example: Let’s say that I want Petal.Width to be the last column.

iris %>% relocate(Petal.Width, .after = last_col()) %>% head()
 

You can find more info in the tidyverse documentation.
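
The examples above only use .after. As a further sketch, assuming dplyr 1.0.0 or later, relocate() also accepts .before, and with neither argument it moves the chosen columns to the front. The names front and before are illustrative.

```r
library(dplyr)

# With neither .before nor .after, the column goes to the front.
front <- iris %>% relocate(Species)

# .before places the column immediately to the left of the target column.
before <- iris %>% relocate(Petal.Width, .before = Sepal.Width)

names(front)
names(before)
```

Compared with the select() and everything() pattern, relocate() states the intended position directly, which tends to be easier to read.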


pull

When we work with data frames and select a single column, sometimes we want the output to be a plain vector rather than a one-column data frame. We can achieve this with pull(), which is part of dplyr.

Example: Let’s say that I want to run a t.test on Sepal.Length for setosa versus virginica. Note that the t.test function expects numeric vectors.

setosa_sepal_length <- iris %>% filter(Species == 'setosa') %>% select(Sepal.Length) %>% pull()
virginica_sepal_length <- iris %>% filter(Species == 'virginica') %>% select(Sepal.Length) %>% pull()

t.test(setosa_sepal_length, virginica_sepal_length)
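
To make the difference between select() and pull() concrete, the following sketch compares the two results for the setosa rows. The names as_df and as_vec are illustrative; pull(Sepal.Length) names the column directly, which makes the separate select() step optional.

```r
library(dplyr)

# select() keeps the data frame structure, even for a single column.
as_df <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length)

# pull() extracts the column itself as a numeric vector.
as_vec <- iris %>% filter(Species == "setosa") %>% pull(Sepal.Length)

is.data.frame(as_df)   # a one-column data frame
is.numeric(as_vec)     # a plain numeric vector
length(as_vec)         # one value per setosa row
```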
 


reorder

When you work with ggplot2, it is sometimes frustrating to reorder the levels of a factor based on some condition. Let’s say that we want to show a boxplot of Sepal.Width by Species.

iris %>% ggplot(aes(x = Species, y = Sepal.Width)) + geom_boxplot()
 

Example: Let’s assume that we want to order the boxes by each species’ median Sepal.Width.

We can do that easily with reorder() from the stats package.

iris %>% ggplot(aes(x = reorder(Species, Sepal.Width, FUN = median), y = Sepal.Width)) + geom_boxplot() + xlab("Species")
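
To see what reorder() does before it reaches the plot, we can look at the factor levels it returns. This sketch uses base R only; by_median and by_median_desc are illustrative names, and negating the values is one common way to get a descending order.

```r
# Default factor levels are alphabetical.
levels(iris$Species)

# Levels sorted by the median Sepal.Width of each species, lowest first.
by_median <- reorder(iris$Species, iris$Sepal.Width, FUN = median)
levels(by_median)

# Negating the values reverses the order, so the highest median comes first.
by_median_desc <- reorder(iris$Species, -iris$Sepal.Width, FUN = median)
levels(by_median_desc)
```

ggplot2 draws a discrete x axis in the order of the factor levels, so the boxes follow the order shown by levels().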
 
