How To Select Multiple Columns Using Grep & R

[This article was first published on Data Science Using R – FinderDing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Why you need to be using Grep when programming with R.

There’s a reason that grep is included in most if not all programming language to this day 44 years later from creation. It’s useful and simple to use. Below is an example of using grep to make selecting multiple columns in R simple and easy to read.

The dataset below has the following column names.

names(data) # Column Names
 [1] "fips"                 "state"                "county"               "metro_area"          
 [5] "population"           "med_hh_income"        "poverty_rate"         "population_lowaccess"
 [9] "lowincome_lowaccess"  "no_vehicle_lowaccess" "s_grocery"            "s_supermarket"       
[13] "s_convenience"        "s_specialty"          "s_farmers_market"     "r_fastfood"          
[17] "r_full_service"      

How can we select only the columns we need to work with?

  • metro_area
  • med_hh_income
  • poverty_rate
  • population_lowaccess
  • lowincome_lowaccess
  • no_vehicle_lowaccess
  • s_grocery
  • s_supermarket
  • s_convenience
  • s_specialty
  • s_farmers_market
  • r_fastfood
  • r_full_service

We can tell R exactly by listing each column as below

data[c("metro_area","med_hh_income", "poverty_rate", "population_lowaccess", "lowincome_lowaccess", "no_vehicle_lowaccess","s_grocery","s_supermarket","s_convenience","s_specialty","s_farmers_market", "r_fastfood", "r_full_service")]

OR

We can tell R where each column we want is.

data[c(4,6,7:17)]

First, writing out each individual column is time consuming and chances are you’re going to make a typo (I did when writing it). Second option we have to first figure out where the columns are located to then tell R. Well looking at the columns we are trying to access vs the others theirs a specific difference. All these columns have a “_” located in there name, and we can use regular expressions (grep) to select these.

data[grep("_", names(data))])

FYI… to get the column locations you can actually use…

grep("_", names(data))
[1]  4  6  7  8  9 10 11 12 13 14 15 16 17

You will rarely have a regular expression as easy at “_” to select multiple columns, a very useful resource to learn and practice is https://regexr.com

Data was obtained from https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

The post How To Select Multiple Columns Using Grep & R appeared first on FinderDing.

To leave a comment for the author, please follow the link and comment on their blog: Data Science Using R – FinderDing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)