Changing individual column names

July 11, 2018
(This article was first published on woodpeckR, and kindly contributed to R-bloggers)

Problem

How do I change the name of just one column in a data frame?

Context

This is a simple one that keeps coming up. Sometimes, whoever put together my data decided to capitalize the first letter of some column names and not others. Sometimes I’ve merged several data frames together and I need to distinguish the columns from each other.

Say my data frame is p8_0 and I’d like to change the column Area to area.

In the past, I’ve done this in one of two ways. Either I change all of the column names at once (if all of them need to be changed), or I use numerical column indexing. The latter makes a lot more sense if I have a lot of columns to deal with, but it means I have to know the number of the column whose name I have to change.

To find this out, I first have to look at all of the column names. Okay, no problem.

# See column names and numerical indices
names(p8_0)
  [1] "FID" "Join_Count" "TARGET_FID"
  [4] "Field1" "barcode" "stratum"
  [7] "lcode" "sdate" "utm_e"
 [10] "utm_n" "snag" "OBJECTID" 
 [13] "uniq_id" "aa_num" "AQUA_CODE" 
 [16] "AQUA_DESC" "pool" "Area" 
 [19] "Perimeter" "bath_pct" "max_depth" 
 [22] "avg_depth" "sd_depth" "tot_vol" 
 [25] "area_gt50" "area_gt100" "area_gt200" 
 [28] "area_gt300" "avg_fetch" "shoreline_density_index"
 [31] "econ" "sill" "min_rm" 
 [34] "max_rm" "len_met" "len_prm_lotic" 
 [37] "pct_prm_lotic" "num_lotic_outl" "len_prm_lentic" 
 [40] "pct_prm_lentic" "num_lentic_outl" "pct_aqveg" 
 [43] "pct_opwat" "len_terr" "pct_terr" 
 [46] "pct_aq" "len_wetf" "pct_prm_wetf" 
 [49] "pct_terr_shore_wetf" "len_wd" "wdl_p_m2" 
 [52] "num_wd" "scour_wd" "psco_wd" 
 [55] "len_revln" "rev_p_m2" "num_rev" 
 [58] "pct_terr_shore_rev" "pct_prm_rev" "area_tpi1" 
 [61] "pct_tpi1" "area_tpi2" "pct_tpi2" 
 [64] "area_tpi3" "pct_tpi3" "area_tpi4" 
 [67] "pct_tpi4" "sinuosity" "year_phot" 
 [70] "NEAR_TERR_FID" "NEAR_TERR_DIST" "NEAR_TERR_CLASS_31" 
 [73] "NEAR_TERR_CLASS_15" "NEAR_TERR_CLASS_7" "NEAR_TERR_CLASS_31_N" 
 [76] "NEAR_TERR_CLASS_15_N" "NEAR_TERR_CLASS_7_N" "NEAR_TERR_HEIGHT_N" 
 [79] "NEAR_FOREST_FID" "NEAR_FOREST_DIST" "NEAR_FOREST_CLASS_31" 
 [82] "NEAR_FOREST_CLASS_15" "NEAR_FOREST_CLASS_7" "NEAR_FOREST_CLASS_31_N" 
 [85] "NEAR_FOREST_CLASS_15_N" "NEAR_FOREST_CLASS_7_N" "NEAR_FOREST_HEIGHT_N" 
 [88] "year.p" "depth.p" "current.p" 
 [91] "gear.p" "stageht.p" "substrt.p" 
 [94] "wingdike.p" "riprap.p" "trib.p" 
 [97] "snagyn" "area_le50" "area_le100" 
[100] "area_le200" "area_le300" "pct_area_le100" 
[103] "pct_area_le50" "pct_area_le200" "pct_area_le300" 
[106] "stratum_name"

Okay, yes problem.

It’s not that hard to see that Area is the 18th column. But there are a bunch of columns that start with NEAR_TERR_ and NEAR_FOREST_ that would be easy to confuse. And what if I later modify my data cleaning script, insert new columns, and mess up the numerical indexing?
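To make the risk concrete, here is a minimal sketch using a made-up three-column data frame (df, df2, and their values are hypothetical stand-ins for p8_0). It shows how the same positional rename hits the wrong column once a new column is inserted upstream.

```r
# Toy data frame standing in for p8_0 (hypothetical values)
df <- data.frame(FID = 1:3,
                 pool = c("p4", "p8", "p13"),
                 Area = c(10.5, 20.1, 7.3))

# Rename by position: "Area" is currently column 3
names(df)[3] <- "area"
names(df)  # "FID" "pool" "area"

# Later, the cleaning script gains a step that inserts a column earlier on
df2 <- data.frame(FID = 1:3,
                  barcode = c("a", "b", "c"),
                  pool = c("p4", "p8", "p13"),
                  Area = c(10.5, 20.1, 7.3))

# The same positional rename now hits "pool" instead of "Area"
names(df2)[3] <- "area"
names(df2)  # "FID" "barcode" "area" "Area"
```

Nothing warns about the second rename, which is what makes positional indexing fragile in a script that is still changing.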

Solution

The first solution I came up with is simple but pretty clunky. At least it solves the problem of numerical indices getting misaligned. One caveat: if you mistype the column name or try to rename a column that doesn’t exist, it doesn’t throw an error; it silently changes nothing.

# Change "Area" column name to "area"
names(p8_0)[names(p8_0) == "Area"] <- "area"

This works well, but it gets annoying if you have more than one column name to change. Every column requires typing names(p8_0) twice, and that adds up to a lot of lines of code.
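The silent behavior on a mistyped name can be sketched with a toy data frame (df and the misspelling "Aera" are invented for illustration). The stopifnot guard is one possible safeguard, not something the approach requires.

```r
df <- data.frame(FID = 1:3, Area = c(10.5, 20.1, 7.3))

# Typo in the old name: nothing matches, so nothing is renamed and no error is raised
names(df)[names(df) == "Aera"] <- "area"
names_after_typo <- names(df)
names_after_typo  # "FID" "Area"

# Hypothetical guard: stop early if the column to rename is missing
old <- "Area"
stopifnot(old %in% names(df))
names(df)[names(df) == old] <- "area"
names(df)  # "FID" "area"
```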

To no one’s surprise, dplyr has a more elegant solution, using the rename function.

# Load dplyr
library(dplyr)

# Rename variable (new name first); assign the result to keep the change
p8_0 <- p8_0 %>% rename(area = Area)

A quick note on rename: somewhat counterintuitively, the new name comes before the old name. General example:

# General syntax for rename
df %>% rename(newname = oldname)
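A small sketch of two details of rename, using a made-up data frame (df, renamed, and the misspelling Aera are illustrative only): the original data frame is unchanged unless the result is assigned, and a nonexistent old name raises an error instead of being skipped.

```r
library(dplyr)

df <- data.frame(FID = 1:3, Area = c(10.5, 20.1, 7.3))

# rename() returns a modified copy; df itself is untouched until reassigned
renamed <- df %>% rename(area = Area)
names(df)       # "FID" "Area"
names(renamed)  # "FID" "area"

# A misspelled old name is reported as an error rather than silently ignored
result <- tryCatch(
  df %>% rename(area = Aera),
  error = function(e) "rename failed"
)
result  # "rename failed"
```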

rename saves a whole bunch of keystrokes and also scales very well to multiple columns.

Let’s say I wanted to change Area and Perimeter to area and perimeter, respectively, and I also wanted to change the rather clunky shoreline_density_index to sdi. And while we’re at it, snagyn, a factor variable that indicates whether a large piece of wood was present at the site (“yes” or “no”), might be clearer as snag_yn, and sinuosity could be shortened to sinu.

Without dplyr:

# Change each column name individually
names(p8_0)[names(p8_0) == "Area"] <- "area"
names(p8_0)[names(p8_0) == "Perimeter"] <- "perimeter"
names(p8_0)[names(p8_0) == "shoreline_density_index"] <- "sdi"
names(p8_0)[names(p8_0) == "snagyn"] <- "snag_yn"
names(p8_0)[names(p8_0) == "sinuosity"] <- "sinu"
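For comparison, base R can also handle several renames in one step with a named lookup vector and match. This is only a sketch on a toy data frame; df, lookup, and idx are hypothetical names, not part of the original workflow.

```r
df <- data.frame(FID = 1:3,
                 Area = c(10.5, 20.1, 7.3),
                 Perimeter = c(14, 19, 11))

# Named vector: old names are the names, new names are the values
lookup <- c(Area = "area", Perimeter = "perimeter")

# Positions of the old names among the current column names
idx <- match(names(lookup), names(df))
stopifnot(!anyNA(idx))  # fail loudly if any old name is missing

names(df)[idx] <- unname(lookup)
names(df)  # "FID" "area" "perimeter"
```

The mapping reads old name first, new name second, and each old name is typed only once.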

With dplyr:

# Change any column names you want to, all at once
p8_0 %>% rename(area = Area, 
                perimeter = Perimeter,
                sdi = shoreline_density_index, 
                snag_yn = snagyn,
                sinu = sinuosity)

So pretty. As an added bonus, you’re saved from both quotation marks and the dreaded double equals sign (!!!).

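In case anyone was counting, that’s 102 characters vs. 238 (spaces not included). 116 if you include loading dplyr, but you already had it loaded because you’re using it throughout your code, of course. One thing to keep in mind: rename returns the renamed data frame rather than modifying p8_0 in place, so put p8_0 <- in front of the call to keep the result (6 more characters).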
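The character counts can be checked with a short helper that strips whitespace before counting. count_chars, base_code, and dplyr_code are names made up for this sketch; comments are left out of the snippets, as in the counts above.

```r
# Count the non-whitespace characters in a snippet of code
count_chars <- function(code) nchar(gsub("[[:space:]]", "", code))

base_code <- '
names(p8_0)[names(p8_0) == "Area"] <- "area"
names(p8_0)[names(p8_0) == "Perimeter"] <- "perimeter"
names(p8_0)[names(p8_0) == "shoreline_density_index"] <- "sdi"
names(p8_0)[names(p8_0) == "snagyn"] <- "snag_yn"
names(p8_0)[names(p8_0) == "sinuosity"] <- "sinu"
'

dplyr_code <- '
p8_0 %>% rename(area = Area,
                perimeter = Perimeter,
                sdi = shoreline_density_index,
                snag_yn = snagyn,
                sinu = sinuosity)
'

count_chars(base_code)         # 238
count_chars(dplyr_code)        # 102
count_chars("library(dplyr)")  # 14
```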

Outcome

Now I can rename only the columns I want, by name instead of numerical index, without fear of having to change everything if I insert or delete some columns later on.

I’m sure I’ll get used to it eventually, but putting the new column name before the old one in the rename function really throws me. I’d prefer a “from –> to” syntax. If you’re like me and you’re willing to make your code slightly longer, there’s also dplyr::recode:

# Change column names with recode
library(dplyr)
names(p8_0) <- names(p8_0) %>% recode(Area = "area",
                                      Perimeter = "perimeter",
                                      shoreline_density_index = "sdi",
                                      snagyn = "snag_yn",
                                      sinuosity = "sinu")

132 characters is still a heck of a lot better than 238, and it might be worth it for more intuitive and memorable syntax.
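A quick sketch of recode on a toy data frame (df and its values are invented for illustration) shows that names not mentioned in the call pass through unchanged. Note that newer dplyr versions document recode as superseded in favor of case_match, so check the current documentation before relying on it.

```r
library(dplyr)

df <- data.frame(FID = 1:3,
                 Area = c(10.5, 20.1, 7.3),
                 snagyn = c("yes", "no", "yes"))

# Old name on the left, new name on the right; unmatched names are kept as they are
names(df) <- names(df) %>% recode(Area = "area", snagyn = "snag_yn")
names(df)  # "FID" "area" "snag_yn"
```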

Resources

More thoughts on changing individual variable names, including a couple other packages if you feel like trying them:
https://stackoverflow.com/questions/7531868/how-to-rename-a-single-column-in-a-data-frame

Documentation for the recode function: https://dplyr.tidyverse.org/reference/recode.html
