Variable Names: Camel Case to Underscore Delimited

November 19, 2017
By

(This article was first published on R on datawookie, and kindly contributed to R-bloggers)

A project I’m working on has a bunch of different data sources. Some of them have column names in Camel Case. Others are underscore delimited. My OCD rebels at this disarray and demands either one or the other.

If it were just a few columns and I was only going to have to do this once, then I’d probably just quickly do it by hand. But there are many columns and it’s very likely that there’ll be more data in the future and the process will need to be repeated.

Seems like something that should be easy to automate.

I’m sure that there are a variety of ways to attack this problem, but this is a quick hack that worked for me. It relies on a regular expression negative lookbehind to prevent matching to the first letter if it’s a capital.

data %>% setNames(names(.) %>% str_replace_all("(?, "_\\1") %>% str_to_lower())

My first attempt matched "(? but I changed this to "(? in order to deal with column names like GroupID.

To leave a comment for the author, please follow the link and comment on their blog: R on datawookie.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)