Variable Names: Camel Case to Underscore Delimited

[This article was first published on R on datawookie, and kindly contributed to R-bloggers.]

A project I’m working on has a bunch of different data sources. Some of them have column names in Camel Case. Others are underscore delimited. My OCD rebels at this disarray and demands either one or the other.

If it were just a few columns and I was only going to have to do this once, then I’d probably just quickly do it by hand. But there are many columns and it’s very likely that there’ll be more data in the future and the process will need to be repeated.

Seems like something that should be easy to automate.

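I’m sure that there are a variety of ways to attack this problem, but this is a quick hack that worked for me. It uses str_replace_all() and str_to_lower() from stringr and relies on a regular expression negative lookbehind, (?<!^), to avoid matching the first letter if it’s a capital, so that no underscore is inserted at the start of a name.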

data %>% setNames(names(.) %>% str_replace_all("(?<!^)([A-Z]+)", "_\\1") %>% str_to_lower())
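To see the one-liner in action, here is a minimal, self-contained sketch. The data frame and its column names are made up for illustration, and it assumes that magrittr (for %>%) and stringr are installed.

```r
library(magrittr)  # provides the %>% pipe
library(stringr)   # provides str_replace_all() and str_to_lower()

# Hypothetical data frame mixing Camel Case and underscore-delimited names.
data <- data.frame(
  GroupID   = 1:2,
  FirstName = c("a", "b"),
  last_name = c("x", "y")
)

# Insert an underscore before each run of capitals (except at the very
# start of a name), then convert everything to lower case.
renamed <- data %>%
  setNames(names(.) %>% str_replace_all("(?<!^)([A-Z]+)", "_\\1") %>% str_to_lower())

names(renamed)
# "group_id" "first_name" "last_name"
```

Names that are already underscore delimited contain no capitals, so they pass through unchanged.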

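My first attempt matched "(?<!^)([A-Z])" but that puts an underscore before every single capital, so GroupID comes out as group_i_d. I changed this to "(?<!^)([A-Z]+)", which treats a run of capitals as one unit, in order to deal with column names like GroupID, which now becomes group_id.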
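The difference between the two patterns is easy to check side by side. This is just a sketch: to_snake() is a hypothetical helper I'm introducing for the comparison, and the column names are invented. It also shows a caveat of the quick hack: a name that starts with a run of capitals (like the made-up HTTPStatus) still gets split after its first letter.

```r
library(stringr)

# Hypothetical helper: apply a pattern, prefix each match with an
# underscore, then lower-case the result.
to_snake <- function(x, pattern) {
  str_to_lower(str_replace_all(x, pattern, "_\\1"))
}

cols <- c("GroupID", "FirstName", "HTTPStatus")

# First attempt: one capital at a time.
single <- to_snake(cols, "(?<!^)([A-Z])")
single
# "group_i_d" "first_name" "h_t_t_p_status"

# Revised pattern: a run of capitals is handled as a single unit.
runs <- to_snake(cols, "(?<!^)([A-Z]+)")
runs
# "group_id" "first_name" "h_ttpstatus"
```

For the column names I was dealing with the revised pattern was enough, but names with leading acronyms would need a more careful expression.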
