Consistent naming conventions in R
Naming conventions in R are famously anarchic, with no clear winner and multiple conventions in use simultaneously, sometimes within the same package. This has been written about before: in a lucid article in the R Journal, in a detailed exploration of names in R source code hosted on CRAN, and in general discussion on Stack Overflow.
Basically, there are 5 naming conventions to choose from:
- alllowercase: e.g. adjustcolor
- period.separated: e.g. plot.new
- underscore_separated: e.g. numeric_version
- lowerCamelCase: e.g. addTaskCallback
- UpperCamelCase: e.g. SignatureMethod
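The five conventions above can each be described by a regular expression, which makes it possible to check which one a given name follows. The following is a minimal sketch rather than a definitive classifier: classify_name() is a hypothetical helper written for this post, the patterns assume plain ASCII names, and anything that fits none of the five is simply labelled "mixed".

```r
# classify_name() is a hypothetical helper, not part of base R or any package.
# It returns the naming convention that a single identifier follows.
classify_name <- function(x) {
  if (grepl("^[a-z][a-z0-9]*$", x, perl = TRUE)) {
    "alllowercase"
  } else if (grepl("^[a-z][a-z0-9]*(\\.[a-z0-9]+)+$", x, perl = TRUE)) {
    "period.separated"
  } else if (grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)+$", x, perl = TRUE)) {
    "underscore_separated"
  } else if (grepl("^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$", x, perl = TRUE)) {
    "lowerCamelCase"
  } else if (grepl("^([A-Z][a-z0-9]*)+$", x, perl = TRUE)) {
    "UpperCamelCase"
  } else {
    "mixed"  # fits none of the five conventions
  }
}

# The five example names from the list, plus one that mixes styles.
examples <- c("adjustcolor", "plot.new", "numeric_version",
              "addTaskCallback", "SignatureMethod", "Partic_Per")
result <- vapply(examples, classify_name, character(1))
print(result)
```

Using perl = TRUE avoids locale-dependent behaviour of ranges such as [a-z], which R's own regex documentation warns about.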
There are clear advantages to choosing one naming convention and sticking to it, regardless of which one it is:
“Use common sense and BE CONSISTENT”
The Google Style Guide is ironically written in a rather inconsistent way (mixing capitals with lowercase in a single sentence surely breaks their own rule!)
But which one to choose? Read below to find out about the thorny issue of naming conventions in R, based on a tutorial on geo-spatial data handling in R.
Naming convention chaos
I recently encountered this question when I looked at the CRAN-hosted version of the tutorial I co-authored, ‘Introduction to visualising spatial data in R’. To my dismay, this document was littered with inconsistencies. Here are just a few of the object names used, breaking almost every naming convention:
- Partic_Per: This variable is trying to be simultaneously UpperCamelBack and underscore_separated: a new naming convention I’d like to coin Upper_Underscore_Separated (joke). Here’s another example: Spatial_DistrictName. These styles should not be mixed according to Hadley Wickham and Colin Gillespie.
- sport.wgs84: An example of period.separation
- crimeDat$MajorText: lowerCamelBack and UpperCamelBack in the same object!
- ons_label: a rare example of a consistent use of a naming convention, although this was in a variable name, not an object.
Does any of your code look like this? If so, I suggest sorting it out. Ironically, we had a section on typographic conventions in the error-strewn document. This states that:
“it is a good idea to get into the habit of consistent and clear writing in any language, and R is no exception”.
It was time to follow our own advice!
A trigger to remedy chaotic code
The tutorial was used as the basis for a workshop delivered at the Free and Open Source Software for Geo-spatial (FOSS4G) conference in Bremen. The event is affiliated with the Open Source Geospatial Foundation (OSGeo), who are big advocates of consistency and standards. With many experienced programmers at the event, it was the perfect opportunity to update the tutorial on the project’s GitHub repository.
Which naming convention?
We decided to use the underscore_separated naming convention. Why? It wasn’t because we love typing underscores (which can cause problems in some contexts), but because of more fundamental issues with the other options:
- alllowercase names are difficult to read, especially for non-native readers.
- period.separated names are confusing for users of Python and other languages in which dots are meaningful.
- UpperCamelBack is ugly and requires excessive use of the shift button.
There are also a couple of reasons why we positively like underscores:
- Underscores are fast to read: 10% to 20% faster than camelBack, which is especially confusing to non-native English readers, according to one article.
- Underscores are recommended by some prominent R users, including Hadley Wickham, Colin Gillespie and Andrew Gelman.
Implementing a consistent coding convention
After overcoming the mental inertia to decide on a new naming convention, actually implementing it should be the easy part. A series of regex commands could help, including the following (the ‘Regex’ tickbox must be enabled if you’re searching in RStudio):
[a-z]\.[a-z] # will search for dots between lowercase chars (period.separation)
[a-z][A-Z] # find camelBack code
Unfortunately, these patterns will also match many built-in R functions that use these naming conventions, so simply re-reading the code may be just as fast.
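The same two patterns can be run from the R console instead of an editor's search box. The sketch below is only an illustration: code_lines is a made-up character vector standing in for lines of a script (with a real file you might use readLines() instead), and the object names are borrowed from the examples earlier in this post.

```r
# Hypothetical lines of code, standing in for the contents of a script.
code_lines <- c(
  "sport.wgs84 <- sport",
  "crimeDat <- read.csv(file_name)",
  "ons_label <- 1"
)

# The two patterns from above; backslashes are doubled inside R strings.
dot_pattern   <- "[a-z]\\.[a-z]"  # period.separation
camel_pattern <- "[a-z][A-Z]"     # camelBack

# grep() returns the indices of the lines that match each pattern.
dot_hits   <- grep(dot_pattern, code_lines, perl = TRUE)
camel_hits <- grep(camel_pattern, code_lines, perl = TRUE)

print(code_lines[dot_hits])
print(code_lines[camel_hits])
```

Note that the second line is flagged by the dot pattern only because of read.csv, a base R function that cannot be renamed. This is exactly the kind of false positive that makes a fully automated search less useful than it first appears.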
The image below shows the GitHub diff of a typical change made as part of a renaming strategy. Note that in this example we are not only implementing a consistent naming convention: we also added a new comment in this commit, improving the code’s ‘understandability’. Implementing a naming convention can be part of a wider campaign to improve your R projects. This could include adding comments, removing redundant information from large projects and reformatting code, perhaps using the formatR package.
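Once the offending names have been found, working out their underscore_separated equivalents is mechanical. Here is a rough sketch of that step, assuming you also want everything in lowercase (an assumption on my part, not a rule of the convention). to_underscore() is a hypothetical helper, and it should only ever be applied to names you have defined yourself, never to functions from base R or other packages.

```r
# to_underscore() is a hypothetical helper for suggesting new object names.
to_underscore <- function(x) {
  # Replace literal dots with underscores (period.separated names).
  x <- gsub(".", "_", x, fixed = TRUE)
  # Insert an underscore wherever a lowercase letter or digit is
  # immediately followed by an uppercase letter (camelBack humps).
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)
  # Lowercase the result (an assumption; drop this line to keep capitals).
  tolower(x)
}

old_names <- c("sport.wgs84", "crimeDat", "Partic_Per", "Spatial_DistrictName")
new_names <- to_underscore(old_names)
print(data.frame(old = old_names, new = new_names))
```

The function only suggests names; the actual renaming in your scripts is still best done by hand or with a careful search and replace, checking each change in the diff.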
Conclusion
It is important to think about style when writing in any language, especially if your code will be read by others:
“What could help might be to raise awareness in the R community about naming conventions; writers of books and tutorials on R could make a difference here by treating naming conventions when introducing the R language.”
In conclusion, it is lazy and irresponsible to write and maintain messy code that is difficult to read. By contrast, consistent, clear and well-commented code will help you and others use your code and ensure its longevity. Adoption of a clearly defined naming convention such as the underscore_separation adopted in our tutorial can be an easy step one can take now towards this aim.
The only question that remains is which naming convention WiLL.U_uSe!