# A plea for consistent style!

December 22, 2010

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

As we get close to the end of the year, it’s time to look back over the past year and think of resolutions for 2011 and beyond. One that’s often on my mind relates to ways to structure my code to make it clearer to others (as well as to myself when I look back upon it months later).

Style guides are common in many programming languages, and are often purported to increase the readability and legibility of code, as well as minimize errors. The Wikipedia page on this topic describes the importance of indentation, spacing, alignment, and other formatting conventions.

Many stylistic conventions are appropriate for statistical code written in SAS and R, and can help to make code clearer and easier to comprehend. Consider the difference between:

```r
ds=read.csv("http://www.math.smith.edu/r/data/help.csv");attach(ds)
fOo=ks.test(age[female==1],age[female==0],data=ds)
plotdens=function(x,y,mytitle, mylab){densx = density(x)
densy = density(y);plot(densx,main=mytitle,lwd=3,xlab=mylab,
bty="l");lines(densy,lty=2,col=2,lwd=3);xvals=c(densx$x,
rev(densy$x));yvals=c(densx$y,rev(densy$y));polygon(xvals,
yvals,col="gray")};mytitle=paste("Test of ages: D=",round(fOo$statistic,3),
" p=",round(fOo$p.value,2),sep="");plotdens(age[female==1],age[female==0],
mytitle=mytitle,mylab="age (in years)")
legend(50,.05,legend=c("Women","Men"),col=1:2,lty=1:2,lwd=2)
```

and

```r
# code example from the Using R for Data Management, Statistical
# Analysis and Graphics book
# Nicholas Horton, Smith College    December 21, 2010
#
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
attach(ds)

# fit KS test and save object containing p-value
ksres = ks.test(age[female==1], age[female==0], data=ds)

# define function to plot two densities on the same graph
plotdens = function(x, y, mytitle, mylab) {
  densx = density(x)
  densy = density(y)
  plot(densx, main=mytitle, lwd=3, xlab=mylab, bty="l")
  lines(densy, lty=2, col=2, lwd=3)
  xvals = c(densx$x, rev(densy$x))
  yvals = c(densx$y, rev(densy$y))
  polygon(xvals, yvals, col="gray")
}

# craft specialized title containing statistic and p-value
mytitle = paste("Test of ages: D=",
  round(ksres$statistic,3),
  " p=", round(ksres$p.value, 2),
  sep="")
plotdens(age[female==1], age[female==0],
  mytitle=mytitle, mylab="age (in years)")
legend(50, .05, legend=c("Women", "Men"), col=1:2, lty=1:2,
  lwd=2)
```

While the first example has the advantage of using considerably fewer lines, its readability suffers dramatically. Appropriate indentation, white space, and comments help the analyst when debugging and make the code easier to reuse in the future. In settings where code review is undertaken, sharing a set of common standards is eminently sensible.
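Some of this cleanup can be done mechanically. As a minimal sketch (not part of the original example), base R's `parse()` and `deparse()` can re-lay-out crammed code using R's default spacing and indentation. The one-line function below is a hypothetical, shortened stand-in for the kind of code shown in the first example.

```r
# Hypothetical one-line function, crammed in the style of the first example.
crammed <- "plotdens=function(x,y){densx=density(x);densy=density(y);plot(densx);lines(densy,lty=2)}"

# parse() turns the text into an R expression; deparse() writes it back
# out using R's default layout (one statement per line, indented body).
expr <- parse(text = crammed, keep.source = FALSE)[[1]]
tidy <- deparse(expr)

# Print the re-laid-out code, one element of the character vector per line.
cat(tidy, sep = "\n")
```

This only restores spacing and indentation. Comments do not survive the round trip, and no tool can supply meaningful object names or explain intent, so those parts of a consistent style remain the analyst's job.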

## SAS

A specific but somewhat cursory style manual for SAS can be found at the SAS community Style guide for writing and polishing programs. I like the start of this guide, though it is incomplete at present. Other useful words of wisdom can be found here and here.

## R

Google’s R Style Guide is chock full of tips and guidelines to make R code easier to read, share and verify. Another source of ideas is Henrik Bengtsson’s draft R coding conventions. While one can quibble about some of the specific suggestions, overall, the effect of adherence to such a style guide is code that is easier to understand and less likely to hide errors.
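As one small illustration of how a spacing convention can keep an error from hiding, consider the sketch below. It is my own example rather than one taken from either guide, and the variable name `x` is purely illustrative. The same characters mean two different things to R depending on where the spaces fall.

```r
# Hypothetical illustration: the same characters, with and without spaces.
crammed <- quote(x<-3)    # parsed as an assignment of 3 to x
spaced  <- quote(x < -3)  # parsed as a comparison of x with -3

# The first element of each call is the operator R actually saw.
as.character(crammed[[1]])  # "<-"
as.character(spaced[[1]])   # "<"

x <- 5
eval(spaced)    # a comparison: x is left unchanged
eval(crammed)   # an assignment: x is silently overwritten
x               # 3
```

A habit of always putting spaces around operators makes the intended reading visible on the page, which is the sort of benefit a shared style is meant to deliver.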

Some coders are fundamentalists in insisting on “the correct” style. In general, however, it is more important to develop a sensible, interpretable, and coherent style of your own than to adhere to styles that you find awkward, whatever their provenance. The links above provide some common-sense tips that can help improve productivity and make you a better analyst.
