A plea for consistent style!

December 22, 2010

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

As we get close to the end of the year, it's time to look back over the past year and think of resolutions for 2011 and beyond. One that's often on my mind relates to ways to structure my code to make it clearer to others (as well as to myself when I look back upon it months later).

Style guides are common in many programming languages, and are often purported to increase the readability and legibility of code, as well as minimize errors. The Wikipedia page on this topic describes the importance of indentation, spacing, alignment, and other formatting conventions.

Many stylistic conventions are appropriate for statistical code written in SAS and R, and can help to make code clearer and easier to comprehend. Consider the difference between:

ds=read.csv("http://www.math.smith.edu/r/data/help.csv");attach(ds)
fOo=ks.test(age[female==1],age[female==0],data=ds)
plotdens=function(x,y,mytitle, mylab){densx = density(x)
densy = density(y);plot(densx,main=mytitle,lwd=3,xlab=mylab,
bty="l");lines(densy,lty=2,col=2,lwd=3);xvals=c(densx$x,
rev(densy$x));yvals=c(densx$y,rev(densy$y));polygon(xvals,
yvals,col="gray")};mytitle=paste("Test of ages: D=",round(fOo$statistic,3),
" p=",round(fOo$p.value,2),sep="");plotdens(age[female==1],
age[female==0],mytitle=mytitle,mylab="age (in years)")
legend(50,.05,legend=c("Women","Men"),col=1:2,lty=1:2,lwd=2)

and

# code example from the Using R for Data Management, Statistical
# Analysis and Graphics book
# Nicholas Horton, Smith College December 21, 2010
#
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
attach(ds)

# fit KS test and save object containing p-value
ksres = ks.test(age[female==1], age[female==0])

# define function to plot two densities on the same graph
plotdens = function(x, y, mytitle, mylab) {
  densx = density(x)
  densy = density(y)
  plot(densx, main=mytitle, lwd=3, xlab=mylab, bty="l")
  lines(densy, lty=2, col=2, lwd=3)
  xvals = c(densx$x, rev(densy$x))
  yvals = c(densx$y, rev(densy$y))
  polygon(xvals, yvals, col="gray")
}

# craft specialized title containing statistic and p-value
mytitle = paste("Test of ages: D=",
                round(ksres$statistic, 3),
                " p=", round(ksres$p.value, 2),
                sep="")

plotdens(age[female==1], age[female==0],
         mytitle=mytitle, mylab="age (in years)")

legend(50, .05, legend=c("Women", "Men"), col=1:2, lty=1:2,
       lwd=2)

While the first example has the advantage of using considerably fewer lines, its readability suffers dramatically. Appropriate indentation, white space, and comments help the analyst when debugging and make the code easier to reuse in the future. In settings where code review is undertaken, sharing a set of common standards is eminently sensible.
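One payoff of the cleaner layout is that individual steps become easy to lift out and reuse. As a minimal sketch, the title-building step could be wrapped in a function and tried on simulated data. The helper name make_ks_title and the simulated ages are illustrative assumptions, not part of the example above.

```r
# Hypothetical helper (not in the original example): build the plot
# title from a KS test result so the formatting lives in one place
make_ks_title = function(ksres, label="Test of ages") {
  paste(label, ": D=", round(ksres$statistic, 3),
        " p=", round(ksres$p.value, 2),
        sep="")
}

# simulated stand-in data, so the sketch runs without the HELP dataset
set.seed(42)
women = rnorm(100, mean=36, sd=8)
men = rnorm(100, mean=35, sd=8)

# fit KS test and craft the title from the saved object
ksres = ks.test(women, men)
mytitle = make_ks_title(ksres)
print(mytitle)
```

Because the formatting now sits in a single, clearly named function, it can be checked on its own and reused for other comparisons by changing the label argument.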

SAS

A specific but somewhat cursory style manual for SAS can be found at the SAS community Style guide for writing and polishing programs. I like the start of this guide, though it is incomplete at present. Other useful words of wisdom can be found here and here.

R

Google's R Style Guide is chock full of tips and guidelines to make R code easier to read, share and verify. Another source of ideas is Henrik Bengtsson's draft R coding conventions. While one can quibble about some of the specific suggestions, overall, the effect of adherence to such a style guide is code that is easier to understand and less likely to hide errors.
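To give a flavor of the kind of conventions such guides suggest, here is a small sketch. It assumes, as best recalled from the Google guide, CamelCase function names, constants prefixed with k, dot-separated variable names, <- for assignment, and two-space indentation. All names in it are hypothetical, and the guide itself should be consulted for the authoritative rules.

```r
# Constant: leading k (convention assumed from the guide)
kSignificanceLevel <- 0.05

# Function name in CamelCase, body indented by two spaces
IsSignificant <- function(p.value, alpha = kSignificanceLevel) {
  p.value < alpha
}

# Variable names in lower case, words separated by dots
p.values <- c(0.003, 0.2, 0.049)
significant <- IsSignificant(p.values)
print(significant)
```

Note that the examples earlier in this entry use = for assignment, which is exactly the kind of detail on which guides and coders differ.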

Some coders are fundamentalists in insisting on "the correct" style. In general, however, it is more important to develop a sensible, interpretable, and coherent style of your own than to adhere to styles that you find awkward, whatever their provenance. The links above provide some common sense tips that can help improve productivity and make you a better analyst.

