I like to follow good practice when I program. I want my code to be readable, properly indented, modular and reusable. And I want my variables to have descriptive names. There’s nothing that I hate (make that moderately dislike) more than arbitrary abbreviations and inconsistent style. I have to say that R is not the best example when it comes to style. Even base functions often have weird names, and their arguments are camelCased, period.separated or abbrvtd, all willy-nilly with no consistency, as if saving a few keystrokes were so important. It’s like learning PHP again. But the weirdest thing I’ve come across yet is the possibility of using a partial name for an argument (e.g. co for collapse). I’m at a loss to find a rationale for this; it seems designed to engineer impenetrable code.
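To make the partial-name behaviour concrete, here is a minimal sketch. The helper join_words is hypothetical and defined only for this illustration; it simply wraps paste() so that collapse is an ordinary named argument.

```r
# Hypothetical helper, defined only for this illustration.
join_words <- function(words, collapse = " ") {
  paste(words, collapse = collapse)
}

# Full argument name.
full <- join_words(c("a", "b"), collapse = "-")

# Partial argument name: "co" is an unambiguous prefix of "collapse",
# so R silently matches it to that argument.
partial <- join_words(c("a", "b"), co = "-")

print(full)
print(partial)
print(identical(full, partial))
```

Both calls should give the same string. One caveat, as far as I understand the R language definition: partial matching only applies to arguments that come before ... in a function's signature, so calling paste() itself with co= would not be matched to collapse.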
Good and consistent style helps you code better. Long, descriptive names make your code more readable, and tab-completion will save you those precious keystrokes. So go for it. That’s not to say it can’t be a problem sometimes. For example, this week I was adding some bells and whistles to a function I’d written. One statement involved subsetting a data frame on a hard-coded value, like subset(result,association=="firstKind")
A new argument for my function was, you guessed it, association. You can see where this is heading; the statement turned into:
subset(result,association==association)
And of course it failed; the condition is always true, because both instances of association are interpreted as referring to the column of the data frame. So all rows are selected, whatever the argument is.
How to get out of this? Well, one could change the argument or the column name, but I was already using them all over the function and elsewhere, and didn’t fancy tinkering with the code too much at that point. Besides, I was reluctant to rename them in the first place, for reasons that should be obvious by now. So I read up on the documentation on scoping, which is where the problem lies, and came up with this:
e<-environment()
subset(result,association==get('association',e))
The association on the right-hand side is now correctly interpreted as the function’s argument. It’s a bit clumsy, but I get to keep my beloved descriptive variable names and don’t need to go off on a replacement frenzy, with its associated new bugs.
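For anyone who wants to try the whole thing inside a function, here is a small self-contained sketch. The data frame contents and the function names (filter_naive, filter_scoped, filter_indexed) are made up for the illustration; only the subset() and get() pattern comes from the discussion above. The third function is a different technique, plain bracket indexing, which sidesteps the name clash altogether.

```r
# Hypothetical data, for illustration only.
result <- data.frame(
  association = c("firstKind", "secondKind", "firstKind"),
  score = c(1, 2, 3)
)

filter_naive <- function(result, association) {
  # Both sides of == resolve to the data frame column,
  # so the condition is TRUE for every row.
  subset(result, association == association)
}

filter_scoped <- function(result, association) {
  # Capture the function's own environment, then look the argument up there.
  e <- environment()
  subset(result, association == get("association", e))
}

filter_indexed <- function(result, association) {
  # Alternative technique: standard indexing, where the bare name
  # can only refer to the function argument.
  result[result$association == association, ]
}

print(nrow(filter_naive(result, "firstKind")))
print(nrow(filter_scoped(result, "firstKind")))
print(nrow(filter_indexed(result, "firstKind")))
```

With this toy data, the naive version should keep all three rows, while the other two keep only the two firstKind rows.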
If you don’t see what I mean, here is some code I left on a related Stack Overflow thread:
x<-data.frame(
  start=sample(3,20,replace=TRUE),
  someValue=runif(20))
e<-environment()
start<-3
cat("\nDefault scope:\n")
print(subset(x,start==start)) # all entries, as start==start is evaluated to TRUE
cat("\nSpecific environment:\n")
print(subset(x,start==get('start',e))) # second start is replaced by its value in environment e
However, bad practice has its perks and can be a lot of fun! I recently came across this very addictive online game: anarchy golf. There are more than 500 programming tasks to choose from. Each of them is very easy to code, like printing out the Fibonacci sequence, or just ‘Hello World’, but that’s not where the challenge is.
As the name suggests, the real challenge is to do it in as few bytes as possible! And that’s where obscure and horribly nested code comes in handy. Variable names have to be one letter at most, if you use variables at all, that is.
My current records are:
- 77 bytes for the smileys triangle
- 76 bytes for FizzBuzz (thanks to tomp who told me about writeLines).
R-bloggers and readers, I challenge you to beat that!
Careful with the invisible line break that may sit at the end of your file. This bit of perl will get rid of it if your editor insists on adding it: perl -pe 'chomp if eof'. And no cheating! Your code must be pure R, so no using system() please.
It’s a great and terribly addictive game, and it teaches you some of the weirdest and most obscure R commands and shortcuts. And partial matching suddenly becomes useful and even recommended.