R is not one of those languages where there is only one way of doing something; the language is blessed/cursed with lots of ways of doing the same thing.
Teaching R to professional developers is easy in the sense that their fluency with other languages will enable them to soak up this small language like a sponge on the day they learn it. The problems will start a few days later, when they have been programming in another language and go back to using R; what they learned about R will have become entangled in their general language knowledge, and they will be reduced to trial and error to figure out how things work in R (a common problem I have with languages I have not used in a while is remembering whether the if-statement has a `then` keyword or not).
My Empirical software engineering book uses R and is aimed at professional developers; I have been trying to create a subset of R specifically for professional developers. The aims of this subset are:
- behave like other languages the developer is likely to know,
- not require knowing which way round the convention is in R, e.g., are 2-D arrays indexed in row-column or column-row order,
- reduce the likelihood that developers will play with the language (there is a subset of developers who enjoy exploring the nooks and crannies of a language, creating completely unmaintainable code in the process).
I am running a workshop based on the book in a few weeks and plan to teach the attendees R in 20 minutes (the library will take somewhat longer).
Here are some of the constructs in my subset:
- Use `subset` to extract rows meeting some condition. Indexing requires remembering to do it in row-column order, and weird things happen when commas accidentally get omitted.
- Always call `read.csv` with the argument `as.is=TRUE`. Computers now have lots of memory and this factor nonsense needs to be banished to history.
- Try not to use for loops. These will probably contain array/data.frame indexing, which provides ample opportunities for making mistakes; use the `*ply` functions instead (which have the added advantage of causing code to die quickly and horribly when a mistake is made, making it easier to track down problems).
- Use `head` to remove the last `N` elements from an object, e.g., `head(x, -1)` returns x with the last element removed. Indexing with the length minus one is a disaster waiting to happen.
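A minimal sketch of the four constructs above. The data frame, its column names (`project`, `loc`) and the temporary CSV file are made up for illustration, and `sapply` is used as one representative of the `*ply` family; none of this comes from the book.

```r
# Hypothetical data, invented for this sketch.
df <- data.frame(project = c("a", "b", "c", "d"),
                 loc = c(120, 4500, 800, 15000))

# subset: extract rows meeting a condition, no row-column order to remember.
big <- subset(df, loc > 1000)

# The indexing equivalent needs the trailing comma; without it
# (df[df$loc > 1000]) the condition is treated as a column selector.
big_indexed <- df[df$loc > 1000, ]

# read.csv with as.is=TRUE: strings are kept as character vectors
# rather than being converted to factors.
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_read <- read.csv(csv_file, as.is = TRUE)
unlink(csv_file)

# A *ply function in place of a for loop with explicit indexing.
loc_in_k <- sapply(df$loc, function(n) n / 1000)

# head with a negative count drops elements from the end.
x <- c(10, 20, 30, 40)
all_but_last <- head(x, -1)       # 10 20 30
all_but_last_two <- head(x, -2)   # 10 20

# One way length-minus-one indexing can go wrong: with a single element,
# 1:(length(one) - 1) is 1:0, which counts down and selects element 1.
one <- c(42)
by_index <- one[1:(length(one) - 1)]   # 42, not an empty vector
by_head <- head(one, -1)               # empty vector, as intended
```

Since R 4.0.0, `read.csv` no longer converts strings to factors by default, so `as.is=TRUE` mainly matters for code that may run on older versions.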
It’s a shame that R does not have any mechanism for declaring variables. Experience with other languages has shown that requiring variables to be declared before use catches lots of coding errors (this could be an optional feature so that those who want their ‘freedom’ can have it).
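A small sketch of the kind of error that declarations would catch; the variable names are hypothetical. A misspelled name on the left of an assignment simply creates a second variable.

```r
file_count <- 10

# Intended to update file_count, but the name is misspelled.
# R creates a new variable and reports nothing.
file_cuont <- file_count + 1

unchanged <- file_count          # still 10
typo_exists <- exists("file_cuont")   # TRUE: the typo is now a variable
```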
We now know that support for case-sensitive identifiers is a language design flaw, but many in my audience will not have used a language that behaves like this and I have no idea how to help them out.
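For anybody in the audience who has not met this behaviour, a two-line sketch (names made up) of what case sensitivity means in R:

```r
count <- 1
Count <- 2   # a distinct variable, not a reassignment of count

same_value <- count == Count   # FALSE: the two names hold different values
```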
There are languages in common use whose array bounds start at one. I will introduce R as a member of this club. Not much I can do to help out here, except the general suggestion not to do array indexing.
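A short sketch, using a made-up vector, of what a developer used to zero-based arrays will run into. Neither of the out-of-range cases produces an error, which is part of why avoiding indexing is the safer advice.

```r
v <- c("first", "second", "third")

first_elem <- v[1]           # "first": indexing starts at one
last_elem <- v[length(v)]    # "third"
zero_elem <- v[0]            # an empty character vector, no error
past_end <- v[4]             # NA, no error
```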
Suggestions based on readers’ experiences welcome.