In “Abstract Data Types and the Uniform Referent Principle I: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()”, I explained why the notation for interfacing to a data structure should be independent of that structure’s representation.
R programmers honour this principle in the same way that bricks hang in the sky.
All published R code that operates on data frames uses column
names. Sometimes these follow the $ operator;
sometimes the data frame is implicit via
or similar. In the Tidyverse, the column names will often
be part of a
mutate(), the data frame being
piped through a sequence of
And this is dreadful software engineering.
Why? Look at the tables below.
They represent four different ways of storing my income data.
Abstractly, the data is the same in each case, and if you’re
you will easily see how to transform one table into
any of the others. But the tables are implemented in very different ways. If you access their elements with
$ or an equivalent, and you then change the implementation, you have to rewrite all those accesses. Which is dreadful software engineering.
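To make the point concrete, here is a minimal sketch in R. Everything in it is an assumption made for illustration: the figures, the column names (year, salary, other, kind, amount) and the helper functions income_wide(), income_long() and income() are invented, and are not the tables or code from this post. It stores the same made-up data in two layouts, shows how direct access differs between them, and then hides the layout behind an accessor.

```r
# Hypothetical data: figures and column names are invented for illustration.

# Layout 1: one row per year, one column per kind of income.
wide <- data.frame(
  year   = c(2019, 2020),
  salary = c(100, 110),
  other  = c(5, 7)
)

# Layout 2: one row per (year, kind) pair.
long <- data.frame(
  year   = c(2019, 2019, 2020, 2020),
  kind   = c("salary", "other", "salary", "other"),
  amount = c(100, 5, 110, 7)
)

# Direct access: the notation depends on the layout, so every such
# expression must be rewritten if the layout changes.
direct_wide <- wide$salary[wide$year == 2020]
direct_long <- long$amount[long$year == 2020 & long$kind == "salary"]

# Accessors (hypothetical helpers): one per layout, same arguments.
income_wide <- function(d, year, kind) d[[kind]][d$year == year]
income_long <- function(d, year, kind) d$amount[d$year == year & d$kind == kind]

# The rest of the program calls only income(). Switching layout means
# editing the next two lines and nothing else.
incomes <- wide
income  <- function(year, kind) income_wide(incomes, year, kind)

total_2020 <- income(2020, "salary") + income(2020, "other")
```

With this arrangement, moving from the wide layout to the long one is a matter of pointing incomes at long and income() at income_long(); the line that computes total_2020 stays as it is. This is only one way of getting that independence, not the only one.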