Abstract Data Types and the Uniform Referent Principle II: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()

October 1, 2017

(This article was first published on R – Jocelyn Ireson-Paine's Blog, and kindly contributed to R-bloggers)

In “Abstract Data Types and the Uniform Referent Principle I: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()”, I explained why the notation for interfacing to a data structure should be independent of that structure’s representation.
R programmers honour this principle in the same way that bricks hang in the sky.
All published R code that operates on data frames uses column names. Sometimes these follow the $ operator; sometimes the data frame is implicit via attach() or similar. In the Tidyverse, the column names will often be part of a mutate(), the data frame being piped through a sequence of %>% operators. And this is dreadful software engineering.

Why? Look at the tables below.
They represent four different ways of storing my income data.

Person  Income_Type  Income_Value
Alice   Wages        37000
Alice   Bonuses      0
Alice   Benefits     0
Bob     Wages        14000
Bob     Bonuses      1000
Bob     Benefits     6000

Person  Income_Wages  Income_Bonuses  Income_Benefits
Alice   37000         0               0
Bob     14000         1000            6000

Person  Income
Alice   Type      Value
        Wages     37000
        Bonuses   0
        Benefits  0
Bob     Type      Value
        Wages     14000
        Bonuses   1000
        Benefits  6000

Person  Income
Alice   Wages  Bonuses  Benefits
        37000  0        0
Bob     Wages  Bonuses  Benefits
        14000  1000     6000

Abstractly, the data is the same in each case, and if you’re familiar with nest(), unnest(), gather() and spread(), you will easily see how to transform one table into any of the others. But the tables are implemented in very different ways. If you access their elements with $ or an equivalent, and you then change the implementation, you have to rewrite all those accesses. Which is dreadful software engineering.
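To make the rewriting cost concrete, here is a rough sketch rather than anything from the original post. It assumes tidyr 1.0 or later is installed (for the nest() and unnest() argument style used here), and the variable names long, wide, nested and so on are purely illustrative. It builds the first table, converts it to two of the other layouts, and then asks the same question, "what are Bob's wages?", of each.

```r
library(tidyr)  # assumed: tidyr >= 1.0 for nest(Income = ...) and unnest(cols = ...)

# First table: one row per person and income type
long <- data.frame(
  Person       = rep(c("Alice", "Bob"), each = 3),
  Income_Type  = rep(c("Wages", "Bonuses", "Benefits"), times = 2),
  Income_Value = c(37000, 0, 0, 14000, 1000, 6000),
  stringsAsFactors = FALSE
)

# long -> wide: one column per income type
# (columns come out as Wages, Bonuses, Benefits, without the Income_ prefix)
wide <- spread(long, key = Income_Type, value = Income_Value)

# wide -> long again
long_again <- gather(wide, key = "Income_Type", value = "Income_Value", -Person)

# long -> nested: one row per person, Income is a list-column of small tables
nested <- nest(long, Income = c(Income_Type, Income_Value))

# nested -> long again
unnested <- unnest(nested, cols = Income)

# The same question needs different access code for each layout:
bob_long <- long$Income_Value[long$Person == "Bob" & long$Income_Type == "Wages"]

bob_wide <- wide$Wages[wide$Person == "Bob"]

bob_income <- nested$Income[[which(nested$Person == "Bob")]]
bob_nested <- bob_income$Income_Value[bob_income$Income_Type == "Wages"]
```

All three lookups return the same figure, but none of the three expressions would survive a switch to one of the other layouts.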

[Cartoon: an experimenter peers into the innards of a complicated piece of machinery. His colleague holds a plug coming out of it, labelled INTERFACE: 'GET' 'INCOME' 'WAGES', and says: 'Don't worry about how it works. It's the interface that's important.']
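One way the cartoon's 'GET' 'INCOME' 'WAGES' plug might look in R is sketched below, using base R's S3 dispatch. Everything named here is hypothetical: get_income(), income_long() and income_wide() are illustrative inventions, not functions from the post or from any package. The idea is that callers write one expression, and only the per-representation methods know about column names.

```r
# Hypothetical constructors: tag a data frame with its representation
income_long <- function(df) structure(df, class = c("income_long", class(df)))
income_wide <- function(df) structure(df, class = c("income_wide", class(df)))

# Hypothetical uniform interface: the only thing client code should call
get_income <- function(data, person, type) UseMethod("get_income")

# Method for the long layout (Person, Income_Type, Income_Value)
get_income.income_long <- function(data, person, type) {
  data$Income_Value[data$Person == person & data$Income_Type == type]
}

# Method for the wide layout (Person, Income_Wages, Income_Bonuses, ...)
get_income.income_wide <- function(data, person, type) {
  data[[paste0("Income_", type)]][data$Person == person]
}

# The same data in two representations
a <- income_long(data.frame(
  Person       = rep(c("Alice", "Bob"), each = 3),
  Income_Type  = rep(c("Wages", "Bonuses", "Benefits"), times = 2),
  Income_Value = c(37000, 0, 0, 14000, 1000, 6000),
  stringsAsFactors = FALSE
))

b <- income_wide(data.frame(
  Person          = c("Alice", "Bob"),
  Income_Wages    = c(37000, 14000),
  Income_Bonuses  = c(0, 1000),
  Income_Benefits = c(0, 6000),
  stringsAsFactors = FALSE
))

# Client code is identical whichever representation is behind the plug
bob_wages_a <- get_income(a, "Bob", "Wages")
bob_wages_b <- get_income(b, "Bob", "Wages")
```

Under this sketch, changing the storage layout means writing one new method, while every call to get_income() stays as it is.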
