Abstract Data Types and the Uniform Referent Principle I: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()

September 27, 2017
By

(This article was first published on R – Jocelyn Ireson-Paine's Blog, and kindly contributed to R-bloggers)

“What’s the Uniform Referent Principle?” my colleague asked me on reading my last post. I think
I first came across it in Jean Sammet’s
famous book Programming Languages: History and Fundamentals. In
a description of Douglas Ross’s AED-0 language, she pointed
out a feature that she thought
particularly noteworthy: the notation for getting information out of a piece
of data is the same regardless of how the data is stored. Or as Ross puts it: There must be a single, uniform method of referencing operands regardless of their detailed implementation. He
considered this so
important that he gave it a name: the Uniform Referent Principle. Unfortunately, it’s a principle
universally ignored by R programmers. I’m going to explain
why that is bad, and how it affects my coding of the
tax and benefit calculations in our economic model.

Sammet doesn’t actually name the principle, but Ross himself does, as is
evident from the title of his paper
“Uniform Referents: An Essential Property for a Software Engineering Language” by
Douglas T. Ross, in Software Engineering:
Proceedings of the Third Symposium on Computer and Information Sciences held in Miami Beach, Florida, December, 1969
(Volume 1, edited by Julius Tou).

It’s worth asking why it concerned him. His answer is as relevant now as it was in 1969. Here’s the start to his introduction:

The term software engineering is new and has yet to achieve a well-defined meaning.
Most present-day software has not been engineered at all, but instead is
a product of the individual experience of its creators and primarily ad hoc
choices among various alternatives. The effects of this unstructured approach to software are quite obvious in terms of poor performance, missed schedules, and uncontrolled costs.

Ross continues by saying that although software engineering is one of the most challenging problems facing humanity, we are beginning to systematise it and to recognise the fundamental issues. The most fundamental of these involves:

[…] one basic criterion which a general-purpose
programming language must meet if it is to be used as the expressive vehicle for
software engineering activities regardless of the style of practice of these activities:
There must be a single, uniform method of referencing operands regardless of their detailed implementation.

Why is this so important? Ross relates it to his ideas on how to design programming languages:

Programming language features for software engineering must be carefully
selected; not any old programming language features will do. An
unstructured blur of assembly language depending in turn upon an
ad hoc collection of machine hardware features which just happen
to have been brought together for some particular computer design
has a low probability of matching the real world in any profound way.
Similarly, most “high-level” languages invariably will be either not complete or
not concise enough to serve as a basis for software engineering, for
they have a similar ad hoc ancestry with roots in scientific
computation or business data processing areas, which omit many
important aspects of general software construction.

Heaven forbid that R’s ancestry be deemed in any way ad hoc. But let Ross
continue:

Careful
inspection and experimentation with various software engineering
activities discloses a few fundamental features that somehow must be
present in a software engineering language. These features can be
informally derived directly from high-level considerations of the
software engineering design process itself. The purpose of this
paper is to indicate such a derivation of one basic principle
without attempting an exhaustive enumeration of the consequences.

Ross then goes on to those promised high-level considerations. Quoting an excerpt from his
report “Computer-Aided Design: A Statement of Objectives” (Tech.
Mem. 8436-TM-4, Defense Documentation Center No. AD252060, M.I.T. Electron.
Systems Lab., 1960), he writes that:

The manner of stating problems is also of primary importance. You must be able to
state problems from outside in, not
inside out.
The normal practice of being explicit about a problem is to build
a description of the solution procedure in small steps, i.e. to
code the problem…. Stating a problem step by step from the
inside out in this manner is satisfactory if the problem is well understood to begin with.
But there is an alternative procedure which is frequently used among people,
but is virtually non-existent as regards computers. In this alternate procedure,
which we call stating the problem from the outside in, we state the problem originally
in general terms and then explain further and in more and more detail what we mean….
It is quite apparent that this procedure of stating a problem from the
outside in by means of further and further refinements is the only feasible way to
attack grand problems.

This way of stating problems is also called stepwise
refinement. One of the computer scientists who popularised the name
was Niklaus Wirth, the designer of Pascal, in his paper

“Program development by stepwise refinement”
(Communications of the ACM,
April 1971,
Volume 14,
Number 4).
In it, he writes:

A guideline in the process of stepwise refinement should be the principle to decompose decisions as much as possible, to untangle aspects which are only seemingly interdependent, and to defer those decisions which concern details of representation as long as possible. This will result in programs which are easier to adapt to different environments (languages and computers), where different representations may be required.

Why is it important to defer decisions which concern details of representation? Because undoing these
decisions is
costly. One has to rewrite code, retest it, and redocument it. The modern
technique for minimising this cost is to use
abstract data types. To
quote the beginning of the Wikipedia page:

In computer science, an abstract data type (ADT) is a mathematical model for data types, where a data type is defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.

So one centralises these operations inside a module, and makes everything that
depends on the concrete representation private, inaccessible from outside.
The only things that aren’t inaccessible are the interface functions.
Then, no matter how much the representation is changed, the interface won’t
be, and so neither will code that calls it.

Ross’s reasoning is similar:

The crucial feature about outside-in design as the underlying methodology for software engineering is that […] proper treatment of interface mechanisms will allow the higher level to remain unchanged regardless of changes in the details of the inner level. In other words all of the variations can be accommodated in the interface changes, so that the outer level in completely unaffected.

[…]

The criterion is this: A single uniform referent mechanism must be used for all references to operands. […] In order for the programs of an outer level to remain completely unchanged as interface programs change to accommodate various forms of inter-level detail, the manner of referencing these operands must be the same for all possible forms.

There is one difference, however. Ross was writing many years ago, before our modern views on abstract data types. I think it’s fair to say that he sees things in slightly more representational terms than we would today. If we were implementing a data structure as an array, we’d make our interface hide the fact that it’s an array, by exporting functions that do the subscripting. Ross, however, would write the subscripting operations themselves in the interface, but in a notation that looks the same as function calls. Which is why later on in his paper, Ross writes the following table:

The notation A(B) in any syntactic context always means

Property A of thing B

AED-0 declarations allow choices:

A: ARRAY B: INDEX
A: COMPONENT POINTER
A: FUNCTION ARGUMENT
A: MACRO ITEM-STRING

With the .INSERT STATEMENT:

Program file Declaration files
BEGIN 7094 version
.INSERT DECL $  360 version
BODY STATEMENTS $ 1108 version
END FINI

The program file never changes when any declaration is used

(Continued.)

To leave a comment for the author, please follow the link and comment on their blog: R – Jocelyn Ireson-Paine's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)