Annotated source code

This article was first published on Digithead's Lab Notebook and kindly contributed to R-bloggers.

We programmers are told that reading code is a good idea. It may be good for you, but it’s hard work. Jeremy Ashkenas has come up with a simple tool that makes it easier: docco. Ashkenas is also behind Underscore.js and CoffeeScript, the dialect of JavaScript in which docco is written.

Interesting ways to mix prose and code have appealed to me ever since I first discovered Mathematica’s live notebook, which lets you author documents that combine executable source code, typeset text and interactive graphics. For those who remember the early ’90s chiefly for their potty training, running Mathematica on the NeXT pizza boxes was like a trip to the future. Combining the quick cycles of a read-eval-print loop with complete word processing and mathematical typesetting encourages you to keep lovely notes on your thinking, trials and errors.

Along the same lines, there’s Sweave for R and Sage for Python.

Likewise, one of the great innovations of Java was Javadoc. Javadoc doesn’t get nearly enough credit for the success of Java as a language. It made powerful APIs like the collections classes a snap to use and even helped navigate the byzantine complexities of Swing and AWT.

These days, automated documentation is expected for any language. Nice examples include RubyDoc, scaladoc and Haddock (for Haskell). Doxygen works with a number of languages. Python has pydoc, but in practice seems to rely more on the library reference. Anyway, there are a bunch, and if your favorite language doesn’t have one, start coding now.
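
As a small illustration of how these tools turn comments into reference pages, here is a sketch using Python's pydoc. The `mean` function is a made-up example and not from any library; the point is only that the docstring written next to the code is what the generator displays.

```python
import pydoc


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Render the documentation pydoc would show for `mean`, as plain text.
# `mean` is a hypothetical example function, not part of any library.
doc_text = pydoc.render_doc(mean, renderer=pydoc.plaintext)

if __name__ == "__main__":
    print(doc_text)
```

Javadoc, scaladoc and Haddock follow the same basic pattern: structured comments live beside the code and a tool extracts them into browsable documentation.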

The grand-daddy of these ideas is Donald Knuth’s literate programming.

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Indeed, Ashkenas references Knuth, calling docco “quick-and-dirty, hundred-line-long, literate-programming”.
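
To give a feel for how little machinery the quick-and-dirty approach needs, here is a minimal sketch of the core idea: walk through a source file and pair each run of comment lines with the code that follows it. This is not docco's actual code (docco is written in CoffeeScript, and it also renders the comments as formatted prose and highlights the code). The names `split_sections` and `SAMPLE` are invented for this example, and it assumes `#`-style line comments.

```python
import re

# A comment line: optional indentation, '#', then the comment text.
COMMENT = re.compile(r"^\s*#\s?(.*)$")


def split_sections(source):
    """Split source text into (docs, code) pairs.

    Hypothetical helper for illustration: each run of comment lines
    becomes the prose for the block of code that follows it.
    """
    sections = []
    docs, code = [], []
    for line in source.splitlines():
        match = COMMENT.match(line)
        if match and not line.startswith("#!"):
            # A comment that follows code starts a new section.
            if code:
                sections.append(("\n".join(docs), "\n".join(code)))
                docs, code = [], []
            docs.append(match.group(1))
        else:
            code.append(line)
    if docs or code:
        sections.append(("\n".join(docs), "\n".join(code)))
    return sections


SAMPLE = """\
# Add two numbers.
def add(a, b):
    return a + b

# Double a number by
# adding it to itself.
def double(x):
    return add(x, x)
"""

if __name__ == "__main__":
    for docs, code in split_sections(SAMPLE):
        print("DOCS:", docs)
        print("CODE:", code)
        print("-" * 40)
```

A real tool would then lay each pair out side by side, prose on the left and code on the right, which is the look that makes annotated source pleasant to read.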

This goodness needs to come to more languages. There’s a Ruby port called Rocco by Ryan Tomayko. And for Clojure there’s Marginalia.

I love the quick-and-dirty aspect, and that will be the key to encouraging programmers to write more documentation that looks like this. I hope they build docco, or something like it, into GitHub. Maybe one day there will be a Norton Anthology of annotated source code.
