R for Quants, Part III (A)

February 18, 2012

(This article was first published on Cartesian Faith » R, and kindly contributed to R-bloggers)

This is the third part of a three-part series on teaching R to MFE students at CUNY Baruch. This lesson focuses on programming methods and application development in R.

Contents

PART I: PRELIMINARIES

PART II: STATISTICS

PART III: STRUCTURING CODE

Object-Oriented Programming

Depending on whom you ask, you may hear that R is object-oriented; others will say it is functional. In fact it is both and neither at once. In R, object-oriented programming centers on how functions are dispatched rather than on how code is structured. S3 introduces a class attribute and a polymorphic dispatching system, which resembles functional programming. S4 adds embellishments that give the illusion of a class-based programming model. The two systems are mostly compatible, but in some cases they conflict. Large projects like Rmetrics and Bioconductor use the S4 style heavily, but many smaller projects do not really benefit from the added complexity of S4.

S3 Classes and Dispatching

The simplest dispatching system, S3, is object-oriented only in the sense that the function actually called depends on the ‘class’ of the first argument. A variable’s class is simply an attribute attached to the variable.

> class(h)
 [1] "xts" "zoo" "returns"
> attr(h, 'class')
 [1] "xts" "zoo" "returns"
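To see that the class really is nothing more than an attribute, here is a minimal sketch in base R. The vector `r` and the class name "returns" are illustrative assumptions; no packages are needed.

```r
# An ordinary numeric vector of made-up daily returns
r <- c(0.010, -0.004, 0.007)

# Attaching a class only sets the 'class' attribute
class(r) <- "returns"

class(r)                # "returns"
attr(r, "class")        # the same value: class() reads the attribute
inherits(r, "returns")  # TRUE

# Removing the attribute turns r back into a plain numeric vector
attr(r, "class") <- NULL
class(r)                # "numeric" (the implicit class)
```

Because nothing checks or protects this attribute, any code can relabel an object, which is the flexibility (and the risk) of S3.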

When a function is called, what actually runs depends on whether it is an S3 generic. If it is, its body typically just calls UseMethod, which dispatches to a concrete implementation based on the class of the first argument. The matching function is named

dispatched function := generic function "." class

UseMethod tries each element of the class vector in order; if no matching function is found for any of them, the default method (generic.default) is called. As an example, let’s look at the function mean:

> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x1051616a8>
<environment: namespace:base>

This function has a number of implementations, including the default mean.default. (Try methods(mean) to see what’s available.) Hence, to get the mean of our portfolio’s returns, mean(h) dispatches to mean.default, since no methods are defined for any of the classes associated with h.

> mean(h)
[1] 0.001991222

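Unfortunately, this isn’t the behavior we want: mean.default collapses every column into a single number, whereas we want the mean for each asset. We can accomplish this by implementing a new function mean.zoo. Since h is an xts object, which inherits from zoo, UseMethod finds mean.zoo after failing to find a method for xts, and the new method applies to any other zoo object as well.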

> mean.zoo <- function(x, ...) apply(x, 2, mean, ...)
> mean(h)
        AAPL          XOM           KO            F           GS
0.0023180089 0.0021953922 0.0002628634 0.0028525868 0.0023272563

This technique can be used to create new functions as well as add implementations to existing S3 methods.
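Both uses can be sketched in base R alone. In this minimal sketch the generic `describe`, the class names "daily" and "returns", and the matrix `m` are all hypothetical. It also shows UseMethod walking the class vector: there is no `describe.daily`, so the "returns" method is used.

```r
# A new S3 generic: the body only hands off to UseMethod()
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no method matches any element of class(x)
describe.default <- function(x, ...) "some object"

# Method for the hypothetical 'returns' class
describe.returns <- function(x, ...) paste("returns for", ncol(x), "assets")

# Adding a method to an existing generic: per-column mean for 'returns'
mean.returns <- function(x, ...) apply(x, 2, mean, ...)

m <- matrix(c(0.01, 0.02, -0.01, 0.03), nrow = 2,
            dimnames = list(NULL, c("AAPL", "XOM")))
class(m) <- c("daily", "returns")  # no methods for 'daily', so 'returns' is tried next

describe(m)    # "returns for 2 assets"
describe(1:3)  # "some object": no method for 'integer' or 'numeric'
mean(m)        # column means, about 0.015 for AAPL and 0.010 for XOM
```

Inside `mean.returns`, the per-column subsets produced by `apply` are plain numeric vectors, so the inner call to `mean` dispatches to `mean.default` rather than recursing.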

S4 Classes and Dispatching

While S3 is simple yet powerful, it doesn’t offer much in the way of programmer safety. Since the class attribute can be changed at will, it’s easy to break conventions and, consequently, other people’s code. The S4 system attempts to formalize object-oriented programming. It introduces formal class definitions with typed slots, constructors, validity checking, inheritance and dispatch on multiple arguments, features typically associated with object-oriented programming languages.

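Classes are defined using the setClass and setClassUnion functions. setClassUnion defines a virtual class whose members are any of the listed classes; here it lets the returns slot hold either an xts object or NULL.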

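library(xts)  # provides the 'xts' class used in the union below

# A virtual class whose members are xts objects or NULL
setClassUnion('XtsNull', c('xts','NULL'))
setClass('Equity',
  representation(ticker='character', returns='XtsNull'),
  prototype=list(ticker='', returns=NULL))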

Methods are then attached to the class using the setGeneric and setMethod functions.

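# 'beta' is also a base R function; this creates a new generic of that name
setGeneric('beta', function(equity, market, ...) standardGeneric('beta'))
setMethod('beta', c('Equity','Equity'),
  function(equity, market, ...) {
    # beta = Cov(equity returns, market returns) / Var(market returns)
    cov(equity@returns, market@returns) / var(market@returns)
  })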

Instances are created with the new function.

> xom <- new('Equity', ticker='XOM', returns=h$XOM)
> mkt <- new('Equity', ticker='^GSPC', returns=h[,'^GSPC'])
> beta(xom,mkt)
        ^GSPC
XOM 0.8693016
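The same S4 workflow can be sketched without xts. In this minimal sketch the union `NumericOrNull`, the class `Stock`, the generic `stockBeta` (named so it does not collide with base R's `beta()`), and the return series are hypothetical; only the methods package is needed. The last lines show the type safety S4 adds: a slot value of the wrong class is rejected when the object is created.

```r
library(methods)  # attached by default in R; loaded explicitly for scripts

# Hypothetical union: a slot of this type may hold a numeric vector or NULL
setClassUnion("NumericOrNull", c("numeric", "NULL"))

# Hypothetical class mirroring Equity, with plain numeric returns instead of xts
setClass("Stock",
  representation(ticker = "character", returns = "NumericOrNull"),
  prototype = list(ticker = "", returns = NULL))

# New generic with its own name
setGeneric("stockBeta", function(equity, market, ...) standardGeneric("stockBeta"))

# Method selected only when both arguments are Stock objects
setMethod("stockBeta", c("Stock", "Stock"), function(equity, market, ...) {
  cov(equity@returns, market@returns) / var(market@returns)
})

m   <- c(0.01, -0.02, 0.015, 0.005)  # made-up market returns
mkt <- new("Stock", ticker = "MKT", returns = m)
stk <- new("Stock", ticker = "STK", returns = 2 * m)
stockBeta(stk, mkt)  # approximately 2: stk moves exactly twice as much as mkt

# Slots are typed: a numeric ticker fails validation inside new()
res <- tryCatch(new("Stock", ticker = 42), error = function(e) "rejected")
res
```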

Clearly one cost of the S4 system is the added overhead in programming. It is not so easy to transition from exploratory programming to formal applications because S4 demands a lot of structure from the beginning.

There are now also Reference Classes (created with setRefClass), which are like S4 classes except that their objects are mutable, giving an even stronger object-oriented paradigm within R.
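As a minimal sketch of that mutability, the hypothetical `Position` class below (not from the original lesson) changes its own field in place, and every variable bound to the object sees the change. Only the methods package is used.

```r
# Hypothetical reference class: a position whose share count changes in place
Position <- setRefClass("Position",
  fields = list(ticker = "character", shares = "numeric"),
  methods = list(
    buy = function(n) {
      # '<<-' assigns to the field of this object, not to a local copy
      shares <<- shares + n
      invisible(.self)
    }
  ))

p <- Position$new(ticker = "XOM", shares = 100)
q <- p        # q refers to the same object as p; nothing is copied
q$buy(50)
p$shares      # 150: the purchase made through q is visible through p
```

With S3 or S4 objects, `q <- p` would behave as a copy, and modifying `q` would leave `p` untouched.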

Functional Dispatching

While much emphasis has been on object-oriented programming in R, other programming paradigms are equally valid. Functional programming has become popular once again, and R is particularly suited for this programming style.

The Futile.Paradigm

R has its roots in both S and Scheme. Many of the improvements over S (e.g. lexical scoping) are directly attributed to Scheme, a functional dialect of Lisp. The futile.paradigm borrows additional concepts from the functional world so that programs can be structured functionally*. The package attempts to make R once again an environment conducive to iterative development that leads to structured programs; in fact, this was one of John Chambers’ original goals for the S language [1]. It also introduces syntax for writing multipart functions reminiscent of Erlang or Haskell.

Functions in the futile.paradigm are written as multipart definitions. The advantage of this approach is that data manipulation is kept separate from application logic; the drawback is that it is more verbose. Each part is a separate clause with its own guard statements, and the guards define the conditions under which that particular implementation executes. Here is the beta implementation again:

beta %when% (equity %isa% Equity)
beta %also% (market %isa% Equity)
beta %as% function(equity, market)
{
  cov(equity$returns, market$returns) / var(market$returns)
}

Each function clause begins with a %when% operator. Additional guard statements can be added using the %also% operator. The function definition itself is then specified with the %as% operator. Supporting additional signatures is as easy as adding another function clause.

beta %when% (portfolio %hasa% returns)
beta %also% (market %isa% 'zoo')
beta %as% function(portfolio, market)
{
  cov(portfolio$returns, market) / var(market)
}
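The idea behind guarded clauses can be illustrated without futile.paradigm. The following base R sketch is only an illustration: the helpers `add_clause()` and `dispatch()` are hypothetical and are not how futile.paradigm is implemented. Each clause pairs a guard with an implementation, and the first clause whose guard accepts the arguments is run; `inherits()` stands in for %isa% and a `names()` check stands in for %hasa%.

```r
# Hypothetical helper: append a clause (guard predicate + implementation)
add_clause <- function(clauses, guard, impl) {
  c(clauses, list(list(guard = guard, impl = impl)))
}

# Hypothetical helper: run the first clause whose guard accepts the arguments
dispatch <- function(clauses, ...) {
  for (cl in clauses) {
    if (isTRUE(cl$guard(...))) return(cl$impl(...))
  }
  stop("no clause matches the supplied arguments")
}

# Clause 1: both arguments are lists tagged 'Equity'
beta_clauses <- add_clause(list(),
  guard = function(equity, market) inherits(equity, "Equity") && inherits(market, "Equity"),
  impl  = function(equity, market) cov(equity$returns, market$returns) / var(market$returns))

# Clause 2: anything with a 'returns' element against a raw numeric market series
beta_clauses <- add_clause(beta_clauses,
  guard = function(portfolio, market) "returns" %in% names(portfolio) && is.numeric(market),
  impl  = function(portfolio, market) cov(portfolio$returns, market) / var(market))

m   <- c(0.01, -0.02, 0.015, 0.005)  # made-up market returns
eq  <- structure(list(ticker = "STK", returns = 2 * m), class = "Equity")
mkt <- structure(list(ticker = "MKT", returns = m), class = "Equity")

dispatch(beta_clauses, eq, mkt)                     # clause 1, approximately 2
dispatch(beta_clauses, list(returns = 0.5 * m), m)  # clause 2, approximately 0.5
```

Clause order matters in this sketch: an Equity list also has a `returns` element, so the more specific clause is registered first.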

For more robust code, post-assertions can also be added to the function definition using the %must% operator. A post-assertion specifies a condition that must be satisfied after the function is executed. If it fails, then program evaluation will be halted.
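A post-assertion can be sketched in base R as a wrapper that checks the result before returning it. The helper `with_post()` and the `annual_vol` example are hypothetical illustrations of the concept, not futile.paradigm’s %must% operator.

```r
# Hypothetical helper: wrap f so that a condition on its result is checked
# after every call; evaluation halts with an error if the check fails.
with_post <- function(f, condition) {
  function(...) {
    result <- f(...)
    if (!isTRUE(condition(result))) stop("post-assertion failed")
    result
  }
}

# Annualized volatility must come out finite and non-negative
annual_vol <- with_post(
  function(returns, periods = 252) sd(returns) * sqrt(periods),
  function(result) is.finite(result) && result >= 0)

annual_vol(c(0.01, -0.02, 0.015, 0.005))  # passes the check
# annual_vol(0.01) halts: sd() of a single value is NA, so the check fails
```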

* Note that I’m the author of futile.paradigm, so the description is slightly biased.

A Type System

The futile.paradigm offers its own pseudo type system. Types are simply data structures tagged as a particular data type. The word ‘class’ is deliberately avoided to prevent confusion with R’s legacy OOP models. Types can be created on the fly with minimal ceremony:

> equity <- create(Equity, ticker='XOM', price=85.62)
> equity$ticker
[1] "XOM"
> equity %isa% Equity
[1] TRUE
> equity %isa% Bond
[1] FALSE

If default properties are necessary, then a formal type constructor can be defined,

create.Bond <- function(T, coupon=0.02, tenor=10)
  list(coupon=coupon, tenor=tenor)

Creating an instance of the type is the same as before,

> bond <- create(Bond)
> bond$coupon
[1] 0.02

In the above example, the astute reader will wonder how Equity and Bond can appear as raw (unquoted) types. S3 and S4 class names are not symbols in the language, so they must be wrapped in quotes and passed as strings. The futile.paradigm provides syntactic sugar that allows raw types, but only when they are written in PascalCase; otherwise they, too, must be enclosed in quotes.
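Without that syntactic sugar, the same tagging idea can be sketched in base R with quoted type names. Here `create_type()` and `isa()` are hypothetical stand-ins for create() and %isa%, not futile.paradigm’s implementation; the optional constructor lookup by the name "create.<type>" is an assumption chosen to mirror create.Bond.

```r
# Hypothetical stand-in for create(): a type is a list tagged with its name
create_type <- function(type, ...) {
  # Use a constructor named "create.<type>" from the caller's scope if one exists
  ctor <- get0(paste0("create.", type), envir = parent.frame(), mode = "function")
  fields <- if (is.null(ctor)) list(...) else ctor(type, ...)
  structure(fields, class = type)
}

# Hypothetical type check in the spirit of %isa%
isa <- function(x, type) inherits(x, type)

# Constructor supplying default properties, same shape as create.Bond
create.Bond <- function(T, coupon = 0.02, tenor = 10) list(coupon = coupon, tenor = tenor)

equity <- create_type("Equity", ticker = "XOM", price = 85.62)
equity$ticker          # "XOM"
isa(equity, "Equity")  # TRUE
isa(equity, "Bond")    # FALSE

bond <- create_type("Bond")
bond$coupon            # 0.02
```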

Ultimately the choice of programming style depends on the author of the software and the domain in use. In finance many concepts are directly related to mathematics, which itself is functional, so translating these ideas to code is much simpler than in object-oriented contexts [2].

References

[1] J. Chambers. Evolution of the S Language. In Proceedings of the 20th Symposium on the Interface. The Interface Foundation of North America, 1996.
[2] B. Rowe. A Beautiful Paradigm: Functional Programming in Finance. R/Finance 2011, 2011.
