# Simple yet elegant Object-Oriented programming in R with S3

**R on Coding Club UC3M**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Iñaki Úcar

The R language is peculiar in many ways, and its approach to object-oriented (OO) programming is just one of them. Indeed, base R supports not one, but three different OO systems: S3, S4 and RC classes. And yet, probably none of them would qualify as a fully-fledged OO system before the astonished gaze of an expert in languages such as Python, C++ or Java. In this tutorial, we will review the S3 system, the simplest yet most elegant of them. The use case of the quantities framework (CRAN packages `units`

, `errors`

and `quantities`

) will serve as the basis to study the main concepts behind S3 programming in R: classes, generics, methods and inheritance.

## Introduction

You probably heard of John Chambers’ slogans about R, i.e., *everything that exists is an object* and *everything that happens is a function call* (and a function is an object too). In most languages, *object* is an OO concept referring to an instance of a certain class that encapsulates its whole lifetime: construction, associated data, behavior and destruction. In R, instead, roughly speaking, an object is just a base type with possibly some attributes. In other words, everything is about the data in R, while construction, behavior and destruction are defined elsewhere.

Let us start with the common example: what are *factors* in R? What makes them so special?

x <- factor(c("apple", "orange", "orange", "pear")) typeof(x)

## [1] "integer"

attributes(x)

## $levels ## [1] "apple" "orange" "pear" ## ## $class ## [1] "factor"

unclass(x)

## [1] 1 2 2 3 ## attr(,"levels") ## [1] "apple" "orange" "pear"

As you can see, factors are just integer vectors with some metadata stored as an attribute named `levels`

. Note also that there is a special attribute called `class`

which is central in defining how these objects behave. Where and how to do it is what S3 programming is about. This decoupling makes S3 less formal and more error-prone, but on the other hand, it is much more flexible, because it allows you e.g. to extend classes in other packages without requiring them to modify a single bit.

Arguably, there are two main use cases for an S3-style design:

- Certain procedure generates a list of heterogeneous things that needs to be subsequently manipulated as a whole. For example, the output from a linear regression:

fit <- lm(Petal.Width ~ Petal.Length, data=iris) typeof(fit)

## [1] "list"

class(fit)

## [1] "lm"

names(fit)

## [1] "coefficients" "residuals" "effects" "rank" ## [5] "fitted.values" "assign" "qr" "df.residual" ## [9] "xlevels" "call" "terms" "model"

- Certain base type requires to be enriched with some metadata that needs to be manipulated alongside it to provide some additional functionality (e.g., factors).

In this practical introduction to S3 programming, we will cover the second use case by reviewing the basic design of the quantities framework, which enables quantity calculus for R vectors, matrices and arrays, with automatic propagation, conversion, derivation and simplification of magnitudes and uncertainties.

## Defining a new object

As a motivating example, we would like R to be able to automatically handle measurements with uncertainty (hereinafter, for convenience, just *errors*). Base numeric vectors are just fine to handle measurements; we just need to add the errors as metadata (in an attribute) and choose a proper class. In order to do this, no special care is needed beyond 1) avoiding special attribute names for our metadata, with special meaning (such as `class`

, `names`

, `dim`

and `dimnames`

), and 2) avoiding class names that are already in use out there.

In our case, we will store the errors in an attribute called `errors`

, and the class name will be `errors`

(not fanciful, just convenient). The first piece we need is a *constructor*:

set_errors <- function(x, value=0) { # check that 'value' has a proper length stopifnot(any(length(value) == c(length(x), 1L))) if (length(value) == 1) value <- rep(value, length(x)) structure(x, errors=abs(value), class="errors") # equivalent to: # attr(x, "errors") <- abs(value) # class(x) <- "errors" # x }

The `attr`

and `attributes`

functions allows us to retrieve attributes, but the user should not need to know the particular implementation details, so let us define an explicit *getter*:

errors <- function(x) attr(x, "errors")

Now, we can define a set of measurements and asign an error of 0.1 to each one of them as follows:

x <- set_errors(1:10, seq(0.1, 1, 0.1)) x

## [1] 1 2 3 4 5 6 7 8 9 10 ## attr(,"errors") ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 ## attr(,"class") ## [1] "errors"

errors(x)

## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

But still, nothing interesting happens with them.

x + x

## [1] 2 4 6 8 10 12 14 16 18 20 ## attr(,"errors") ## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 ## attr(,"class") ## [1] "errors"

## Modifying the behavior: generics, methods and groups

One of the first things we would probably want to do with our new object is to print it properly, because our users are not interested in the implementation details that we are able to see above (i.e., the actual attributes). Therefore, we need to provide a `print`

method.

There are great sources to teach you all the technical details about S3 (there are some links at the end of this article). We will limit ourselves here to briefly introduce the basic building blocks of S3 programming: generics and methods. Many (most?) of the functions you daily use are *generics* (or *internal generics*, which are internally implemented functions that *mostly* behave like generics; see `help(InternalMethods)`

).

isS3stdGeneric(print)

## print ## TRUE

isS3stdGeneric(mean)

## mean ## TRUE

.S3PrimitiveGenerics

## [1] "anyNA" "as.character" "as.complex" "as.double" ## [5] "as.environment" "as.integer" "as.logical" "as.numeric" ## [9] "as.raw" "c" "dim" "dim<-" ## [13] "dimnames" "dimnames<-" "is.array" "is.finite" ## [17] "is.infinite" "is.matrix" "is.na" "is.nan" ## [21] "is.numeric" "length" "length<-" "levels<-" ## [25] "names" "names<-" "rep" "seq.int" ## [29] "xtfrm"

This basically means that you can write *methods* for those generics to implement how they behave when they are applied to a particular object class. And this is as easy as writing a function named

. The signature of this function must accommodate, at least, the same arguments as the generic. In other words, a method can be *more specific* than the generic, but cannot lose generality. Some examples:

- For generic
`foo(x)`

, method must be`foo.bar(x)`

. - For generic
`foo(x, y, z)`

, method can be`foo.bar(x, y, z)`

, or, if you are just interested in the first argument, for example, you may use dots`foo.bar(x, ...)`

to summarize the rest of them. - For generic
`foo(x, ...)`

, method can be`foo.bar(x, ...)`

, or you may be more specific and take some arguments out of the dots`foo.bar(x, y, z, ...)`

(see`help(print)`

for examples).

In the following example, `print.errors`

is defined. This particular implementation just shows a maximum of 5 error terms and then delegates the printing of the main vector of values to the default method by calling `NextMethod`

.

print.errors <- function(x, ...) { # print 5 errors at most err <- errors(x) e <- paste(format(err[1:min(5, length(err))]), collapse=" ") if (length(err) > 5) e <- paste(e, "...") cat("Errors: ", e, "\n", sep = "") # drop attribute and class, then call the next method class(x) <- setdiff(class(x), "errors") attr(x, "errors") <- NULL NextMethod() }

Dropping the class and the attribute is not needed in general, but in this case we do not want the default printing method (`print.default`

) to show them, so we remove them before calling `NextMethod`

. The latter is not required either, but it is highly advisable if we want to support inheritance, as we will see later.

Now, if we try to print our object again,

x

## Errors: 0.1 0.2 0.3 0.4 0.5 ... ## [1] 1 2 3 4 5 6 7 8 9 10

The output looks nicer. Basically, when `print`

is called, it takes the class, `errors`

, and looks for a method called `print.errors`

. Before we defined it, nothing was found and thus the default method was called. If you call `methods(print)`

you will see that there are hundreds of `print`

methods defined for hundreds of different classes, and our new method is of course listed there.

The next step would be to implement common functions to work with vectors in R, such as `[`

and `c`

.

`[.errors` <- function(x, ...) set_errors(NextMethod(), errors(x)[...]) c.errors <- function(..., recursive = FALSE) set_errors(NextMethod(), c(unlist(sapply(list(...), errors))))

Now, we can subset and concatenate `errors`

objects, and their errors are also handled appropriately.

c(x[3], x[7:9])

## Errors: 0.3 0.7 0.8 0.9 ## [1] 3 7 8 9

Next, we would like to be able to operate normally with these vectors so that errors are automatically recomputed and propagated. At this point, you are probably guessing that we basically need to start reimplementing a bunch of mathematical functions and arithmetic operations to work with our new class. And you are actually right, but there is a shortcut for that.

Fortunately, R defines what is called *S3 group generics* (see `help(groupGeneric)`

). These are four pre-specified groups of functions containing the following generics:

- Group
`Math`

includes mathematical functions:`abs`

,`sign`

,`sqrt`

- Rounding functions (
`floor`

,`ceiling`

…) - Exponential, logarithms and trigonometric functions (
`cos`

,`sin`

…) - Special functions such as
`gamma`

- Cumulative sum, product, max and min

- Group
`Ops`

includes arithmetic operations:- Addition, subtraction, multiplication, power, division and modulo
- Logic and boolean operators

- Group
`Summary`

includes:`all`

,`any`

,`sum`

,`prod`

,`min`

,`max`

,`range`

- Group
`Complex`

includes functions to work with complex numbers:- Argument, conjugate, real/imaginary parts, modulus

The existence of these groups means that we can simply write a single method for a whole category to provide for all the functions included. And this is what the `errors`

package does by implementing `Math.errors`

, `Ops.errors`

and `Summary.errors`

. We can take a glimpse into one of these methods:

errors:::Ops.errors

## function (e1, e2) ## { ## if (.Generic %in% c("&", "|", "!", "==", "!=", "<", ">", ## "<=", ">=")) { ## warn_once("boolean operators not defined for 'errors' objects, uncertainty dropped", ## fun = .Generic, type = "bool") ## return(NextMethod()) ## } ## if (!missing(e2)) { ## coercion <- cond2int(!inherits(e1, "errors"), !inherits(e2, ## "errors")) ## if (coercion) { ## warn_once("non-'errors' operand automatically coerced to an 'errors' object with no uncertainty", ## fun = "Ops", type = "coercion") ## switch(coercion, e1 <- set_errors(e1), e2 <- set_errors(e2)) ## } ## } ## deriv <- switch(.Generic, `+` = , `-` = if (missing(e2)) { ## e2 <- NA ## list(do.call(.Generic, list(1)), NA) ## } else list(1, do.call(.Generic, list(1))), `*` = list(.v(e2), ## .v(e1)), `/` = , `%%` = , `%/%` = list(1/.v(e2), -.v(e1)/.v(e2)^2), ## `^` = list(.v(e1)^(.v(e2) - 1) * .v(e2), .v(e1)^.v(e2) * ## log(abs(.v(e1))))) ## propagate(unclass(NextMethod()), e1, e2, deriv[[1]], deriv[[2]]) ## } ## <bytecode: 0x56420b6a8a08> ## <environment: namespace:errors>

All the arithmetic operators, as well as logical and boolean operators, share a lot of properties. But still, it is clear that we need some mechanism to be able to do different things depending on which one was called. To solve this, the S3 dispatching mechanism sets for us special variables:

`.Generic`

contains the name of the generic function.`.Method`

contains the name of the method.`.Class`

contains the class(es) of the object.

This way, checking `.Generic`

allows us to issue a warning if a non-supported method is called, and then delegate to `NextMethod`

.

Note also that the `Ops`

group is pretty special for two reasons: first, there are unary operators (e.g., `-1`

), so sometimes the second argument is missing, and we need to take that into account; secondly, these operators are commutative, and therefore S3 supports *double dispatch* in this case.

Finally, `Ops.errors`

addresses all the arithmetic operators in a unified way: it computes two derivatives depending on `.Generic`

and then propagates the uncertainty using an auxiliary function that implements the Taylor Series Method.

What we have seen here is just a taste of the actual implementation of the `errors`

package. Now, let’s clear the workspace and load it to check the complete functionality.

rm(list=ls()) library(errors) set.seed(42) x <- 1:5 + rnorm(5, sd = 0.01) y <- 1:5 + rnorm(5, sd = 0.02) errors(x) <- 0.01 errors(y) <- 0.02 x; y

## Errors: 0.01 0.01 0.01 0.01 0.01 ## [1] 1.013710 1.994353 3.003631 4.006329 5.004043

## Errors: 0.02 0.02 0.02 0.02 0.02 ## [1] 0.9978775 2.0302304 2.9981068 4.0403685 4.9987457

(z <- x / y)

## Errors: 0.022693105 0.010858436 0.007469263 0.005497048 0.004477051 ## [1] 1.0158657 0.9823284 1.0018427 0.9915751 1.0010597

correl(x, x) # one, cannot be changed

## [1] 1 1 1 1 1

correl(x, y) # NULL, not defined yet

## NULL

correl(x, y) <- runif(length(x), -1, 1) correl(x, y)

## [1] 0.8080628 -0.7225797 0.9777835 0.8933365 -0.8351249

covar(x, y)

## [1] 0.0001616126 -0.0001445159 0.0001955567 0.0001786673 -0.0001670250

(z_correl <- x / y)

## Errors: 0.013609754 0.013667062 0.003492530 0.002917634 0.005781596 ## [1] 1.0158657 0.9823284 1.0018427 0.9915751 1.0010597

(df <- data.frame(x, 3*x, x^2, sin(x), cumsum(x)))

## Warning: In 'Ops' : non-'errors' operand automatically coerced to an ## 'errors' object with no uncertainty

## x X3...x x.2 sin.x. cumsum.x. ## 1 1.01(1) 3.04(3) 1.03(2) 0.849(5) 1.01(1) ## 2 1.99(1) 5.98(3) 3.98(4) 0.912(4) 3.01(1) ## 3 3.00(1) 9.01(3) 9.02(6) 0.14(1) 6.01(2) ## 4 4.01(1) 12.02(3) 16.05(8) -0.761(6) 10.02(2) ## 5 5.00(1) 15.01(3) 25.0(1) -0.958(3) 15.02(2)

format(df, notation="plus-minus")

## x X3...x x.2 sin.x. cumsum.x. ## 1 1.01 ± 0.01 3.04 ± 0.03 1.03 ± 0.02 0.849 ± 0.005 1.01 ± 0.01 ## 2 1.99 ± 0.01 5.98 ± 0.03 3.98 ± 0.04 0.912 ± 0.004 3.01 ± 0.01 ## 3 3.00 ± 0.01 9.01 ± 0.03 9.02 ± 0.06 0.14 ± 0.01 6.01 ± 0.02 ## 4 4.01 ± 0.01 12.02 ± 0.03 16.05 ± 0.08 -0.761 ± 0.006 10.02 ± 0.02 ## 5 5.00 ± 0.01 15.01 ± 0.03 25.0 ± 0.1 -0.958 ± 0.003 15.02 ± 0.02

And much more. This is the complete list of methods implemented in the package:

methods(class="errors")

## [1] [ [[ [[<- [<- all.equal ## [6] as.data.frame as.list as.matrix c cbind ## [11] correl correl<- covar covar<- diff ## [16] drop_errors errors_max errors_min errors errors<- ## [21] format log10 log2 Math mean ## [26] median Ops plot print quantile ## [31] rbind rep set_correl set_covar set_errors ## [36] summary Summary t weighted.mean ## see '?methods' for accessing help and source code

## Implementing new functionality: new generics

So far, we have explored S3 programming mostly by extending existing generics with methods for a new class of our own. But what if we want to add new functionality? You have probably already guessed that you need to implement your own generics. Let’s take a step back and use a simpler classic example to demonstrate this.

# constructors circle <- function(r) structure(list(r=r), class="circle") rectangle <- function(a, b) structure(list(a=a, b=b), class="rectangle") # generics perimeter <- function(shape) UseMethod("perimeter") area <- function(shape) UseMethod("area") # methods print.circle <- function(x, ...) with(x, cat("r =", r, "\n")) perimeter.circle <- function(shape) with(shape, 2 * pi * r) area.circle <- function(shape) with(shape, pi * r^2) print.rectangle <- function(x, ...) with(x, cat("a =", a, ", b =", b, "\n")) perimeter.rectangle <- function(shape) with(shape, 2 * (a + b)) area.rectangle <- function(shape) with(shape, a * b) # usage example (x <- circle(5))

## r = 5

(y <- rectangle(10, 5))

## a = 10 , b = 5

perimeter(x)

## [1] 31.41593

perimeter(y)

## [1] 30

area(x)

## [1] 78.53982

area(y)

## [1] 50

As you can see, implementing a new generic is as easy as defining a function that calls `UseMethod`

with the name of the generic. More advanced uses may, for example, manipulate the input variables before calling `UseMethod`

, but this basic template would fit most use cases.

## Combining functionality: inheritance

Inheritance is also possible with S3 programming. Instead of assigning a single class, there may be multiple classes defined as a character vector. When this happens, R dispatches the first class, and subsequent calls to `NextMethod`

look for other methods in the class vector. This means that, if we want our object to resemble parent-child relationships, parent classes must go first in the class vector.

# constructor shape <- function(name, ..., color) { shape <- do.call(name, list(...)) shape$color <- color structure(shape, class=c("shape", class(shape))) } # methods print.shape <- function(x, ...) { cat(x$color, .Class[2], "\n") cat("parameters: ") NextMethod() # call that particular shape's print method } # usage example (x <- shape("circle", 5, color="red"))

## red circle ## parameters: r = 5

(y <- shape("rectangle", 10, 5, color="blue"))

## blue rectangle ## parameters: a = 10 , b = 5

class(x)

## [1] "shape" "circle"

class(y)

## [1] "shape" "rectangle"

perimeter(x)

## [1] 31.41593

perimeter(y)

## [1] 30

area(x)

## [1] 78.53982

area(y)

## [1] 50

This is exactly what the package `quantities`

does to combine the functionality of packages `errors`

and `units`

. As we have seen, `errors`

defines uncertainty metadata for R vectors, and `units`

does the same for measurement units. To achieve a complete calculus system, `quantities`

prepends a superclass to be able to orchestrate `units`

and `errors`

while keeping them completely independent.

library(quantities)

## Loading required package: units

## udunits system database from /usr/share/udunits

# start with a units object and add errors (x <- set_units(1:5, "m"))

## Units: [m] ## [1] 1 2 3 4 5

class(x)

## [1] "units"

(x <- set_errors(x, 0.1))

## Units: [m] ## Errors: 0.1 0.1 0.1 0.1 0.1 ## [1] 1 2 3 4 5

class(x)

## [1] "quantities" "units" "errors"

# start with an errors object and add units (x <- set_errors(1:5, 0.1))

## Errors: 0.1 0.1 0.1 0.1 0.1 ## [1] 1 2 3 4 5

class(x)

## [1] "errors"

(x <- set_units(x, "m"))

## Units: [m] ## Errors: 0.1 0.1 0.1 0.1 0.1 ## [1] 1 2 3 4 5

class(x)

## [1] "quantities" "units" "errors"

# both at the same time (y <- set_quantities(1:5, "s", 0.1))

## Units: [s] ## Errors: 0.1 0.1 0.1 0.1 0.1 ## [1] 1 2 3 4 5

class(x)

## [1] "quantities" "units" "errors"

# and everything just works (z <- x / (y*y))

## Units: [m/s^2] ## Errors: 0.223606798 0.055901699 0.024845200 0.013975425 0.008944272 ## [1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000

correl(x, z)

## [1] 0.4472136 0.4472136 0.4472136 0.4472136 0.4472136

sum(z)

## 2.3(2) [m/s^2]

## Summary

S3 programming requires three steps:

- Define a constructor (possibly a validator too) that sets the class and the attributes.
- Define new generics with
`UseMethod`

. - Implement methods for new and existing generics with a proper use of
`NextMethod`

to support inheritance and also avoid code duplication.

In contrast to other OO systems, objects hold the data, and methods live outside as independent functions. The developer needs to invest more effort and care to ensure correctness and support inheritance, but in exchange, it is a very flexible system, because it is easy to extend without modifying the object.

## Further reading

More complete technical readings:

- The R Language Definition and manual pages Methods for S3 and Class Methods.
- Hadley’s OO field guide and book chapter about S3.

More about the quantities framework:

- R Journal papers Measurement Units in R and Measurement Errors in R.
- Blog posts about the R Quantities project, funded by the R Consortium (I, II, III, IV).

**leave a comment**for the author, please follow the link and comment on their blog:

**R on Coding Club UC3M**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.