Object Oriented Programming in R (Part 3): A Practical Guide to the S4 System
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the previous article, we learned about the first OOP system in R called S3. In this one, we are going to dive into the S4 OOP system.
The S4 system is a more formal OOP system developed by Bell Labs and introduced into the S language in the late 1990s.
Today, we will learn about features of S4 and look at example use cases of this system in the community. We will also learn about some recommended practices to consider when using S4 classes and cover general tips on object-oriented programming in R.
Read the series from the beginning:
- Object-Oriented Programming in R (Part 1): An Introduction
- Object Oriented Programming in R (Part 2): S3 Simplified
Table of Contents
- Object-Oriented Programming in R – Our First S4 Class and Method
- S4 Features for Object-Oriented Programming in R
- Recommended Practices in R Object-Oriented Programming
- S4 Usage in the Community
- Summing up Object Oriented Programming in R – Part 3
Object-Oriented Programming in R – Our First S4 Class and Method
We will reuse the examples from our OOP in R with R6 – The complete guide article.
Defining an S4 Class
Let’s try to recreate the dog class from the previous article, but this time using S4. To create a new S4 class we use the setClass function:

setClass(
  Class = "Dog",
  slots = c(
    name = "character",
    age = "numeric"
  )
)
The Class argument defines the name of the class, which we will refer to later on, and the slots argument defines the fields of our class.
Note that in the named vector the names are the slot names and the values are the classes those slots must hold. The slot types are validated when creating a new object and when changing the value of a slot.
Let’s see this in action! We will try to create a couple of dog objects using the new function!

> d1 <- new("Dog", name = "Milo", age = 4)
> d2 <- new("Dog", name = "Milo", age = "four years old")
Error in validObject(.Object) :
  invalid class “Dog” object: invalid object for slot "age" in class "Dog": got class "character", should be or extend class "numeric"
As you can see, when we tried to use a character value for the age slot, we got an error.
This is our first example of how S4 is more rigorous compared to S3. In the case of the S3 system we had to manually validate the arguments in our constructors.
Using S4 Slots
We can interact with our object using @ to retrieve fields of the object:

> d1@name
[1] "Milo"
> d1@age
[1] 4
And if we try to change the value of the fields, validation will be performed as well:
> d1@age <- "four years old"
Error in (function (cl, name, valueClass) :
  assignment of an object of class “character” is not valid for @‘age’ in an object of class “Dog”; is(value, "numeric") is not TRUE
> d1@age <- 5
Another difference between @ and $ in S3 is that S4 slots are not partially matched.

new_dog <- function(name, age) {
  structure(
    list(
      name = name,
      age = age
    ),
    class = "dog"
  )
}

s3_dog <- new_dog(name = "Milo", age = 4)
s4_dog <- new("Dog", name = "Milo", age = 4)

> s3_dog$a # will return value of the "age" field
[1] 4
> s4_dog@a
Error: no slot of name "a" for this object of class "Dog"
This can save us from introducing an unexpected bug caused by partial matching!
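Besides @, the methods package offers a few helpers for inspecting and accessing slots programmatically. The following is a minimal sketch that re-creates the Dog class from above so it can run on its own; the variable names (slot_names, slot_types, field) are only illustrative.

```r
library(methods)

# Re-create the Dog class from this section so the example is self-contained
setClass("Dog", slots = c(name = "character", age = "numeric"))
d1 <- new("Dog", name = "Milo", age = 4)

# Slot names and the class declared for each slot
slot_names <- slotNames("Dog")
slot_types <- getSlots("Dog")

# slot() is the functional form of @, handy when the slot name is stored in a variable
field <- "age"
current_age <- slot(d1, field)

# The replacement form checks the slot's class, just like `d1@age <- value`
slot(d1, field) <- 5

# isVirtualClass() tells us whether the class can be instantiated
is_virtual <- isVirtualClass("Dog")
```

As with @, slot() requires the full slot name, so there is no partial matching here either.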
Defining an S4 Method
Let’s see how our S4 dog would get printed out:
> print(d1)
An object of class "Dog"
Slot "name":
[1] "Milo"

Slot "age":
[1] 4
Here we are using the default way of printing S4 classes. In case we want to have a different way of printing, we need to create a custom show method.
To define an S4 method we use the setMethod() function:

setMethod(
  f = "show",
  signature = "Dog",
  definition = function(object) {
    cat("Dog: \n")
    cat("\tName: ", object@name, "\n", sep = "")
    cat("\tAge: ", object@age, "\n", sep = "")
  }
)
Let’s break down the arguments one by one:
- f is the name of the generic function we want to implement.
- signature defines the classes required for the method arguments.
- definition is our actual implementation of the method.
Let’s give it a try!
> print(d1)
Dog:
    Name: Milo
    Age: 5
Defining Our Own Generic
So far, we have implemented a method for an existing show generic. What if we want to create our own dog-related functionalities? Let’s create a makeSound generic:

setGeneric(
  name = "makeSound",
  def = function(x) standardGeneric("makeSound")
)
Now, we need to implement a makeSound method for our dog class:

setMethod(
  f = "makeSound",
  signature = "Dog",
  definition = function(x) {
    cat(x@name, "says", "Wooof!\n")
  }
)
Let’s give it a go:
> makeSound(d1)
Milo says Wooof!
S4 Features for Object-Oriented Programming in R
We created our first S4 class and generic. Now let’s explore some additional features that the S4 system has to offer!
Object Validation
Apart from the validation of slots, we can also define additional constraints using validators. For example, as of now, we are able to create a dog with a negative age:
dog_with_negative_age <- new("Dog", name = "Milo", age = -1)
To prevent that from happening, let’s define a validator using the setValidity function:

setValidity(
  Class = "Dog",
  method = function(object) {
    if (object@age < 0) {
      "age should be a positive number"
    } else {
      TRUE
    }
  }
)
Let’s break down the arguments one by one:
- Class corresponds to the name of our class.
- method is a function that accepts one argument (the object to validate). The function needs to return TRUE if the object is valid, or one or more descriptive strings if any problems are found.
And just to check if it’s working:
> dog_with_negative_age_take_2 <- new("Dog", name = "Milo", age = -1)
Error in validObject(.Object) :
  invalid class “Dog” object: age should be a positive number
Important: Our custom validator will be called automatically only when creating an object, so it doesn’t prevent us from making the object invalid when changing the value of a slot.
> d3 <- new("Dog", name = "Milo", age = 4)
> d3@age <- -4
> d3
Dog:
    Name: Milo
    Age: -4
But if we explicitly call our validator (by calling the validObject function), we will learn that the object is no longer valid:

> validObject(d3)
Error in validObject(d3) :
  invalid class “Dog” object: age should be a positive number
This is why some sources recommend defining accessor functions for classes to avoid such issues. We will go back to this topic when talking about recommended practices for using S4.
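One way to close this gap is a setter that re-runs validObject() after every modification. The sketch below is only an illustration under assumptions: the generic names age and age<- are hypothetical (they are not part of the original example), and the Dog class and its validator are re-created so the code runs on its own.

```r
library(methods)

# Dog class and validator as defined earlier in the article
setClass("Dog", slots = c(name = "character", age = "numeric"))
setValidity("Dog", function(object) {
  if (object@age < 0) "age should be a positive number" else TRUE
})

# Hypothetical accessor generics: a getter and a replacement-style setter
setGeneric("age", function(x) standardGeneric("age"))
setGeneric("age<-", function(x, value) standardGeneric("age<-"))

setMethod("age", "Dog", function(x) x@age)
setMethod("age<-", "Dog", function(x, value) {
  x@age <- value   # slot class is checked here
  validObject(x)   # custom validator is re-run here
  x
})

d3 <- new("Dog", name = "Milo", age = 4)
age(d3) <- 5       # valid update goes through

# An invalid update now fails, and d3 keeps its previous value
res <- tryCatch({
  age(d3) <- -4
  "accepted"
}, error = function(e) "rejected")
```

Because the setter raises an error before returning, the assignment never completes and d3 still holds the last valid age.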
Virtual Classes
The S4 system provides support for virtual classes – or classes that cannot be instantiated. Ok, but why would we want to do that?
Virtual classes can be useful in cases where you want to define implementation details that can be reused by other classes through inheritance.
We already covered inheritance in our previous blog post when using S3 methods, so let’s see how we could leverage virtual classes if we wanted to define Cat classes that have some shared functionality with Dog classes.
Both cats and dogs have a name and age, right? Let’s define a virtual class Animal that will contain both name and age fields.

setClass(
  Class = "Animal",
  contains = "VIRTUAL",
  slots = c(
    name = "character",
    age = "numeric"
  )
)
This time, because we are defining a virtual class, we set the contains parameter to "VIRTUAL".
If we try to create an animal object, we will get an error:
> new("Animal", name = "Milo", age = 4)
Error in new("Animal", name = "Milo", age = 4) :
  trying to generate an object from a virtual class ("Animal")
Now, let’s use our Animal class to create a Cat class and a new version of the Dog class:

setClass(
  Class = "Dog",
  contains = "Animal"
)

setClass(
  Class = "Cat",
  contains = "Animal"
)
Now, we can create Dog and Cat objects with name and age fields:

d <- new("Dog", name = "Milo", age = 4)
c <- new("Cat", name = "Tucker", age = 2)
This saved us some typing as we only defined the name and age slots in the Animal virtual class. Best part? Validators and methods are inherited as well:

setValidity(
  Class = "Animal",
  method = function(object) {
    if (object@age < 0) {
      "An animal cannot have a negative age"
    } else {
      TRUE
    }
  }
)

> d <- new("Dog", name = "Milo", age = -4)
Error in validObject(.Object) :
  invalid class “Dog” object: An animal cannot have a negative age
> c <- new("Cat", name = "Tucker", age = -2)
Error in validObject(.Object) :
  invalid class “Cat” object: An animal cannot have a negative age
The show method for our Cat will be very similar to the one we defined before for the Dog class, but we want to display the word “Cat” instead of “Dog”. We could implement it once for the Animal class like this:

setMethod(
  f = "show",
  signature = "Animal",
  definition = function(object) {
    object_class <- is(object)[1]
    cat(object_class, "(an Animal)\n")
    cat("\tName: ", object@name, "\n", sep = "")
    cat("\tAge: ", object@age, "\n", sep = "")
  }
)
Now, let’s see what happens if we try to print our cat:
> print(c)
Cat (an Animal)
    Name: Tucker
    Age: 2
It’s working! We already learned in the previous articles how inheritance helps with reusing code between classes, allowing us to not repeat ourselves.
Additionally, by using virtual classes we can reuse code that does not necessarily make sense when used in isolation, so we can prevent users from accidentally creating objects that might not make sense.
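A method defined on a virtual class can also serve as a building block for a more specific child-class method. The sketch below assumes a hypothetical describe generic (not part of the original example) and uses callNextMethod() from the methods package to reuse the Animal implementation inside the Dog method.

```r
library(methods)

# Class hierarchy from this section, re-created so the example runs on its own
setClass("Animal", contains = "VIRTUAL",
         slots = c(name = "character", age = "numeric"))
setClass("Dog", contains = "Animal")
setClass("Cat", contains = "Animal")

# Hypothetical generic used only for this illustration
setGeneric("describe", function(x) standardGeneric("describe"))

# Shared implementation, written once for the virtual parent class
setMethod("describe", "Animal", function(x) {
  paste0(x@name, " (", class(x), ", age ", x@age, ")")
})

# Dog extends the inherited behaviour instead of rewriting it
setMethod("describe", "Dog", function(x) {
  paste(callNextMethod(), "is a good dog")
})

dog_description <- describe(new("Dog", name = "Milo", age = 4))
cat_description <- describe(new("Cat", name = "Tucker", age = 2))
```

Cat has no method of its own, so it falls back to the Animal method, while Dog adds to it.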
Multiple Dispatch
First of all, what is method dispatch? We already made use of it in the article about S3 classes! Remember when we were defining the make_sound.cat and make_sound.dog methods?
The make_sound generic would use the class of its first argument to identify which method should be called. If it’s a dog, it uses the make_sound.dog method, and if it’s a cat, it uses the make_sound.cat method.
Ok, now we know what method dispatch is, but what is multiple dispatch? It is the same concept but applied to multiple arguments! That means you can use classes of multiple arguments to pick the right method!
Let’s see an example: we will create a Pizza and a Pineapple class and try to combine them.

setClass(
  Class = "Pizza",
  slots = c(
    diameter = "numeric"
  )
)

setClass(
  Class = "Pineapple",
  slots = c(
    weight = "numeric"
  )
)

setGeneric(
  name = "combine",
  def = function(x, y) standardGeneric("combine"),
  signature = c("x", "y")
)
In the setGeneric function we can use the signature argument to define which arguments should be used for dispatching (Note: by default, all formal arguments except … are used, but we wanted to be explicit in this example).
Now let’s implement methods for particular cases:
- Combining Pizza with Pizza.
- Combining Pineapple with Pineapple.
- Combining Pizza with Pineapple.
- Combining Pineapple with Pizza (the order matters, so we need to cover this case as well!).
setMethod(
  f = "combine",
  signature = c("Pizza", "Pizza"),
  definition = function(x, y) {
    "Even more pizza!"
  }
)

setMethod(
  f = "combine",
  signature = c("Pineapple", "Pineapple"),
  definition = function(x, y) {
    "Even more pineapple!"
  }
)

setMethod(
  f = "combine",
  signature = c("Pineapple", "Pizza"),
  definition = function(x, y) {
    stop("Pineapple and pizza don't go well together!")
  }
)

setMethod(
  f = "combine",
  signature = c("Pizza", "Pineapple"),
  definition = function(x, y) {
    stop("Pineapple and pizza don't go well together!")
  }
)

pineapple <- new("Pineapple", weight = 1)
pizza <- new("Pizza", diameter = 32)

> combine(pizza, pizza)
[1] "Even more pizza!"
> combine(pineapple, pineapple)
[1] "Even more pineapple!"
> combine(pineapple, pizza)
Error in combine(pineapple, pizza) :
  Pineapple and pizza don't go well together!
> combine(pizza, pineapple)
Error in combine(pizza, pineapple) :
  Pineapple and pizza don't go well together!
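Writing one method per combination grows quickly as classes are added. As a variation on the example above (an assumption on our side, not the approach used in the original code), a single fallback method with the signature c("ANY", "ANY") can catch every combination that has no more specific method:

```r
library(methods)

# Classes and generic from this section, re-created so the example runs on its own
setClass("Pizza", slots = c(diameter = "numeric"))
setClass("Pineapple", slots = c(weight = "numeric"))

setGeneric(
  name = "combine",
  def = function(x, y) standardGeneric("combine"),
  signature = c("x", "y")
)

# A specific method for one combination
setMethod("combine", c("Pizza", "Pizza"), function(x, y) "Even more pizza!")

# Fallback used when no more specific method matches both arguments
setMethod("combine", c("ANY", "ANY"), function(x, y) {
  stop("Don't know how to combine ", class(x)[1], " with ", class(y)[1])
})

pizza <- new("Pizza", diameter = 32)
pineapple <- new("Pineapple", weight = 1)

specific_result <- combine(pizza, pizza)

# The mixed case falls through to the ANY/ANY method
fallback_message <- tryCatch(
  combine(pizza, pineapple),
  error = function(e) conditionMessage(e)
)
```

The more specific Pizza/Pizza method still wins for two pizzas; the fallback only handles the remaining cases.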
Multiple Inheritance
S4 supports multiple inheritance, which means we can inherit from more than one class. Let’s go back to our example hierarchy of Animal, Dog and Cat classes.
What if we wanted to include the owner information for both cats and dogs? We could add an owner slot to the Animal class, but what if we wanted to add a Moose class? Moose are usually not pets!
In that case, we might want to use multiple inheritance and define a new virtual class called Pet:

setClass(
  Class = "Pet",
  contains = "VIRTUAL",
  slots = c(
    owner = "character"
  )
)

# Animal class from the previous example
setClass(
  Class = "Animal",
  contains = "VIRTUAL",
  slots = c(
    name = "character",
    age = "numeric"
  )
)
Now, we can define our Dog and Cat classes like this:

setClass(
  Class = "Dog",
  contains = c("Animal", "Pet")
)

setClass(
  Class = "Cat",
  contains = c("Animal", "Pet")
)

d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
c <- new("Cat", name = "Tucker", age = 2, owner = "John")
And in case we need to add a Moose class in the future, we can do it like this:

setClass(
  Class = "Moose",
  contains = c("Animal")
)

m <- new("Moose", name = "Moose", age = 21)
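To see what multiple inheritance gives us in practice, we can check class membership with is() and define a method once for Pet. This is a small sketch; the ownerOf generic is a hypothetical name introduced only for this illustration, and the classes are re-created so the code is self-contained.

```r
library(methods)

# Hierarchy from this section
setClass("Pet", contains = "VIRTUAL", slots = c(owner = "character"))
setClass("Animal", contains = "VIRTUAL",
         slots = c(name = "character", age = "numeric"))
setClass("Dog", contains = c("Animal", "Pet"))
setClass("Moose", contains = "Animal")

d <- new("Dog", name = "Milo", age = 5, owner = "Jane")
m <- new("Moose", name = "Moose", age = 21)

# A Dog is both an Animal and a Pet; a Moose is only an Animal
dog_is_pet <- is(d, "Pet")
moose_is_pet <- is(m, "Pet")

# Hypothetical generic with a method defined only for pets
setGeneric("ownerOf", function(x) standardGeneric("ownerOf"))
setMethod("ownerOf", "Pet", function(x) x@owner)

dog_owner <- ownerOf(d)

# No method is available for a Moose, so dispatch fails with an error
moose_owner <- tryCatch(ownerOf(m), error = function(e) "no method for Moose")
```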
Class Unions
In the Defining an S4 Class section, we specified the types of slots for a given class. But what if we wanted to be more flexible and allow one field to be an instance of either one class or another?
For example, let’s say we want a class that holds information about our data source, which could be either a data.frame or a path to a file containing the data.
We can use the unrestricted class ANY:

setClass(
  Class = "DataManager",
  slots = c(
    "source" = "ANY"
  )
)
However, this is not safe as we can create an incorrect object:
new("DataManager", source = 1234)
Instead, we can use a class union! Let’s define a class union that would allow us to provide either characters or data.frames for the source slot. This can be done using the setClassUnion function:

setClassUnion(
  name = "DataSource",
  members = c("data.frame", "character")
)

setClass(
  Class = "DataManager",
  slots = c(
    "source" = "DataSource"
  )
)
Now, our slot gets validated:
> new("DataManager", source = 1234)
Error in validObject(.Object) :
  invalid class “DataManager” object: invalid object for slot "source" in class "DataManager": got class "numeric", should be or extend class "DataSource"
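A class union pairs naturally with method dispatch: we can write one method per member class and let S4 pick the right one for whatever the slot holds. The sketch below assumes a hypothetical readSource generic and a temporary CSV file created just for the demonstration.

```r
library(methods)

# Union and class from this section
setClassUnion("DataSource", members = c("data.frame", "character"))
setClass("DataManager", slots = c(source = "DataSource"))

# Hypothetical generic: return the data regardless of how the source is stored
setGeneric("readSource", function(src) standardGeneric("readSource"))
setMethod("readSource", "data.frame", function(src) src)
setMethod("readSource", "character", function(src) utils::read.csv(src))

# One manager holds the data directly, the other holds a path to a CSV file
df <- data.frame(x = 1:3)
csv_path <- tempfile(fileext = ".csv")
utils::write.csv(df, csv_path, row.names = FALSE)

in_memory <- new("DataManager", source = df)
on_disk <- new("DataManager", source = csv_path)

# Dispatch happens on the class of the value stored in the slot
from_memory <- readSource(in_memory@source)
from_disk <- readSource(on_disk@source)
```

Both calls return a data.frame with the same contents, so the rest of the code does not need to care which kind of source was supplied.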
Fun Fact: An interesting example of using class unions is the index class in the Matrix package. It is implemented as a union of numeric, logical, and character (source). This allows us to index matrices using numerics, logicals, or characters.
Coercion System
S4 offers a coercion system. By ‘coercion’ we mean the process of transforming a value of a given type to a value of another type. An example of coercion you might be familiar with is the as.numeric function.
For example, coercing a character value into a numeric will look like this:

> as.numeric("123")
[1] 123
But what if we want to coerce an object of one S4 class into an object of another S4 class? This is where we can leverage the S4 coercion system. Let’s assume we have a custom class for storing game scores:
setClass(
  Class = "GameScore",
  slots = c(
    "homeTeam" = "character",
    "awayTeam" = "character",
    "homeTeamScore" = "numeric",
    "awayTeamScore" = "numeric"
  )
)

game_score <- new(
  "GameScore",
  homeTeam = "Team A",
  awayTeam = "Team B",
  homeTeamScore = 114,
  awayTeamScore = 120
)
Now, what if we wanted to be able to convert an object of the GameScore class into a data.frame? We can define a method for coercing an object of GameScore into a data.frame by using the setAs function!

setAs(
  from = "GameScore",
  to = "data.frame",
  def = function(from) {
    data.frame(
      team = c(from@homeTeam, from@awayTeam),
      points = c(from@homeTeamScore, from@awayTeamScore)
    )
  }
)
Now, we can convert our game_score into a data.frame by using the as function:

> as(game_score, "data.frame")
    team points
1 Team A    114
2 Team B    120
The S4 system provides default coercion methods when coercing child classes to parent classes and the other way around. However, we have to be careful here as we might accidentally end up with an invalid object. (source)
Like here:
setClass(
  Class = "Password",
  slots = c(
    value = "character"
  )
)

setClass(
  Class = "LongPassword",
  contains = "Password",
  slots = c(
    value = "character"
  )
)

setValidity(
  Class = "LongPassword",
  method = function(object) {
    # For the sake of the example we assume 20 characters is long enough
    if (nchar(object@value) > 20) {
      TRUE
    } else {
      "Password is too short!"
    }
  }
)
Now, let’s create some password objects:
short_password <- new(
  "Password",
  value = stringi::stri_rand_strings(n = 1, length = 3)
)

long_password <- new(
  "LongPassword",
  value = stringi::stri_rand_strings(n = 1, length = 32)
)
Everything is ok when we coerce our LongPassword object into a Password object. However, when converting from Password to LongPassword we end up with an invalid object:

coerced_short_password <- as(short_password, "LongPassword")
validObject(coerced_short_password)
# Error in validObject(coerced_short_password) :
#   invalid class “LongPassword” object: Password is too short!
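One defensive option is to wrap as() in a small helper that always validates the result before returning it. The asValidated function below is a hypothetical helper written for this illustration, not part of the methods package; the passwords are built with strrep() instead of stringi to keep the sketch dependency-free.

```r
library(methods)

# Classes and validator from this section
setClass("Password", slots = c(value = "character"))
setClass("LongPassword", contains = "Password")
setValidity("LongPassword", function(object) {
  if (nchar(object@value) > 20) TRUE else "Password is too short!"
})

# Hypothetical helper: coerce, then refuse to return an invalid object
asValidated <- function(object, Class) {
  result <- as(object, Class)
  validObject(result)
  result
}

short_password <- new("Password", value = strrep("a", 3))
long_password <- new("LongPassword", value = strrep("b", 32))

# Child-to-parent coercion passes validation
as_parent <- asValidated(long_password, "Password")

# Parent-to-child coercion is stopped instead of silently yielding an invalid object
as_child <- tryCatch(
  asValidated(short_password, "LongPassword"),
  error = function(e) "rejected"
)
```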
Recommended Practices in R Object-Oriented Programming
The S4 OOP system has a rich set of powerful features. As you know, with great power comes great responsibility – so this section brings you a recommended set of practices for using S4 classes.
There are multiple sources of recommended practices when it comes to S4 including:
- R’s built-in documentation
- The S4 chapter of the Advanced R book
- S4 classes and methods (from Bioconductor learning materials)
These sources sometimes conflict. For example, ?setClass states that the prototype argument is unlikely to be useful, while Advanced R considers this bad advice and says that a prototype should always be provided.
Here, we will summarize the recommended practices along with their sources:
- New S4 generics should by convention use lowerCamelCase (Advanced R)
- S4 classes should by convention use UpperCamelCase (Advanced R)
- Consider defining (S4 classes and methods):
  - Validity methods with setValidity for your classes
  - show methods for your classes
  - A constructor function named as the class that is documented and user-friendly
  - Coercion methods
  - Additional methods depending on the shape of the object. For example, adding a length() method for vector-like objects
- Slots of a class should be considered internal implementation details and should not be accessed directly with @ outside of methods. To let users access values in those slots, provide getters, and if the objects are intended to be modified, provide setters as well (both Advanced R and S4 classes and methods)
- Keep method dispatch as simple as possible: avoid multiple inheritance and use multiple dispatch only when absolutely necessary (Advanced R)
If you don’t know which recommended practices to follow, consider your context. For example, if you are developing a package you want to publish to Bioconductor, then consider following practices recommended by Bioconductor.
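Several of the practices listed above can be combined in a few lines. The following is a minimal sketch, assuming the Dog class from earlier in the article: it supplies a prototype (as Advanced R recommends), a constructor named after the class, and a length() method as an example of a shape-related method. The specific default values and input checks are our own illustrative choices.

```r
library(methods)

# Class with an explicit prototype, so new("Dog") yields well-defined defaults
setClass(
  "Dog",
  slots = c(name = "character", age = "numeric"),
  prototype = list(name = NA_character_, age = NA_real_)
)

# User-facing constructor named after the class; checks input before calling new()
Dog <- function(name, age = NA_real_) {
  stopifnot(is.character(name), length(name) == 1)
  new("Dog", name = name, age = as.numeric(age))
}

# Example of a shape-related method: a Dog object always represents one dog
setMethod("length", "Dog", function(x) 1L)

default_dog <- new("Dog")
milo <- Dog("Milo", 4L)

# Invalid input is rejected by the constructor with an error
bad_input <- tryCatch(Dog(1, 2), error = function(e) "rejected")
```

Users then call Dog("Milo", 4) rather than new("Dog", ...), which leaves room to change the internal slots later without breaking their code.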
S4 Usage in the Community
The S4 OOP system has seen large adoption in Bioconductor. In BioC 2.6, 51% of Bioconductor packages defined S4 classes (source).
It has also been used in other packages outside of Bioconductor, such as:
- Matrix – in version 1.3.2 it defines 102 classes, 21 generic functions, and 2005 methods. (source)
- Rcpp.
- DBI – in the History of DBI article we can learn that at some point it used S3 classes and later on converted to S4 classes.
Summing up Object Oriented Programming in R – Part 3
- The S4 OOP system was developed by Bell Labs and introduced into the S language in the late 1990s.
- S4 is more formal and rigorous compared to S3 as it allows defining the types of class slots as well as validators.
- S4 offers additional features compared to S3 classes, such as virtual classes, multiple dispatch, multiple inheritance, class unions, and coercion. These powerful features give us new ways of solving problems in code.
- Currently, there are multiple sources of recommended practices for the S4 system. There are cases when different sources give conflicting recommendations.
- Because of the rich set of features offered by the S4 system, it has a steeper learning curve compared to the S3 system.
- S4 features are powerful but should be used carefully. For example, combining multiple inheritance along with multiple dispatch can lead to situations where it might be hard to reason which method will be called for which combination of inputs.
- S4 classes are used extensively in Bioconductor as well as in some well-known packages in the R community such as Matrix, Rcpp, and DBI.
The post appeared first on appsilon.com/blog/.