The Twitter Waterflow Problem
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Nathan Eastwood, Data Scientist
I was recently introduced to the Twitter Waterflow Problem and I decided it was interesting enough to try and complete the challenge in R.
Consider the following picture:
This plot shows a series of walls and empty valleys. We can represent this picture by an array of integers, where the value at each index is the height of the wall. So in this case, the array of integers can be defined as:
wall <- c(2, 5, 1, 2, 3, 4, 7, 7, 6)
Now imagine it rains. How much water is going to be accumulated in puddles between walls? No puddles are formed at edges of the wall and water is considered to simply run off the edge. We count volume in square blocks of 1×1. Thus, we are left with a puddle between column 2 and column 7 and the volume is 10.
The Approach
The approach I took was one of many ways you could solve this problem. I chose to treat each column (wall) as an index and then for each index I implement a loop:
- Find the height of the current index
- Find the maximum height of the walls to the left of the current index
- Find the maximum height of the walls to the right of the index
I then work out what the smallest maximum height is between the maximum heights to the left and right of the current index. If this smallest height minus the current index height is greater than zero, then I know how many blocks will fill with water for the current index. Of course, if the smallest maximum height to the left or right is less than the current height, then I get the run off. Converting this to code looks like this:
len <- length(wall) # pre-allocate memory to make the loop more efficient water <- rep(0, len) for (i in seq_along(wall)) { currentHeight <- wall[i] maxLeftHeight <- if (i > 1) { max(wall[1:(i - 1)]) } else { 0 } maxRightHeight <- if (i == len) { 0 } else { max(wall[(i + 1):len]) } smallestMaxHeight <- min(maxLeftHeight, maxRightHeight) water[i] <- if (smallestMaxHeight - currentHeight > 0) { smallestMaxHeight - currentHeight } else { 0 } } water [1] 0 0 4 3 2 1 0 0 0
This returns a vector of the number of blocks of water at each index.
The R6 Class
For this problem I chose to use the R6
class system because it is very self contained. The R6
class system is different from the functional S3
and S4
class systems found in base R in that it is an encapsulated class system. Some key differences between the two are:
Functional:
- Objects contain data
- Class methods are separate from objects
- Objects are immutable - they cannot be changed after they have been created
Encapsulated:
- Objects contain methods and data
- Objects are mutable
Winston Chang details these differences very well in his userR!2017 talk.
I wanted the user to be able to acquire two key pieces of information: the total blocks of water and the filled plot. I therefore created three public
methods inside an R6
class and placed the class in a simple package which can be found here. public
methods are a list of functions (and/or non-functions) which are essentially methods (or data) of the class that are intended to be used by the user of the class.
When writing a new R6
class, we often want to perform some initial functionality when we instantiate an object of this class. This is true in our case as when we instantiate a new object of class waterflow
, we want it to calculate the heights of the water straight away. We therefore call our earlier function initialize
and place it inside the public
list.
initialize = function(wall = NULL) { if (is.null(wall)) { stop("Please provide some wall heights") } if (!is.numeric(wall)) { stop("Please provide a numeric vector") } len <- length(wall) water <- rep(0, len) for (i in seq_along(wall)) { currentHeight <- wall[i] maxLeftHeight <- if (i > 1) { max(wall[1:(i - 1)]) } else { 0 } maxRightHeight <- if (i == len) { 0 } else { max(wall[(i + 1):len]) } smallestMaxHeight <- min(maxLeftHeight, maxRightHeight) water[i] <- if (smallestMaxHeight - currentHeight > 0) { smallestMaxHeight - currentHeight } else { 0 } } private$heights <- private$tidyWater(water, wall) }
Note: we have one key difference here, we are assigning the results to members of the private
list within our object. Once we have calculated our water
heights, we use them along with our wall
heights in the function private$tidyWater()
and assign the resulting data.frame
to private$heights
. The private
argument is a list which contains methods (and/or data) which are internal to the class and are not intended to be used by the user. We do similar things when writing packages - we explicitly export the functionality we want other people to use and don’t export functionality that is only used within the package itself.
So to use the class, first we instantiate the class with some data in a call to $new()
which in turn runs the above initialize
function.
library(waterflow) wall <- c(2, 5, 1, 2, 3, 4, 7, 7, 6) p <- waterflow$new(wall)
So now we have an object called p
which is of class waterflow
(and R6
). p
contains the data, as well as the (public
) methods we can perform on that data.
Typically when we return p
, R6
objects have a default print
method that lists all members of the object but here there is a custom $print()
function that returns heights
.
p pos type val 1 1 water 0 2 2 water 0 3 3 water 4 4 4 water 3 5 5 water 2 6 6 water 1 7 7 water 0 8 8 water 0 9 9 water 0 10 1 wall 2 11 2 wall 5 12 3 wall 1 13 4 wall 2 14 5 wall 3 15 6 wall 4 16 7 wall 7 17 8 wall 7 18 9 wall 6
We can still return the members of the object by looking at the structure of p
.
str(p) Classes 'waterflow', 'R6' <waterflow> Public: clone: function (deep = FALSE) initialize: function (wall = NULL) plot: function () print: function () total: function () Private: heights: data.frame tidyWater: function (water, wall)
To calculate the total water that fills the valleys, we sum over the heights
object for the values of the water
.
total = function() { sum(private$heights[private$heights$type %in% "water", "val"]) }
Calling the public
method $total()
gives the expected result.
p$total() [1] 10
To call the plot, we access the public
method $plot()
.
p$plot()
The completed class looks like this
waterflow <- R6Class( "waterflow", public = list( initialize = function(wall = NULL) { if (is.null(wall)) { stop("Please provide some wall heights") } if (!is.numeric(wall)) { stop("Please provide a numeric vector") } len <- length(wall) water <- rep(0, len) for (i in seq_along(wall)) { currentHeight <- wall[i] maxLeftHeight <- if (i > 1) { max(wall[1:(i - 1)]) } else { 0 } maxRightHeight <- if (i == len) { 0 } else { max(wall[(i + 1):len]) } smallestMaxHeight <- min(maxLeftHeight, maxRightHeight) water[i] <- if (smallestMaxHeight - currentHeight > 0) { smallestMaxHeight - currentHeight } else { 0 } } private$heights <- private$tidyWater(water, wall) }, plot = function() { ggplot(private$heights) + geom_col( aes(x = pos + 1 / 2, y = val, fill = type), width = 1, show.legend = FALSE ) + scale_fill_manual(values = c("dodgerblue2", "grey50")) + scale_x_continuous(breaks = seq(0, max(private$heights$pos), 1)) + theme( panel.background = element_blank(), panel.ontop = TRUE, panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_line(colour = "white", size = 0.5), panel.grid.major.x = element_line(colour = "white", size = 0.5), panel.grid.major.y = element_line(colour = "white", size = 0.5), axis.ticks = element_blank(), text = element_blank() ) }, print = function() print(private$heights), total = function() { sum(private$heights[private$heights$type %in% "water", "val"]) } ), private = list( heights = NULL, tidyWater = function(water, wall) { data.frame( pos = seq_along(wall), type = factor( rep(c("water", "wall"), each = length(wall)), levels = c("water", "wall") ), val = c(water, wall) ) } ) )
Disadvantages of R6
Overall R6
is a great package with one or two downsides but these are mostly not too worrying. For one thing, R6
is a separate package that users have to load or use as an import but given that it is such a lightweight package and doesn’t have any imports of its own, it’s nothing to be concerned about. Also, R6
doesn’t have any strict type checking but this is remedied by including your own type checking in your methods. The main bug bear I have with R6
is the lack of support from Roxygen2
out of the box and this has been an open issue for a while.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.