The Twitter Waterflow Problem

[This article was first published on Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Nathan Eastwood, Data Scientist

I was recently introduced to the Twitter Waterflow Problem and I decided it was interesting enough to try and complete the challenge in R.

Consider the following picture:

This plot shows a series of walls and empty valleys. We can represent this picture by an array of integers, where the value at each index is the height of the wall. So in this case, the array of integers can be defined as:

wall <- c(2, 5, 1, 2, 3, 4, 7, 7, 6)

Now imagine it rains. How much water is going to be accumulated in puddles between walls? No puddles are formed at edges of the wall and water is considered to simply run off the edge. We count volume in square blocks of 1×1. Thus, we are left with a puddle between column 2 and column 7 and the volume is 10.

The Approach

The approach I took was one of many ways you could solve this problem. I chose to treat each column (wall) as an index and then for each index I implement a loop:

  • Find the height of the current index
  • Find the maximum height of the walls to the left of the current index
  • Find the maximum height of the walls to the right of the index

I then work out what the smallest maximum height is between the maximum heights to the left and right of the current index. If this smallest height minus the current index height is greater than zero, then I know how many blocks will fill with water for the current index. Of course, if the smallest maximum height to the left or right is less than the current height, then I get the run off. Converting this to code looks like this:

len <- length(wall)
# pre-allocate memory to make the loop more efficient
water <- rep(0, len)
for (i in seq_along(wall)) {
  currentHeight <- wall[i]
  maxLeftHeight <- if (i > 1) {
    max(wall[1:(i - 1)])
  } else {
    0
  }
  maxRightHeight <- if (i == len) {
    0
  } else {
    max(wall[(i + 1):len])
  }
  smallestMaxHeight <- min(maxLeftHeight, maxRightHeight)
  water[i] <- if (smallestMaxHeight - currentHeight > 0) {
    smallestMaxHeight - currentHeight
  } else {
    0
  }
}
water
[1] 0 0 4 3 2 1 0 0 0

This returns a vector of the number of blocks of water at each index.

The R6 Class

For this problem I chose to use the R6 class system because it is very self contained. The R6 class system is different from the functional S3 and S4 class systems found in base R in that it is an encapsulated class system. Some key differences between the two are: Functional:

  • Objects contain data
  • Class methods are separate from objects
  • Objects are immutable - they cannot be changed after they have been created

Encapsulated:

  • Objects contain methods and data
  • Objects are mutable

Winston Chang details these differences very well in his userR!2017 talk.

I wanted the user to be able to acquire two key pieces of information: the total blocks of water and the filled plot. I therefore created three public methods inside an R6 class and placed the class in a simple package which can be found here. public methods are a list of functions (and/or non-functions) which are essentially methods (or data) of the class that are intended to be used by the user of the class.

When writing a new R6 class, we often want to perform some initial functionality when we instantiate an object of this class. This is true in our case as when we instantiate a new object of class waterflow, we want it to calculate the heights of the water straight away. We therefore call our earlier function initialize and place it inside the public list.

initialize = function(wall = NULL) {
  if (is.null(wall)) {
    stop("Please provide some wall heights")
  }
  if (!is.numeric(wall)) {
    stop("Please provide a numeric vector")
  }
  len <- length(wall)
  water <- rep(0, len)
  for (i in seq_along(wall)) {
    currentHeight <- wall[i]
    maxLeftHeight <- if (i > 1) {
      max(wall[1:(i - 1)])
    } else {
      0
    }
    maxRightHeight <- if (i == len) {
      0
    } else {
      max(wall[(i + 1):len])
    }
    smallestMaxHeight <- min(maxLeftHeight, maxRightHeight)
    water[i] <- if (smallestMaxHeight - currentHeight > 0) {
      smallestMaxHeight - currentHeight
    } else {
      0
    }
  }
  private$heights <- private$tidyWater(water, wall)
}

Note: we have one key difference here, we are assigning the results to members of the private list within our object. Once we have calculated our water heights, we use them along with our wall heights in the function private$tidyWater() and assign the resulting data.frame to private$heights. The private argument is a list which contains methods (and/or data) which are internal to the class and are not intended to be used by the user. We do similar things when writing packages - we explicitly export the functionality we want other people to use and don’t export functionality that is only used within the package itself.

So to use the class, first we instantiate the class with some data in a call to $new() which in turn runs the above initialize function.

library(waterflow)
wall <- c(2, 5, 1, 2, 3, 4, 7, 7, 6)
p <- waterflow$new(wall)

So now we have an object called p which is of class waterflow (and R6). p contains the data, as well as the (public) methods we can perform on that data.

Typically when we return p, R6 objects have a default print method that lists all members of the object but here there is a custom $print() function that returns heights.

p
   pos  type val
1    1 water   0
2    2 water   0
3    3 water   4
4    4 water   3
5    5 water   2
6    6 water   1
7    7 water   0
8    8 water   0
9    9 water   0
10   1  wall   2
11   2  wall   5
12   3  wall   1
13   4  wall   2
14   5  wall   3
15   6  wall   4
16   7  wall   7
17   8  wall   7
18   9  wall   6

We can still return the members of the object by looking at the structure of p.

str(p)
Classes 'waterflow', 'R6' <waterflow>
  Public:
    clone: function (deep = FALSE) 
    initialize: function (wall = NULL) 
    plot: function () 
    print: function () 
    total: function () 
  Private:
    heights: data.frame
    tidyWater: function (water, wall)  

To calculate the total water that fills the valleys, we sum over the heights object for the values of the water.

total = function() {
  sum(private$heights[private$heights$type %in% "water", "val"])
}

Calling the public method $total() gives the expected result.

p$total()
[1] 10

To call the plot, we access the public method $plot().

p$plot()

The completed class looks like this

waterflow <- R6Class(
  "waterflow",
  public = list(
    initialize = function(wall = NULL) {
      if (is.null(wall)) {
        stop("Please provide some wall heights")
      }
      if (!is.numeric(wall)) {
        stop("Please provide a numeric vector")
      }
      len <- length(wall)
      water <- rep(0, len)
      for (i in seq_along(wall)) {
        currentHeight <- wall[i]
        maxLeftHeight <- if (i > 1) {
          max(wall[1:(i - 1)])
        } else {
          0
        }
        maxRightHeight <- if (i == len) {
          0
        } else {
          max(wall[(i + 1):len])
        }
        smallestMaxHeight <- min(maxLeftHeight, maxRightHeight)
        water[i] <- if (smallestMaxHeight - currentHeight > 0) {
          smallestMaxHeight - currentHeight
        } else {
          0
        }
      }
      private$heights <- private$tidyWater(water, wall)
    },
    plot = function() {
      ggplot(private$heights) +
        geom_col(
          aes(x = pos + 1 / 2, y = val, fill = type),
          width = 1, show.legend = FALSE
        ) +
        scale_fill_manual(values = c("dodgerblue2", "grey50")) +
        scale_x_continuous(breaks = seq(0, max(private$heights$pos), 1)) +
        theme(
          panel.background = element_blank(),
          panel.ontop = TRUE,
          panel.grid.minor.x = element_blank(),
          panel.grid.minor.y = element_line(colour = "white", size = 0.5),
          panel.grid.major.x = element_line(colour = "white", size = 0.5),
          panel.grid.major.y = element_line(colour = "white", size = 0.5),
          axis.ticks = element_blank(),
          text = element_blank()
        )
    },
    print = function() print(private$heights),
    total = function() {
      sum(private$heights[private$heights$type %in% "water", "val"])
    }
  ),
  private = list(
    heights = NULL,
    tidyWater = function(water, wall) {
      data.frame(
        pos = seq_along(wall),
        type = factor(
          rep(c("water", "wall"), each = length(wall)),
          levels = c("water", "wall")
        ),
        val = c(water, wall)
      )
    }
  )
)

Disadvantages of R6

Overall R6 is a great package with one or two downsides but these are mostly not too worrying. For one thing, R6 is a separate package that users have to load or use as an import but given that it is such a lightweight package and doesn’t have any imports of its own, it’s nothing to be concerned about. Also, R6 doesn’t have any strict type checking but this is remedied by including your own type checking in your methods. The main bug bear I have with R6 is the lack of support from Roxygen2 out of the box and this has been an open issue for a while.

To leave a comment for the author, please follow the link and comment on their blog: Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)