A recipe for dynamic programming in R: Solving a “quadruel” case from PuzzlOR

[This article was first published on R – Serhat's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As also described in Cormen, et al (2009) p. 65, in algorithm design, divide-and-conquer paradigm incorporates a recursive approach in which the main problem is:

  • Divided into smaller sub-problems (divide),
  • The sub-problems are solved (conquer),
  • And the solutions to sub-problems are combined to solve the original and “bigger” problem (combine).

Instead of constructing indefinite number of nested loops destroying the readability of the code and the performance of execution, the “recursive” way utilizes just one block of code which calls itself (hence the term “recursive”) for the smaller problem. The main point is to define a “stop” rule, so that the function does not sink into an infinite recursion depth. While nested loops modify the same object (or address space in the low level sense), recursion moves the “stack pointer”, so each recursion depth uses a different part of the stack (a copy of the objects will be created for each recursion). This illustrates a well-known trade-off in algorithm design: Memory versus performance; recursion enhances performance at the expense of using more memory.

Given the benefits of the recursive divide-and-conquer paradigm, however, the performance of the naive recursion may not still be as good as we may want, since there may be too many smaller problems to navigate through recursion. Sometimes, the same or a similar sub-problem may be visited multiple times and each sub-problem paves way to its sub-problems down to the stop rule – so a naive recursive approach solves the same sub-problems many and many times. The recipe to enhance performance is to “memoize” solved sub-problems: Simply to keep solutions in a look-up table, and first check the table if the value was memoized before commencing with the “divide-and-conquer”. Divide-and-conquer paradigm on steroids with memoization is known as “dynamic programming” or “dp” for short (Cormen, et al. 2009, p. 359).

Now we will illustrate an application of the “dynamic programming” approach using a question from the  PuzzlOR site – a site which publishes bi-monthly decision support puzzles for applied mathematicians. Since the answers to expired questions are disclosed by the website itself, there is no problem in describing a solution in detail. The question that appeared in October 2014 is called “Fighters”. The question involves a quadruel: A kind of duel fight that is among four participants instead of the classical two participant case.

The question goes as follows:

Four different fighters are having an all-out battle to determine who among them is the strongest.  The image above shows those four fighters: Allan, Barry, Charles, and Dan. 
 
Each fighter has varying attack and health abilities.  At the start of the battle, they have differing health points: Allan has 10, Barry has 12, Charles has 16, and Dan has 18.  Also, each fighter has differing attack points: Allan has 4, Barry has 3, Charles has 2, and Dan has 1.
 
The battle takes place over multiple rounds, each round consisting of a single attack.  In each round, one random attacker and one random defender are chosen.  When the attacker attacks a defender, the defender loses health points in the amount equivalent to the attacker’s attack points.  For example, if Allan is the attacker and Barry is the defender, Barry would lose 4 health points.
 
The fighters continue to randomly attack and defend in subsequent rounds until there is only one fighter left, who is then declared the winner.  A fighter is removed from the battle when his life points become zero (or less).
 
 
Question:  Which fighter is most likely to win the battle?

 

Starting with the initial health levels of 10, 12, 16, and 18, we can divide the main problem of finding the most likely winner into smaller ones, each of which is the ex-post health level configuration after a possible and equally likely attack between any “alive” fighters – “alive” meaning having a non-zero health level. We can divide the problem until we have only one alive fighter – which is the winner. We calculate the actual probability of winning here in this deepest recursion step at the stop rule (only one winner and the probability is zero for other fighters), so this is the “conquer” phase in which the smaller problem is solved.

Each recursion paves way into smaller problems – each one involving a different pair of fighters – number of which is the count of two-member permutations of alive fighters without replacement .  We calculate permutations instead of combinations since Fighter X attacking Fighter Y is different than the other way around – the attacker injures the attacked one,  so the attack is one-way and not symmetric. We calculate the permutations without replacement, since an attacker cannot attack herself/himself! So in the case of four alive fighters, there are 12 possible attacks. And each attack has a probability of 1/12. The winning probabilities of each fighter calculated from each of these 12 attacks are summed together in a bottom-up manner (from the deepest recursion level involving the last alive fighter, up to the initial question) and this step is the “combine” phase: In each step involving n number of ordered pairs of fighters, the 1/n probability is multiplied with the bottom-up probabilities to allow for carrying the joint probabilities up to the topmost level (the initial question).

We first start with the divide-and-conquer paradigm without memoization and see the performance:


## for non-repeated permutations
library(gtools) 

### non memoized version

# get the winning probabilities of each player
# win means only one player remains with non-zero health

puzzlor.fighters.nm <- function() { 
    
    health <- c(10, 12, 16, 18) # initial health points of players a:d
    attack <- c(4, 3, 2, 1) # attack points of players a:d

    # permutations of players. first column attackers, second column attacked.
    # self attack not allowed (no repetitions)
    at.combs <- gtools::permutations(4, 2, 1:4) 

    # get matrix of results through ply
    results <- t(apply(at.combs, 1, fight.rec.nm, health, attack, 1/12))

    # sum all probabilities of each player
    result <- colSums(results)

    # update player names
    names(result) <- c("Allan", "Barry", "Charles", "Dan")
 
    return(result)

}
    

# recursive function to recalculate the win probabilities after an attack,
# incorporating bottom-up cumulative joint probabilities

fight.rec.nm <- function(at, health, attack, prob) { 

    # at: vector of attacker and attacked no's
    # at1: attacker no
    # at2: attacked no
    # health: vector for current health levels of all players
    # attack: vector for attack points of all players
    # prob: carried joint probability

    at1 <- at[1] # attacker no
    at2 <- at[2] # attacked no

    # update the health of at2 after attack
    health[at2] <- max(health[at2] - attack[at1], 0)

    players <- which(health > 0) # index of alive players

    # if only one player survives, so is the winner
    if (length(players) == 1) { 
        result <- rep(0,4) # vector for results
        result[players] <- prob # update the probability of the winner
        return(result)

    } else {
 
        # permutations of indices of attackers and attackeds.
        # self attack not allowed (no repetitions in permutations)
        at.combs <- gtools::permutations(length(players), 2, players)
 
        probs <- nrow(at.combs) # total count of attack permutations
        prob.new <- 1 / probs # probability of each attack permutation

        # collect matrix of results recursing the function
        results <- t(apply(at.combs, 1, fight.rec.nm, health, attack, prob.new)) 
        result <- colSums(results) # sum the probabilities for each player
        result <- prob * result # multiply with the joint probability
        return(result)

    }

}

The key points here in the above code are:

  • In each recursion,  the sub-problem is defined by the attacker, attacked, current health points, and the carried probability of that sub-problem (1 / the number of sub-problems generated one level up). The attack points of fighters do not change (line 35).
  • In each sub-problem the health points are updated by subtracting the attack point of the attacker from the health point of the attacked. Note that, the health level can be as low as 0 (line 48).
  • New sub-problems are defined for the non-repeated ordered-pairs of remaining alive fighters – provided that more than one fighter is alive. The permutations are generated by the use of “gtools” package, a handy package to generate combinations and permutations with different options. Note that the base function “expand.grid” generates only the repeated permutations which is not useful for our purpose, since a fighter is not supposed to attack herself/himself. The carried probability is 1 / the number of new sub-problems. The same updated health levels are carried to the new recursion level in each new sub-problem (lines 62-68). If only one fighter with non-zero health level remains, the recursion stops and the carried probability is recorded for the alive fighter (0 probability for the others) (lines 53-56).
  • Each sub-problem returns a 4-tuple vector of winning probabilities for each fighter.  The probabilities account for the carried probability (1 / number of ordered pairs). All results from sub-problems at a certain level return a matrix and the columns are summed up and multiplied by the carried probability to yield a vector of combined probabilities to be carried one level up until the original problem (lines 69-71).

We construct two functions: The first one (“puzzlor.fighters.nm”) (line 9) is a wrapper to initiate necessary objects and to start the initial problem calling the recursive function. The second function (“fight.rec.nm”) (line 35) is the recursive function which does all the steps described in detail above.

The performance of this non-memoized version is … well it is still running as at the time I am writing these sentences.

The reason for the performance bottleneck is the fact that too many sub-problems are generated until only one winner remains and the number of “tree branches” grows geometrically with each recursion (multiplied by the number of possible ordered attack pairs in each step). And those sub-problems are revisited many times since the fights can yield the same health level configuration at different instances . We can enhance the performance without redefining the problem but by augmenting it with an additional data structure:

  • Each sub-problem, after the attack is realized, is defined by the 4-tuple of health levels (line 48).
  • And each of these sub-problems yields a 4-tuple of the winning probabilities of each fighter (line 56, line 71).
  • So for each sub-problem defined by the 4-tuple configuration of health levels, we can memoize the 4-tuple winning probabilities of each fighter.
  • We must create an object to record the winning probabilities for each health level configuration. But what should be the R object type for this task? We can subset a matrix with at most two indices: row and column, but we need four indices for each health level. And in order to subset a vector (the 4-tuple probabilities), we can at most use one index in a matrix. The solution is to use a higher dimension object called “array” in R.  Arrays can have a dimension higher than two. In our example we need to have a 5D array to index for each health configuration and still yield a 1D vector object of probabilities
The new code which combines recursion and memoization and hence is an application of “dynamic programming” (dp) is as follows:

## for non-repeated permutations
library(gtools) 

# get the winning probabilities of each player
# win means only one player remains with non-zero health

puzzlor.fighters.5D <- function() {     

    health <- c(10, 12, 16, 18) # initial health points of players a:d
    attack <- c(4, 3, 2, 1) # attack points of players a:d

    ## global 5D object for memoization
    result.ar <<- array(dim = c(health + 1, 4), dimnames = NULL)

    # permutations of players. first column attackers, second column attacked.
    # self attack not allowed (no repetitions).
    at.combs <- gtools::permutations(4, 2, 1:4)
 
    # get matrix of results through ply
    results <- t(apply(at.combs, 1, fight.rec.5D, health, attack, 1/12)) 
    result <- colSums(results) # sum all probabilities of each player
    names(result) <- c("Allan", "Barry", "Charles", "Dan") # update player names
    return(result)

}
    

# recursive function to recalculate the win probabilities after an attack,
# incorporating bottom-up cumulative joint probabilities
fight.rec.5D <- function(at, health, attack, prob) { 

    # at: vector of attacker and attacked no's
    # at1: attacker no
    # at2: attacked no
    # health: vector for current health levels of all players
    # attack: vector for attack points of all players
    # prob: carried joint probability

    at1 <- at[1] # attacker no
    at2 <- at[2] # attacked no

    # update the health of at2 after attack
    health[at2] <- max(health[at2] - attack[at1], 0)

    # create a matrix for indexing the 5D array 
    mat.index <- cbind(matrix(rep(health + 1, 4), nrow = 4, byrow = T), 1:4)

    mat.row <- "["(result.ar, mat.index) # get the memoized health configuration

    # if the probabilities for the health conf. are already memoized
    if (!is.na(mat.row[1])) {

        # get the result and multiply with the prob
        result <- prob * mat.row 
        return(result)

    } else {

        players <- which(health > 0) # index of alive players

        # if only one player survives, so is the winner
        if (length(players) == 1) { # 

            result <- rep(0,4) # vector for results
            result[players] <- prob # update the probability of the winner
            return(result)

        } else {

            # permutations of indices of attackers and attackeds.
            # self attack not allowed (no repetitions in permutations)
            at.combs <- gtools::permutations(length(players), 2, players) 
            probs <- nrow(at.combs) # total count of attack permutations
            prob.new <- 1 / probs # probability of each attack permutation

            # collect matrix of results recursing the function
            results <- t(apply(at.combs, 1, fight.rec.5D, health, attack, prob.new))
            result <- colSums(results) # sum the probabilities for each player

            # memoize the health configuration
            "["(result.ar, mat.index) <<- result 

            # multiply with the joint probability
            result <- prob * result 
            return(result)

        }

    }

}

The main points of the modified code are as follows:

  • We create a 5D array, to hold the memoized values. The first four dimensions are for the health values and the last dimension is for the memoized probabilities. Note that, each dimension is 1 + the initial health in order to account for 0 health level. When subsetting the array, the health values will be incremented (line 13).
  • And also note that, the array is superassigned (“<<-“) so it is defined in the global environment. This is important, since whatever the recursion level is, the memoization should be done into one global object, not a copy of that object at the current depth – otherwise we cannot collect all results and look them up from the same collection.
  • We can subset a 5D object with 4 indices (leaving the last index so that a vector is returned) just like we subset a matrix for rows and/or columns: array.object[x,y,z,a,]. However, we should first subset each index from the health level configuration vector – which does not look very pretty considering higher dimension objects. A more elegant and generalized way to subset the 5D array is to use the subsetting operator “[” as a function: Note that every operator in R is a function following the convention of LISP (one of the languages that inspired S and R) (line 48). We should first create a matrix containing the index values and then use the “[” operator as a function (line 46). Note that we can follow the same subsetting approach for modifying the array for memoization purposes (line 81).
  • In each sub-problem, the array will be indexed first to get the memoized probabilities – if any. If the probabilities for the given health level configuration is already memoized, those values will be returned (after multiplying with the probability carried from one higher level) (lines 51-55).
  • For the last level of recursion (where there is only one fighter left), there is no need to memoize the value, since just returning a vector of three 0’s and one value for carried probability without further recursion should be as efficient as indexing the 5D array (lines 62-66).
  • For the cases in which the configuration is not yet memoized and there are more than one alive fighters, the recursion will be followed as in the first code (lines 72-85), however, this time the results will be memoized into the 5D array before multiplying with the carried probability (line 81).
Now let’s execute this new function and compare its results with the first one. Please note that the first code is still running after several hours on an octacore machine at 2.7 GHz current and 3.5 GHz max clock speed!
> microbenchmark(puzzlor.fighters.5D(), times = 1)
Unit: seconds
                  expr      min       lq     mean   median       uq      max
 puzzlor.fighters.5D() 7.983507 7.983507 7.983507 7.983507 7.983507 7.983507
 neval
     1

The code executes in less than 8 seconds. You can check the correct solutions from the PuzzlOR web page.

A downside to the dp approach is that, it cannot be utilized with the “mcmapply” and similar R functions of the “parallel” package for multicore parallelism, since these functions create copies of the global objects as well as the local objects for each thread. So all values cannot be memoized into the same object. However the gains from the memoization outweigh the disadvantage of using a single core.

You can view the whole code (with an additional option with a 4D array and a matrix, which slightly underperforms the 5D case) from my Github page.

Notes:

  • In cases where R returns an error for reaching the maximum recursion depth, the limit for recursion depth up to 5e5 levels can be defined with:
    > options(expressions= 500000)
    
  • Sometimes, despite an enough level of maximum recursion depth, the usage of stack may reach the maximum available size. To tackle with this issue answers to a stackoverflow question  summarize the solution. From the unix shell, to check the current stack size:
    $ ulimit -s # print default
    8192
    

    Now you can increase the stack size by:

    $ ulimit -s 16384 # enlarge stack limit to 16 megs
    

    And now, you can check the new stack size available to R by calling a new R session from the shell that you increased the stack size with and calling the following:

    
    > Cstack_info()["size"]
        size 
    15938355
    

To leave a comment for the author, please follow the link and comment on their blog: R – Serhat's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)