(This article was first published on **Byte Mining » R**, and kindly contributed to R-bloggers.)

Whenever I tell people in my family that I study Statistics, one of the first questions I get from laypeople is “do you count cards?” A blank look comes over their faces when I say “no.”

Look, if I am at a casino, I am well aware that the odds are against me, so why even try to use statistics to make money that way? I love numbers and math, and that stuff flows through my brain all day long (and night long), every day. If the goal is to enjoy myself and have fun, I do not want to sit there crunching probability formulas in my head (yes, that's fun, but it is also work). So that leaves me at the video Poker machines, enjoying the free drinks. Another plus of video Poker is that $20 can sometimes last a few hours. So it should be no surprise that I do not agree with using Poker to teach probability. Poker is an extremely superficial way to introduce such a powerful tool, and it gives the impression that probability is a way to make a quick buck rather than an important tool in science and society. The only time I have used Poker in teaching (besides when required) is to cover the hypergeometric distribution and sampling without replacement.

Since I took Intro Probability Theory, I have always wondered what to do in the following situation. Say a pair of cruddy low cards appears on the first draw. The game only pays out for a pair of Jacks or better. If all I have in the hand is a pair of low cards and no face cards, my decision is easy: hold the pair of low cards. But what if there is at least one face card showing (and no other pairs)?

The conundrum:

- Hold the two low cards and deal, hoping for a three of a kind, or
- Hold the two low cards AND one of the face cards, hoping for a three of a kind OR a pair of Jacks or Better.

Of these two decisions, which yields the higher probability of winning *something*, and which yields the higher payout? This problem can be solved exactly using combinatorics, conditional probability and expectation, but since a video poker game is basically a simulator (though likely a biased one), I wrote my own simulation. **For the answer, scroll to the end!**

**Data Structure**

In most card games, we would want to store the state of the game: the outstanding cards in the deck(s) and the hand(s) of each player. In standard video poker there is one deck and one player, so only the player's hand needs to be recorded, because every card in the deck is either in the hand or it is not. One obvious way to represent the hand is as an array of denomination/suit tuples. Unfortunately, this data structure requires other data structures to store the possible suits and the possible denominations, and it makes certain kinds of wins more tedious to detect. For this simulation, I use a 13 x 4 matrix **A** in which each row is a different denomination (2 through Ace) and each column is one of the four suits. This matrix shows at a glance which cards can still be dealt, and together with a vector-based language such as R it makes wins easy to detect. For example, for the hand **2♠ 5♣ 8♥ 8♣ A♦**, **A** has exactly five entries equal to 1 (two of them in the row for 8s) and zeros everywhere else.

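Formally, writing **A** for this matrix, its entries are

$$A_{ij} = \begin{cases} 1 & \text{if } C_{ij} \in H \\ 0 & \text{otherwise} \end{cases}$$

where $C_{ij}$ denotes a card, $i$ is the denomination, $j$ is the suit, and $H$ is the player's hand in question.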
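A minimal R sketch of building **A** for the example hand 2♠ 5♣ 8♥ 8♣ A♦ is below. The helper name `hand_matrix`, the card strings such as `"8H"` and `"10C"`, and the suit column order (spades, hearts, diamonds, clubs) are assumptions made for illustration; this is not the author's code.

```r
# Rows are denominations 2..A; columns are suits.
# ASSUMPTION: column order S, H, D, C and card strings like "8H" or "10C" are illustrative.
ranks <- c("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")
suits <- c("S", "H", "D", "C")

# Hypothetical helper: A[i, j] = 1 if the card with denomination i and suit j is in the hand.
hand_matrix <- function(hand) {
  den  <- substr(hand, 1, nchar(hand) - 1)        # everything but the last character
  suit <- substr(hand, nchar(hand), nchar(hand))  # the last character is the suit
  idx  <- cbind(match(den, ranks), match(suit, suits))
  stopifnot(!anyNA(idx), !anyDuplicated(hand))
  A <- matrix(0L, nrow = 13, ncol = 4, dimnames = list(ranks, suits))
  A[idx] <- 1L
  A
}

A <- hand_matrix(c("2S", "5C", "8H", "8C", "AD"))  # the hand 2S 5C 8H 8C AD
rowSums(A)  # the "8" row sums to 2; rows "2", "5" and "A" sum to 1; all others 0
colSums(A)  # S = 1, H = 1, D = 1, C = 2
```

Row sums count cards per denomination and column sums count cards per suit, which is all the win checks below need.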

**Detecting Wins**

Poker wins are not disjoint: a three of a kind involving Jacks is also a pair of Jacks or better, and so on. When checking wins, I start with the lowest-paying win and move up to the Royal Flush, keeping track of only the highest win. Thus, this algorithm detects a four of a kind involving Queens as Jacks or Better, two pairs of Queens, and a three of a kind of Queens, but counts it only as the highest win, the four of a kind.

- *Pair of Jacks or Better*: a pair of Jacks, Queens, Kings or Aces. In **A**, this is simply the condition that at least one of rows 10 through 13 has a row sum greater than 1.
- *Two Pair*: two pairs of anything. In **A**, this is the condition that at least two rows have a sum greater than 1.
- *Three of a Kind*: three of any card. In **A**, this is the condition that at least one row has a sum of at least 3.
- *Straight*: all 5 cards can be permuted so that they form an ascending run in the sequence A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A. This case is interesting and will be discussed in a bit.
- *Flush*: all 5 cards are of the same suit. In **A**, this is the condition that one column has a sum of 5.
- *Full House*: a three of a kind plus a pair of anything. In **A**, this is the condition that one row has sum 3 and another row has sum 2.
- *Four of a Kind*: 4 of any card. In **A**, this is the condition that a row has sum 4.
- *Straight Flush*: the 5 cards can be permuted to form an ascending run and are all of the same suit. In **A**, this is simply the condition that we have both a straight and a flush in the same hand.
- *Royal Flush*: a straight flush with the Ace as the high card (10, J, Q, K, A). In **A**, this is the condition that we have a straight flush AND rows 9 through 13 each have sum 1; checking only that row 13 sums to 1 would also accept the Ace-low straight flush A, 2, 3, 4, 5.
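The win conditions above translate almost line for line into row and column sums. The sketch below is one way to write them in R, assuming the layout of **A** described earlier (rows 1 to 13 are 2 through Ace, one column per suit). The function name `classify_hand` is hypothetical; the straight is checked iteratively over windows of five rows with the Ace-low straight A, 2, 3, 4, 5 added by hand, and the royal flush test requires rows 9 through 13 so that the Ace-low straight flush is not counted as royal.

```r
# Hypothetical classifier for a 13 x 4 0/1 hand matrix A (rows 2..A, columns = suits).
# Checks wins from lowest to highest paying and keeps only the highest one found.
classify_hand <- function(A) {
  rs <- rowSums(A)  # cards per denomination
  cs <- colSums(A)  # cards per suit
  # Straight: five successive rows with sum 1 (rows k..k+4), or A, 2, 3, 4, 5.
  runs <- vapply(1:9, function(k) all(rs[k:(k + 4)] == 1), logical(1))
  is_straight <- all(rs <= 1) && (any(runs) || all(rs[c(13, 1:4)] == 1))
  is_flush <- any(cs == 5)

  win <- "nothing"
  if (any(rs[10:13] >= 2))          win <- "jacks or better"
  if (sum(rs >= 2) >= 2)            win <- "two pair"
  if (any(rs >= 3))                 win <- "three of a kind"
  if (is_straight)                  win <- "straight"
  if (is_flush)                     win <- "flush"
  if (any(rs == 3) && any(rs == 2)) win <- "full house"
  if (any(rs == 4))                 win <- "four of a kind"
  if (is_straight && is_flush)      win <- "straight flush"
  if (is_straight && is_flush && all(rs[9:13] == 1)) win <- "royal flush"
  win
}

# Usage: 8H 8C QS QD 3C, with suit columns assumed to be S, H, D, C.
A <- matrix(0, nrow = 13, ncol = 4)
A[cbind(c(7, 7, 11, 11, 2), c(2, 4, 1, 3, 4))] <- 1
classify_hand(A)  # "two pair"
```

Because each later test simply overwrites `win`, the function returns the highest-paying category that applies, mirroring the lowest-to-highest order described above.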

*Detecting the Straight:* In **A**, we have a straight when five successive rows have sum equal to 1. We can check this iteratively, but there is a better way. Note that if all of the row sums are 0 or 1, we can treat the vector of row sums as a binary number and convert it to its integer representation. Each binary number has 13 bits. If we let the row for 2s be the zeroth power of 2 (and the row for Aces the twelfth), then every straight shows up as a run of five consecutive 1 bits; for example, 2, 3, 4, 5, 6 is 0000000011111 in binary, or 31 as an integer.

**Bug alert:** It just occurred to me that J, Q, K, A, 2 is not the only wrap-around straight.

The sum of $n$ successive powers of 2 is divisible by $2^n - 1$, since $2^m + 2^{m+1} + \cdots + 2^{m+n-1} = 2^m(2^n - 1)$. After some experimentation I came up with the following rule: if all of the row sums are 0/1 and the integer representation of this binary vector is divisible by $2^5 - 1 = 31$, then **A** is a straight. The only straight that does not fit this pattern is the wrap-around straight J, Q, K, A, 2, which can be checked manually.
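Here is a minimal R sketch of the binary check, assuming the row for 2s is bit 0 and the row for Aces is bit 12. One caveat: divisibility by 31 alone is not quite sufficient, because 93 = 31 × 3 is 1011101 in binary, which has five 1 bits (the denominations 2, 4, 5, 6, 8) without being a straight. The sketch therefore strips the factors of 2 and requires exactly 31. It follows the definition of a straight given earlier (an ascending run in A, 2, ..., K, A), so the only exception it adds by hand is A, 2, 3, 4, 5; other wrap-arounds could be added the same way. The function names are hypothetical.

```r
# Binary straight check. Bit 0 is the row for 2s and bit 12 the row for Aces.
straight_code <- function(rs) sum(rs * 2^(0:12))

is_straight_bits <- function(rs) {
  # Need five distinct denominations: all row sums 0/1 and five cards in total.
  if (any(rs > 1) || sum(rs) != 5) return(FALSE)
  x <- straight_code(rs)
  # 31 * 2^m is a run of five 1 bits; strip the factors of 2 and require exactly 31.
  while (x %% 2 == 0) x <- x / 2
  x == 31 || straight_code(rs) == 4111  # 4111 = 2^12 + 2^3 + 2^2 + 2^1 + 2^0: A, 2, 3, 4, 5
}

31 * 2^(0:8)  # codes of the nine straights 2-6, 3-7, ..., 10-A

# Usage: 2, 4, 5, 6, 8 has code 93 = 31 * 3, divisible by 31 but not a straight.
rs <- integer(13)
rs[c(1, 3, 4, 5, 7)] <- 1L
straight_code(rs)     # 93
is_straight_bits(rs)  # FALSE
```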

**The Algorithm**

1. Randomly generate a hand containing a pair of low cards (2-10) and at least one face card.
2. Hold the pair of low cards. Under strategy 2, also hold one (and only one) of the face cards.
3. Discard the unheld cards (they do not go back into the deck) and draw 3 cards (strategy 1) or 2 cards (strategy 2) at random from the remaining deck.
4. Check for wins.
5. If the hand wins, increment a win counter.
6. Repeat steps 1-5 tons of times, recording the percentage of the *n* games/hands played that yielded a win.
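The steps above can be sketched in R as follows. This is an illustrative reconstruction under stated assumptions, not the author's code: cards are coded 0 to 51, "face card" is taken to mean J, Q or K, under strategy 2 the face card to keep is chosen at random, and all function names are hypothetical.

```r
# Monte Carlo sketch of the six steps above.
# ASSUMPTIONS: cards are coded 0..51 with denomination index card %/% 4 + 1
# (1 = "2", ..., 9 = "10", 10 = J, 11 = Q, 12 = K, 13 = A); "face card" means J, Q or K;
# under strategy 2 the face card to keep is picked at random. All names are hypothetical.
set.seed(1)

den_of <- function(cards) cards %/% 4 + 1

# TRUE if the final hand pays (Jacks or Better or higher). Straights and flushes
# are skipped because the held pair makes them impossible.
pays <- function(cards) {
  counts <- tabulate(den_of(cards), nbins = 13)
  any(counts[10:13] >= 2) || sum(counts >= 2) >= 2 || any(counts >= 3)
}

# Step 1: deal until there is exactly one pair, it is 2-10, and a J, Q or K shows.
deal_target_hand <- function() {
  repeat {
    hand   <- sample(0:51, 5)
    counts <- tabulate(den_of(hand), nbins = 13)
    if (sum(counts == 2) == 1 && all(counts <= 2) &&
        which(counts == 2) <= 9 && any(counts[10:12] == 1)) return(hand)
  }
}

# Steps 2-5: hold the pair (plus one face card under strategy 2), draw from the
# 47 unseen cards (discards do not return to the deck) and check for a win.
play_hand <- function(strategy) {
  hand <- deal_target_hand()
  den  <- den_of(hand)
  held <- hand[den == which(tabulate(den, nbins = 13) == 2)]
  if (strategy == 2) {
    face <- hand[den %in% 10:12]
    held <- c(held, face[sample.int(length(face), 1)])  # safe even with one face card
  }
  deck <- setdiff(0:51, hand)
  pays(c(held, sample(deck, 5 - length(held))))
}

# Step 6: a game is n hands; return the percentage of hands that won.
play_game <- function(strategy, n = 1000) 100 * mean(replicate(n, play_hand(strategy)))

# The author played 1,000 games of 1,000 hands; 10 games keeps this example quick.
games1 <- replicate(10, play_game(1))
games2 <- replicate(10, play_game(2))
c(strategy1 = mean(games1), strategy2 = mean(games2))
```

Using `sample.int(length(face), 1)` instead of `sample(face, 1)` avoids R's behavior of sampling from `1:x` when given a single number. With enough games, the strategy 1 mean should typically come out higher than the strategy 2 mean.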

**Results: Hold the Pair of Low Cards Only**

My usual strategy is to always hold the low pair and take one face card along for the ride. That way, I hope to match one of the two denominations I hold. My parents, on the other hand, always told me to hold the low pair only, because that leaves one more card (degree of freedom) for a win. It turns out they were right. Each game consisted of 1,000 hands, and some percentage of those hands yields a win. This percentage is a random variable, so I ran the simulation for 1,000 games. The table below shows the distribution of the win percentages.

*Note that under strategy 1 (hold low pair only), all wins are more likely than under strategy 2!* Of course, the estimate in the last column is an average, in this case the mean. The plot below shows the distribution of win percentages for both strategies.
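As a cross-check, the exact combinatorial route mentioned at the start can be sketched for a single hand by enumerating every possible draw (sampling without replacement, the hypergeometric setting). The hand 3♠ 3♥ 7♦ 9♣ K♠, the 0 to 51 card coding and the helper names are illustrative assumptions; a hand counts as a win if it pays at least a pair of Jacks or better, and discards are not returned to the deck. The exact probabilities shift slightly with the discarded cards, so this is one data point rather than a replacement for the simulation.

```r
# Exact win probabilities for one hypothetical hand, by enumerating every draw.
# ASSUMPTIONS: cards coded 0..51, denomination index card %/% 4 + 1 (1 = "2", ..., 13 = "A"),
# suit index card %% 4 + 1 (1 = spades, 2 = hearts, 3 = diamonds, 4 = clubs).
card   <- function(den, suit) (den - 1) * 4 + (suit - 1)
den_of <- function(cards) cards %/% 4 + 1

# TRUE if the hand pays Jacks or Better or higher (no straight/flush is possible with a held pair).
pays <- function(cards) {
  counts <- tabulate(den_of(cards), nbins = 13)
  any(counts[10:13] >= 2) || sum(counts >= 2) >= 2 || any(counts >= 3)
}

# Probability that the final hand pays when `held` is kept and the other dealt cards are discarded.
exact_win_prob <- function(dealt, held) {
  deck  <- setdiff(0:51, dealt)           # the 47 unseen cards
  draws <- combn(deck, 5 - length(held))  # one column per possible draw
  mean(apply(draws, 2, function(d) pays(c(held, d))))
}

# 3S 3H 7D 9C KS (denomination index of 3 is 2, of 7 is 6, of 9 is 8, of K is 12)
dealt <- c(card(2, 1), card(2, 2), card(6, 3), card(8, 4), card(12, 1))
p1 <- exact_win_prob(dealt, held = dealt[1:2])         # strategy 1: C(47, 3) = 16215 draws
p2 <- exact_win_prob(dealt, held = dealt[c(1, 2, 5)])  # strategy 2: C(47, 2) = 1081 draws
c(p1, p2)  # 4656/16215 (about 0.287) and 280/1081 (about 0.259)
```

For this hand, holding the low pair alone wins about 28.7% of the time versus about 25.9% with the King kept, which points the same way as the simulation.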

**The Code**

The code for my simulation is below. It can easily be modified for your own target hands of interest. In my simulation, some functions were never used because some winning hands were not possible: once a pair is held, the final hand cannot be a straight or a flush of any kind.

**DISCLAIMER:** I did this for fun, and it is possible that there are bugs or problems with my code, algorithm or simulation. The results seem correct because, empirically, I seem to do about the same using either strategy, and from a gambling perspective an 8% discrepancy is not likely to set off any alarm bells.
