# Calculating the house edge of a slot machine, with R

[This article was first published on **Revolutions**, and kindly contributed to R-bloggers.]

Modern slot machines (fruit machines, pokies, or whatever those electronic gambling devices are called in your part of the world) are designed to be addictive. They're also usually quite complicated, with a bunch of features that affect the payout of a spin: multiple symbols with different pay scales, wildcards, scatter symbols, free spins, jackpots … the list goes on. Many machines also let you play multiple lines at the same time (20 lines, or 80, or even more, with just one spin). All of this complexity is designed to make it hard for you, the player, to judge the real odds of success. But rest assured: in the long run, you always lose.

All slot machines are designed to have a “house edge” — the percentage of player bets retained by the machine in the long run — greater than zero. Some may take 1% of each bet (over a long-run average); some may take as much as 15%. But every slot machine takes *something*.
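To make the definition concrete, here is a minimal sketch in R. The outcomes, probabilities, and payouts are made-up assumptions for illustration, not taken from any real machine. The house edge is simply one minus the expected payout per unit bet.

```r
# Hypothetical single-line game: every value below is an illustrative assumption.
outcomes <- data.frame(
  outcome = c("jackpot", "big win", "small win", "loss"),
  prob    = c(0.001, 0.019, 0.18, 0.80),
  payout  = c(200, 10, 3, 0)   # amount returned per 1 unit bet
)

# The probabilities must cover every possible outcome
stopifnot(isTRUE(all.equal(sum(outcomes$prob), 1)))

# Expected return to the player per unit bet
expected_return <- sum(outcomes$prob * outcomes$payout)

# House edge: the share of each bet the machine keeps in the long run
house_edge <- 1 - expected_return
house_edge
```

Here the expected return is 0.93 per unit bet, so the house edge is about 7%. With one line and four outcomes the arithmetic is trivial; the difficulty with a real machine lies in working out the probability of every paying combination.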

That being said, with all those complex rules and features, trying to calculate the house edge, even when you know all of the underlying probabilities and frequencies, is no easy task. Giora Simchoni demonstrates the problem with an R script to calculate the house edge of an “open source” slot machine, *Atkins Diet*.

This virtual machine has a level of complexity typical of modern slot machines. Even though we know the pay table (which is always public) and the relative frequency of the symbols on the reels (which usually isn't), calculating the house edge for this machine requires several pages of code. You could calculate the expected return analytically, of course, but it turns out to be a somewhat error-prone combinatorial problem. The simplest approach is to simulate playing the machine 100,000 times or so, and then look at the distribution of the payouts over all of these spins.
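The simulation approach can be sketched on a much simpler machine. The following is a minimal illustration, not Simchoni's code: the reel strip, the pay table, and the three-of-a-kind rule are all made-up assumptions, chosen so that the exact answer is easy to compute for comparison.

```r
set.seed(42)

# Hypothetical reel strip, shared by all three reels (an illustrative assumption)
strip <- c(rep("cherry", 5), rep("bell", 3), rep("bar", 2), "seven")

# Hypothetical pay table: payout per 1 unit bet for three of a kind
pay <- c(cherry = 3, bell = 10, bar = 40, seven = 200)

n_spins <- 100000

# Each spin stops every reel at an independent, uniformly random position
reels <- matrix(sample(strip, 3 * n_spins, replace = TRUE), ncol = 3)

# A spin wins when all three reels show the same symbol
win <- reels[, 1] == reels[, 2] & reels[, 2] == reels[, 3]
payout <- ifelse(win, pay[reels[, 1]], 0)

# Simulated house edge (1 minus the average payout) and its standard error
edge_sim <- 1 - mean(payout)
se_sim   <- sd(payout) / sqrt(n_spins)

# Exact house edge for this toy machine, for comparison
p <- table(strip) / length(strip)
edge_exact <- 1 - sum(as.numeric(p)^3 * pay[names(p)])

c(simulated = edge_sim, std_error = se_sim, exact = edge_exact)

# Distribution of payouts per spin on a log scale
hist(log(payout + 1), main = "Toy machine: payout per spin",
     xlab = "log(payout + 1)")
```

For this toy machine the exact edge is 166/1331, or about 12.5%. Because rare, large prizes dominate the variance, the standard error of the simulated estimate after 100,000 spins is still on the order of two percentage points, so a simulated edge is best read alongside its standard error.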

In the plot of the *Atkins Diet* payouts, the *x* axis is log(Total Wins + 1), in log-dollars, from a single spin. It's interesting to see the impact of the bet size (which increases the variance but doesn't change the shape of the distribution), and of the number of lines played. Playing one 20-line game isn't the same as playing twenty 1-line games, because the re-use of the symbols means multi-line wins are *not* independent: a high-value symbol (like a wild) may contribute to wins on multiple lines. Conversely, losing combinations have a tendency to cluster together, too. It all balances in the end, but the possibility of more frequent wins (coupled with higher-value losses) is apparently appealing to players, since many machines encourage multi-line play.
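The dependence between lines can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not part of the original analysis: a 3x3 window whose nine cells are drawn independently (real machines use reel strips), four hypothetical symbols including a wild, and three pay lines that all pass through the center cell.

```r
set.seed(1)

# Hypothetical symbol frequencies; "W" is a wild that matches anything
symbols <- c("W", "A", "B", "C")
probs   <- c(0.10, 0.30, 0.30, 0.30)
n_spins <- 100000

# A line wins when all of its non-wild symbols are identical
line_wins <- function(x, y, z) {
  match2 <- function(a, b) a == b | a == "W" | b == "W"
  match2(x, y) & match2(y, z) & match2(x, z)
}

# One row per spin; columns 1-3 are reel 1 (top, middle, bottom),
# columns 4-6 are reel 2, and columns 7-9 are reel 3
draw_windows <- function() {
  matrix(sample(symbols, 9 * n_spins, replace = TRUE, prob = probs), ncol = 9)
}

# One 3-line game: middle row plus both diagonals share the center cell (column 5)
w    <- draw_windows()
mid  <- line_wins(w[, 2], w[, 5], w[, 8])
down <- line_wins(w[, 1], w[, 5], w[, 9])
up   <- line_wins(w[, 3], w[, 5], w[, 7])
multi <- mid + down + up          # winning lines per 3-line spin

# Three separate 1-line games, each played on its own spin
single_line <- function() {
  s <- draw_windows()
  line_wins(s[, 2], s[, 5], s[, 8])
}
indep <- single_line() + single_line() + single_line()

# Wins on lines that share a symbol are positively correlated
cor(as.numeric(mid), as.numeric(down))

# Similar mean, but the multi-line total is more variable
c(mean_multi = mean(multi), mean_indep = mean(indep),
  var_multi  = var(multi),  var_indep  = var(indep))
```

Under these assumptions each line wins with probability 0.19 either way, so the average number of winning lines per round is the same. The shared center symbol, however, makes wins and losses arrive together, which raises the variance of the multi-line total.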

Nonetheless, however you play, the house edge is always positive. For *Atkins Diet*, it's about 4% for single-line play, and about 3.2% for multi-line play. You can see the details of the calculation, and the complete R code behind it, at the link below.

Giora Simchoni: Don't Drink and Gamble (via the author)
