Learning Scrabble strategy from robots, using R

Posted on March 30, 2017 by David Smith in R bloggers

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

While you might think of Scrabble as that game you play with your grandparents on a rainy Sunday, some people take it very seriously. There's an international competition devoted to Scrabble, and no end of guides and strategies for competitive play. James Curley, a psychology professor at Columbia University, has used an interesting method to collect data about which plays are most effective in Scrabble: having robots play against each other, thousands of times.

The data were generated with a Visual Basic script that automated two AI players completing a game in Quackle. Quackle emulates the Scrabble board and provides a number of AI players; the simulation used the “Speedy Player” AI, which tends to make tactical scoring moves while missing some longer-term strategic plays (like most reasonably skilled Scrabble players). Curley recorded the results of 2566 games between two such computer players and provided the resulting plays and boards in an R package.

With these data, you can see some interesting statistics on long-term outcomes from competitive Scrabble games, like this map (top-left) of which squares on the board are most used in games (darker means more frequently), along with the same map for just the Q, Z and blank tiles. Scrabble games in general tend to follow the diagonals where the double-word score squares are located, while the high-scoring Q and Z tiles tend to be used on double- and triple-letter squares. The zero-point blank tile, by comparison, is used fairly uniformly across the board.
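The square-usage maps boil down to counting how often each board square receives a tile. Here is a minimal Python sketch of that tally, not Curley's actual R code: the `(row, col, letter)` tuple layout and the toy placements are assumptions made up for illustration.

```python
from collections import Counter

BOARD_SIZE = 15  # a standard Scrabble board is 15 x 15


def square_usage(placements):
    """Count how many tiles landed on each (row, col) square.

    `placements` is a hypothetical iterable of (row, col, letter) tuples,
    one per tile laid down across all games.
    """
    counts = Counter()
    for row, col, _letter in placements:
        counts[(row, col)] += 1
    return counts


def usage_grid(counts):
    """Turn the counts into a 15 x 15 list of lists, ready for a heatmap."""
    return [
        [counts.get((r, c), 0) for c in range(BOARD_SIZE)]
        for r in range(BOARD_SIZE)
    ]


# Made-up placements, purely to exercise the functions.
toy_placements = [(7, 7, "Q"), (7, 8, "I"), (7, 7, "Z"), (3, 3, "Q")]

all_tiles = usage_grid(square_usage(toy_placements))

# The per-letter maps are the same tally on a filtered subset.
q_only = square_usage(p for p in toy_placements if p[2] == "Q")
```

Filtering before counting is all it takes to get the separate Q, Z and blank maps; the grid can then be handed to any plotting library.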

Further analysis of the actual plays during the simulated games reveals some interesting Scrabble statistics:

It's best to play first. Player 1 won 54.6% of games, while Player 2 won 44.9%, a statistically significant difference. (The remaining 0.4% of the games were ties.)
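As a rough check on that significance claim, here is a back-of-the-envelope sign test in Python. It is a sketch under assumptions: the win counts are reconstructed from the rounded percentages above (so they are approximate), ties are dropped, and a normal approximation to the binomial is used. The original analysis may have used a different test.

```python
import math

GAMES = 2566      # games reported in the article
P1_SHARE = 0.546  # rounded share of games won by Player 1
P2_SHARE = 0.449  # rounded share of games won by Player 2

# Approximate win counts, reconstructed from the rounded percentages.
p1_wins = round(GAMES * P1_SHARE)
p2_wins = round(GAMES * P2_SHARE)
decisive = p1_wins + p2_wins  # ties are ignored

# Under the null hypothesis, each decisive game is a fair coin flip.
expected = decisive / 2
std_dev = math.sqrt(decisive) / 2

z = (p1_wins - expected) / std_dev
# Two-sided p-value from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.2g}")
```

With these reconstructed counts the z statistic comes out near 5, so the first-player advantage is very unlikely to be a fluke of 2566 games.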

Some uncommon words are frequently used. The 10 most frequently played words were QI, QAT, QIN, XI, OX, EUOI, XU, ZO, ZA, and EX; not necessarily words you'd use in casual conversation, but all words very familiar to competitive Scrabble players. As we'll see later though, not all are high-scoring plays. Sometimes it's a good idea to get rid of a high-scoring but restrictive letter (QI, the life energy in Chinese philosophy), or simply to shake up a vowel-heavy rack for more opportunities next turn (EUOI, a cry of impassioned rapture in ancient Bacchic revels).

For high-scoring plays, go for the bingo. A Scrabble bingo, where you lay down all 7 tiles in your rack in one play, comes with a 50-point bonus. The three highest-scoring plays in the simulation were all bingo plays: REPIQUED (239 points), CZARISTS (230 points), and IDOLIZED. (Remember, though, that this is from just a couple of thousand simulated games; there are many, many more potentially high-scoring words.)
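To see how a bingo reaches totals like these, here is a small Python scoring sketch. The tile values are the standard English Scrabble values; the `play_score` helper and the board layouts passed to it are assumptions for illustration (each word spanning two triple-word squares with one double-letter square under its fourth letter). The layouts are not taken from the simulation data; they simply happen to reproduce the reported scores. Blanks and cross-words are ignored.

```python
# Standard English Scrabble tile values.
TILE_VALUES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}
BINGO_BONUS = 50  # awarded when all 7 rack tiles are played at once


def play_score(word, letter_mult=None, word_mult=1, tiles_from_rack=0):
    """Score a single word (hypothetical helper, no blanks or cross-words).

    letter_mult maps a 0-based letter position to its letter multiplier;
    word_mult is the product of all word multipliers the play covers.
    """
    letter_mult = letter_mult or {}
    base = sum(
        TILE_VALUES[ch] * letter_mult.get(i, 1)
        for i, ch in enumerate(word.upper())
    )
    total = base * word_mult
    if tiles_from_rack == 7:
        total += BINGO_BONUS
    return total


# Assumed layout: two triple-word squares (3 x 3 = 9) and a double-letter
# square under the fourth letter, with 7 of the 8 tiles from the rack.
repiqued = play_score("REPIQUED", letter_mult={3: 2}, word_mult=9, tiles_from_rack=7)
czarists = play_score("CZARISTS", letter_mult={3: 2}, word_mult=9, tiles_from_rack=7)

print(repiqued, czarists)
```

Under that assumed layout, most of each total comes from the ninefold word multiplier rather than from the tiles themselves: the letters in REPIQUED sum to only 20 points at face value.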

High-scoring non-bingo plays can be surprisingly long words. It's super-satisfying to lay down a short word like EX with the X making a second word on a triple-letter square (so the tripled X counts in both words, six times its face value in all), so I was surprised to see that the top 10 highest-scoring non-bingo plays were still fairly long words, including XENURINE (144 points), CYANOSES (126 points) and SNAPWEED (126 points), all using at least 2 tiles already on the board. The shortest word in the top 10 of this list was ZITS.

Some tiles just don't work well together. The pain of getting a Q without a U seems obvious, but it turns out getting two U's is way worse in point-scoring potential. From the simulation, you can estimate the point-scoring potential of any pair of tiles in your rack: lighter is better, and darker is worse.
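One simple way to estimate such pairwise potential is to average the score of every turn whose rack contained a given pair of tiles. The Python sketch below does that under stated assumptions: the `(rack, score)` record layout and the toy turns are made up, and the article does not say exactly how the pairwise figures were computed, so treat this as one plausible approach, not the author's method.

```python
from collections import defaultdict
from itertools import combinations


def pair_potential(turns):
    """Mean turn score for each pair of tiles held in the rack.

    `turns` is a hypothetical iterable of (rack, score) records, where
    rack is a string of the tiles held before the play.
    """
    totals = defaultdict(lambda: [0, 0])  # pair -> [score sum, turn count]
    for rack, score in turns:
        # Sorting makes ("Q", "U") and ("U", "Q") the same key; the set
        # ensures a rack counts once per distinct pair. A pair such as
        # ("U", "U") only appears when the rack holds two U's.
        for pair in set(combinations(sorted(rack), 2)):
            totals[pair][0] += score
            totals[pair][1] += 1
    return {pair: total / n for pair, (total, n) in totals.items()}


# Made-up turns, purely to exercise the function.
toy_turns = [("UUAEIOT", 8), ("QUARTZS", 40), ("UUQXRST", 10)]
potential = pair_potential(toy_turns)

print(potential[("U", "U")], potential[("Q", "U")])
```

Laying the resulting means out as a letter-by-letter matrix gives the kind of grid described above, with one cell per pair of tiles.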

Managing the scoring potential of the tiles in your rack is a big part of Scrabble strategy, as we saw in another Scrabble analysis using R a few years ago. The lowly zero-point blank is actually worth a lot of potential points, while the highest-scoring tile Q is actually a liability. Here are the findings from that analysis:

The blank is worth about 30 points to a good player, mainly by making 50-point “bingo” plays possible.
Each S is worth about 10 points to the player who draws it.
The Q is a burden to whichever player receives it, effectively a 5-point penalty: it reduces bingo opportunities, since it needs either a U or a blank for a chance at a bingo and the 50-point bonus.
The J is essentially neutral pointwise.
The X and the Z are each worth about 3-5 extra points to the player who receives them. Their difficulty in playing in bingoes is mitigated by their usefulness in other short words.

For more on James Curley's recent Scrabble analysis, including the R code using his scrabblr package, follow the link below.

RPubs: Analyzing Scrabble Games
