Big Data and Chess: What are the Predictive Point Values of Chess Pieces?

Posted on June 10, 2015 by Rasmus Bååth in R bloggers

[This article was first published on Publishable Stuff, and kindly contributed to R-bloggers.]

Who doesn’t like chess? Me! Sure, I like the idea of chess – intellectual masterminds battling each other using nothing but pure thought – the problem is that I tend to lose, probably because I don’t really know how to play well, and because I never practice. I do know one thing: how much the different pieces are worth, their point values:

Pawn: 1, Knight: 3, Bishop: 3, Rook: 5, Queen: 9

This was among the first things I learned when I was taught chess by my father. Given these point values it should be as valuable to have a knight on the board as having three pawns, for example. So where do these values come from? The point values are not actually part of the rules for chess, but are rather just used as a guideline when trading pieces, and they seem to be based on the expert judgment of chess authorities. (According to the guardian of truth there are many alternative valuations, all in the same ballpark as above.) As I recently learned that it is very important to be able to write Big Data on your CV, I decided to see if these point values could be retrieved using zero expert judgement in favor of huge amounts of chess game data.

The method

How to allocate point values to the chess pieces using only data? One way of doing this is to calculate the predictive values of the chess pieces. That is, given that we only know the current number of pieces in a game of chess and use that information to predict the outcome of that game, how much does each type of piece contribute to the prediction? We need a model to predict the outcome of chess games where we have the following restrictions:

- Each type of piece has a single point value that directly contributes to the predicted outcome of the game, so there are no interaction effects between the pieces.
- The value of a piece does not change over the course of the game.
- The model uses no context or positional information.

Now these restrictions might feel a bit restrictive, especially if we actually wanted to predict the outcome of chess games as well as possible, but they are there because the original point values follow the same restrictions. As the original point values don’t change with context, neither should ours. Now, as my colleague Can Kabadayi (with an Elo rating well above 2000) remarked: “But context is everything in chess!” Absolutely, but I’m not trying to do anything profound here, this is just a fun exercise! 🙂

Given the restrictions there is one obvious model: logistic regression, a vanilla statistical model that calculates the probability of a binary event, like a win or a loss. To get it going I needed data, and the biggest Big Data data set I could find was the Million Base 2.2, which contains over 2.2 million chess games. I had to do a fair deal of data munging to get it into a format that I could work with, but the final result was a table with a lot of rows that looked like this:

pawn_diff rook_diff knight_diff bishop_diff queen_diff white_win
        1         0           1          -1          0      TRUE

Here each row is from a position in a game, where a positive number means White has more of that piece. For the position above, White has one more pawn and knight, but one less bishop than Black. Last in each row we get to know whether White won or lost in the end. As logistic regression assumes a binary outcome, I discarded all games that ended in a draw. My résumé is unfortunately not going to look that good, as I never really solved the Big Data problem well. Two million chess games are a lot of games, and it took my poor old laptop over a day to process only the first 100,000 games. Then I had the classic Big Data problem that I couldn’t fit it all into working memory, so I simply threw away data until it worked. Still, for the analysis I ended up using a sample of 1,000,000 chess positions from the first 100,000 games in the Million Base 2.2. Big enough data for me.
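
To make the table format concrete, here is a minimal sketch in R of how a position could be turned into such a row. This is not the script actually used for the post. It assumes, purely for illustration, that each position is available as the piece-placement field of a FEN string (uppercase letters for White, lowercase for Black) and that results are coded PGN-style as "1-0", "0-1" and "1/2-1/2". The function name `piece_diffs` and the tiny `positions` table are hypothetical.

```r
# Count piece differences (White minus Black) from a FEN piece-placement
# string. Using FEN as input is an assumption made for this example.
piece_diffs <- function(placement) {
  chars <- strsplit(placement, "")[[1]]
  count <- function(ch) sum(chars == ch)
  c(pawn_diff   = count("P") - count("p"),
    rook_diff   = count("R") - count("r"),
    knight_diff = count("N") - count("n"),
    bishop_diff = count("B") - count("b"),
    queen_diff  = count("Q") - count("q"))
}

# The starting position, and a made-up position where White has one more
# pawn and knight but one less bishop than Black.
start   <- "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
example <- "r1bqkbnr/ppppppp1/8/8/8/8/PPPPPPPP/RNBQK1NR"

# Hypothetical positions paired with the final result of their game.
positions <- data.frame(
  placement = c(start, example, start),
  result    = c("1/2-1/2", "1-0", "0-1"),
  stringsAsFactors = FALSE
)

# Logistic regression needs a binary outcome, so drop the drawn games.
decisive <- positions[positions$result != "1/2-1/2", ]

# One row of piece differences per remaining position, plus the outcome.
rows <- as.data.frame(t(sapply(decisive$placement, piece_diffs,
                               USE.NAMES = FALSE)))
rows$white_win <- decisive$result == "1-0"
print(rows)
```

The first row of `rows` corresponds to the example row shown above. On the real data the same idea would have to be applied to every sampled position, which is where the memory trouble starts.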

The result

Using the statistical language R I first fitted the following logistic model using maximum likelihood (here described by R’s formula language):

white_win ~ 1 + pawn_diff + knight_diff + bishop_diff + rook_diff + queen_diff
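
As a rough sketch of what fitting this formula looks like, the snippet below uses base R's `glm()` with a binomial family. The data frame `d` is simulated, and the effect sizes used to simulate it are made-up numbers, so the printed coefficients say nothing about real chess games. They only show the mechanics.

```r
# Simulated stand-in for the real table of positions. The "true" piece
# effects below are hypothetical numbers chosen only for illustration.
set.seed(1)
n <- 5000
d <- data.frame(
  pawn_diff   = sample(-2:2, n, replace = TRUE),
  knight_diff = sample(-1:1, n, replace = TRUE),
  bishop_diff = sample(-1:1, n, replace = TRUE),
  rook_diff   = sample(-1:1, n, replace = TRUE),
  queen_diff  = sample(-1:1, n, replace = TRUE)
)
log_odds <- with(d, 0.3 * pawn_diff + 0.9 * knight_diff + 0.9 * bishop_diff +
                    1.5 * rook_diff + 2.7 * queen_diff)
d$white_win <- runif(n) < plogis(log_odds)

# Maximum likelihood fit of the logistic model from the formula above.
fit <- glm(white_win ~ 1 + pawn_diff + knight_diff + bishop_diff +
             rook_diff + queen_diff,
           data = d, family = binomial)

# Coefficients are on the log-odds scale.
print(coef(fit))

# One possible rescaling (an assumption of this sketch): divide by the
# pawn coefficient so that a pawn is worth exactly 1.
pawn_units <- coef(fit)[-1] / coef(fit)["pawn_diff"]
print(round(pawn_units, 2))
```

Each fitted coefficient is the change in the log-odds of a White win for one extra piece of that type, with the other differences held fixed. Dividing by the pawn coefficient is just one convenient way to put the numbers on the same scale as the traditional point values.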

Final notes

Again, I don’t know much about chess, but the Million Base 2.2 is a fun database to work with, so if you have any suggestions for other things to look at, leave a comment below or tweet me (@rabaath)!

If you want to dabble with the database yourself, you can find the scripts I used to convert the data in the Million Base into an analysis-friendly format, and the code for recreating the predictive piece value analysis, here:

Python script to convert the Million Base 2.2 to JSON format:
- https://gist.github.com/rasmusab/07f1823cb4bd0bc7352d
R scripts that recreate the analysis and plots in this post:
- https://gist.github.com/rasmusab/fb98cced046d4c675d74
- https://gist.github.com/rasmusab/b29bb53cfc3fe25f3f80

While “researching” this post I learned about a really fun chess variant called Knightmare Chess, which plays like normal chess but with the addition that each player can play action cards with “special effects”. These effects are often spectacular, like the card Fireball that “explodes” a piece, which kills all adjacent pieces. This adds a (large) element of randomness to the game, which might irritate chess purists, but makes it possible for me to win once in a while 🙂
