While I was watching The Queen’s Gambit on Netflix, some questions came to my mind. What would happen if I made a chess engine play 1,000 matches against itself? Would the exact same game repeat over and over? If not, what would the win/loss distribution look like?

Chess engines are computer programs that analyze chess positions and provide a list of the strongest moves for those specific positions. These programs can be very strong and are regularly capable of beating the best human grandmasters.

As for the win distribution part of my question, this article regarding the advantage of moving first in chess was interesting and informative. Let’s take a look at the following historical data:

Win distribution

In chess, white plays first and, as you can see, the initiative advantage allows white to win between 6% and 10% more games than black. In case you are wondering how the score is calculated, it’s simple: wins award 1 point and draws award 0.5 points (there are some slight inaccuracies due to rounding, but I wanted to present the original table).
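
Just to make the arithmetic concrete, here is a tiny example with made-up numbers (not the ones from the table above):

# Made-up numbers just to illustrate the scoring: 1 point per win, 0.5 per draw
white_wins, black_wins, draws = 40, 30, 30
total_games = white_wins + black_wins + draws

white_score = white_wins + 0.5 * draws                          # 40 + 15 = 55 points
print(f"White scores {100 * white_score / total_games:.1f}%")   # 55.0%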

In a chess engine vs chess engine scenario, I was expecting white to have an advantage (it’s chess after all), but I was still curious because machines play differently and are not subject to fatigue, stress, etc. Specifically, I was wondering if I would see more draws, for example. Besides that, I still wanted to see if the same game would repeat over and over (the strongest play should always be the strongest play, right?), so I decided to move forward and perform a little experiment.

For this simulation I used the Stockfish chess engine because it’s available on all platforms, well known and best of all, free and open source.

Source code overview

Engine initialization

Python has some nice chess packages and, after testing a few, I decided to go with python-chess. What I liked about it is that it can use Stockfish as a backend and it has all sorts of convenience methods to evaluate the board state, such as detecting checkmate, detecting when it’s impossible to win due to insufficient pieces, etc.
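
To give a small taste of that API, here is a quick example (the moves are just the classic fool’s mate, nothing to do with the simulation):

import chess

board = chess.Board()
for san in ["f3", "e5", "g4", "Qh4#"]:  # fool's mate, the fastest possible checkmate
    board.push_san(san)

print(board.is_game_over())   # True
print(board.is_checkmate())   # True
print(board.result())         # "0-1", black wins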

The complete source code is available, but I’ll explain some simplified highlights here.

Initializing Stockfish is quite easy:

import logging
import chess.engine

# DEBUG logging is useful to confirm later that the tablebases were loaded
logging.basicConfig(level=logging.DEBUG)
engine = chess.engine.SimpleEngine.popen_uci("D:/Users/frozen/Documents/99_temp/stockfish_12/stockfish.exe")
engine.configure({"Threads": 6})    # CPU threads the engine may use
engine.configure({"Hash": 4096})    # hash table size, in MB
engine.configure({"SyzygyPath": "D:/Users/frozen/Documents/99_temp/chess/syzygy"})  # endgame tablebases

The hash table is used to store evaluations and other information about positions the engine has already analyzed, so that evaluating them again later is faster and more efficient. The value is in megabytes and, from what I’ve read, the suggestion is to use about half of your RAM.
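
If you prefer not to hard-code that value, you could derive it at runtime. This is just a sketch, and psutil is my assumption here; it’s not part of the original script:

import psutil

# Use roughly half of the total RAM for the engine's hash table (value in MB)
total_mb = psutil.virtual_memory().total // (1024 * 1024)
engine.configure({"Hash": total_mb // 2})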

And what about that Syzygy stuff? This parameter is related to what is known as endgame tablebases. In a nutshell, some endgame board states have been analyzed so deeply that the optimal moves are already known. In this case, we are going to use those tables so the engine plays such endgames optimally. In case you want to do something similar, keep in mind that the parameter fails silently, so it’s a good idea to set up logging to make sure the files have been loaded. You’ll see something like this in the log:

DEBUG:chess.engine:<UciProtocol (pid=19816)>: >> info string Found 145 tablebases

Endgame tablebases are expensive in storage terms, so I can’t distribute them in the source code repository. The ones I used for the simulation are available here. I used “Syzygy 3-4-5 Individual Files”, which is almost a 1 GB download. The “3-4-5” part of the name means that the tablebases include endings with 3, 4 and 5 pieces. If you have 48 GB to spare, you can use the 6-piece version if you prefer. 😉

Stockfish has many other parameters that you can change, but the default values are ok.

Simulation loop

The main simulation loop is also quite simple:

results = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}

for match in range(matches):
    board = chess.Board()
    moves = []  # moves of the current match, kept for logging later

    while not board.is_game_over():
        result = engine.play(board, chess.engine.Limit(depth=20), ponder=False)
        board.push(result.move)
        moves.append(result.move)

    results[board.result()] += 1

There are two important parameters in the engine.play call (a short example with other kinds of limits follows this list):

  • “Depth” controls how many half-moves (plies) the engine is going to look ahead.
  • “Ponder” tells the engine to keep thinking when it is waiting for the other player to make a move. This can be useful to optimize time if you are playing against the engine, but it should be disabled for engine vs engine matches.
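
For reference, chess.engine.Limit also accepts other kinds of constraints, such as a time budget per move instead of a fixed depth (the values here are arbitrary, just to show the idea):

# Think for at most 0.5 seconds per move instead of searching to a fixed depth
result = engine.play(board, chess.engine.Limit(time=0.5), ponder=False)

# Limits can also be combined, e.g. a maximum depth and a maximum time together
result = engine.play(board, chess.engine.Limit(depth=20, time=5.0), ponder=False)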

Notice that board.result() returns “1-0” when white wins, “0-1” when black wins and “1/2-1/2” for a draw. However, it does not tell you the reason for a draw. I wanted to log the matches, so a little extra work was needed to get that information.

def save_match(filename, board, moves):
    with open(f"{filename}.log", "w") as game_file:

        if board.is_checkmate():
            # If the game ended on white's turn, it was black who delivered the mate
            if board.turn == chess.WHITE:
                game_file.write("Black checkmate\n")
            else:
                game_file.write("White checkmate\n")
        elif board.is_stalemate():
            game_file.write("Draw - Stalemate\n")
        elif board.is_insufficient_material():
            game_file.write("Draw - Insufficient material\n")
        elif board.is_fivefold_repetition():
            game_file.write("Draw - Fivefold repetition\n")
        elif board.is_seventyfive_moves():
            game_file.write("Draw - Seventyfive Moves\n")
        else:
            game_file.write("Draw - Other\n")

        game_file.write(f"\n{str(board)}\n\n")

        for i, move in enumerate(moves):
            if i % 2 == 0:
                game_file.write(f"White: {str(move)}\n")
            else:
                game_file.write(f"Black: {str(move)}\n")

This function produces output like the following:

Draw - Fivefold repetition

r . . q . r k .
. . . . . . . p
. . . p R . . .
p P p P n . p .
P . . b . p . .
. . . . . P . P
B . . B . . P K
. R . Q . . . .

White: e2e4
Black: e7e5
White: g1f3
Black: b8c6
White: f1b5
Black: a7a6
...

As for the board, in the text representation uppercase letters are white pieces and lowercase letters are black pieces. However, that’s hard to read, so we can add some extra functionality to get a nicer-looking board. Python-chess can only output boards in SVG format, but we can use the svglib library to convert them to PNG.

    # Needs: import os, chess.svg, plus svg2rlg from svglib.svglib and renderPM from reportlab.graphics
    boardsvg = chess.svg.board(board=board)
    with open(f"{filename}.svg", "w") as image_file:
        image_file.write(boardsvg)

    # Convert the temporary SVG to PNG and clean up
    svg_image = svg2rlg(f"{filename}.svg")
    renderPM.drawToFile(svg_image, f"{filename}.png", fmt="PNG")
    os.remove(f"{filename}.svg")

Nicer board

That’s much nicer!!!

Running the simulation for the first time

The first important thing I noticed was that there was variety in how the games were played. In theory, given infinite time and the exact same conditions, the engine should always play the same moves. In practice, however, there are some reasons why this does not happen:

  • The simulation runs in a shared-CPU environment, and even a slight difference in CPU allocation can lead to a different analysis path. This effect becomes more significant when running multiple threads.
  • RAM swaps and other events can also make a difference.
  • The engine takes the time remaining on the game clock as input and in some cases plays faster or slower.
  • Chess engines include pseudo-randomness to avoid repetition in some scenarios. For example, this snippet is part of Stockfish:
// Add a small random component to draw evaluations to avoid 3fold-blindness
Value value_draw(Depth depth, Thread* thisThread) {
  return depth < 4 * ONE_PLY ? VALUE_DRAW
                              : VALUE_DRAW + Value(2 * (thisThread->nodes & 1) - 1);
}

The other thing I noticed is that white always played “e2e4” as its first move, so in order to make the simulation more interesting, I decided to force the engine to use different first moves:

starting_moves = [["e2e4", "e7e5"],
                  ["e2e4", "c7c5"],
                  ["d2d4", "g8f6"],
                  ["d2d4", "d7d5"],
                  ["g1f3"],
                  ["c2c4"],
                  ["f2f3"]]

These moves have different levels of strength, with f2f3 being one of the worst possible first moves in chess.

Opening books

An opening in chess is a sequence of moves that happen when the game starts. Some openings have been studied for centuries and there is very rich knowledge of which moves are better and why. Just like we did with the endgame files, we can also use opening books with Stockfish. The ones I used can be found here.

The following function receives the moves we defined earlier and, if use_opening_book is set to True, traverses the opening book recommendations, always choosing the first suggestion (which is the strongest).

There is no clear line between the opening and the middlegame, so I decided to set an arbitrary limit of 22 moves (11 moves for each player).

def setup_board(starting_moves, use_opening_book):
    board = chess.Board()
    opening_moves = []

    for fm in starting_moves:
        m = chess.Move.from_uci(fm)
        board.push(m)
        opening_moves.append(m)

    book_depth = 22

    if use_opening_book:
        with chess.polyglot.open_reader("D:/Users/frozen/Documents/99_temp/chess/books/elo-2700.bin") as reader:
            while True:
                found = False

                # uses the first play from the recommended plays according to the opening book
                for entry in reader.find_all(board):
                    board.push(entry.move)
                    opening_moves.append(entry.move)
                    found = True
                    break

                if not found or len(opening_moves) >= book_depth:
                    break

    return board, opening_moves
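
To give an idea of how the pieces above fit together, here is a rough sketch of the full loop (simplified, with the opening and match count as placeholder examples; it is not the exact code from the repository):

matches = 100
results = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}

for match in range(matches):
    # Play the forced first moves and, optionally, follow the opening book
    board, moves = setup_board(["e2e4", "e7e5"], use_opening_book=True)

    while not board.is_game_over():
        result = engine.play(board, chess.engine.Limit(depth=20), ponder=False)
        board.push(result.move)
        moves.append(result.move)

    results[board.result()] += 1
    save_match(f"match_{match}", board, moves)

# Shut down the Stockfish process once the simulation is done
engine.quit()
print(results)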

Complete simulation

After playing 100 matches with each opening and depth=20, I had the following results. By the way, 20 is a little bit shallow, but unfortunately setting a higher value made the simulation too slow on my machine. In any case, it’s what chess.com uses by default, so it’s definitely not bad.

Final results

Based on this small (and statistically inconclusive) sample size, there are some interesting things to notice:

  • As expected, white won more games and the difference in percentage is similar to the table presented at the start of this post.
  • However, there are many more draws. Excluding the f2f3 case, on average 87.75% of the games ended in a draw.
  • In almost all cases, using an opening book favored white. For example, for “d2d4 g8f6” white won 13% more games.
  • As mentioned before, f2f3 is a very bad move and as you can see, white got punished for playing it.

Stockfish test suite

While I was doing research for this post, I found the results of the Stockfish test suite.

In most cases, the tests set up a board position and check for a specific behaviour, so we can’t use these results for comparison purposes, but I was quite surprised by the testing infrastructure.

Stockfish Testing Queue
184 machines 1667 cores 1.10M nps (1831.65M total nps) 1305 games/minute

In case you want to volunteer your computer to run tests, here are the instructions.

Final thoughts

This little experiment reminded me of when I used to play chess as a child and, in general, automating Stockfish with Python is really easy. Depending on what you want to do, the main problem might be time and hardware, because deeper analysis is both time consuming and very CPU intensive.

Finally, the complete source code is available.

Thanks for reading!!! 😃

PS: Just for completeness, these are the moves the opening book suggested:

  • e2e4 e7e5 g1f3 b8c6 f1b5 a7a6 b5a4 g8f6 e1g1 f8e7 f1e1 b7b5 a4b3 e8g8 h2h3 c8b7 d2d3 d7d6 a2a3 c6a5
  • e2e4 c7c5 g1f3 d7d6 d2d4 c5d4 f3d4 g8f6 b1c3 a7a6 c1e3 e7e5 d4b3 c8e6 f2f3 b8d7 g2g4 d7b6 g4g5 f6h5
  • d2d4 g8f6 c2c4 e7e6 g1f3 d7d5 b1c3 f8e7 c1f4 e8g8 e2e3 b8d7 c4c5 f6h5 f1d3 h5f4 e3f4 b7b6 b2b4 a7a5
  • d2d4 d7d5 c2c4 c7c6 g1f3 g8f6 b1c3 e7e6 c1g5 h7h6 g5f6 d8f6 e2e3 b8d7 f1d3 d5c4 d3c4 g7g6 e1g1 f8g7
  • g1f3 g8f6 c2c4 e7e6 b1c3 d7d5 d2d4 f8e7 c1f4 e8g8 e2e3 b8d7 c4c5 f6h5 f1d3 h5f4 e3f4 b7b6 b2b4 a7a5
  • c2c4 g8f6 b1c3 e7e5 g1f3 b8c6 g2g3 d7d5 c4d5 f6d5 f1g2 d5b6 e1g1 f8e7 a2a3 e8g8 b2b4 c8e6 a1b1 f7f6