DeepMind, the British artificial intelligence company, has released more than 200 games of AlphaZero, the awe-inspiring chess-playing entity that taught itself to play using deep reinforcement learning algorithms.
The results show that AlphaZero is much stronger than conventional chess engines, which evaluate positions according to criteria honed by their programmers. In contrast, AlphaZero was given only the basic rules of the game and improved by playing 44 million games against itself in just nine hours, updating its neural networks with what it learned from experience. AlphaZero decisively defeated the best conventional program, Stockfish. In the games played from the starting position, where AlphaZero could select its own openings, it scored +35 =72 -3. From the starting positions used in computer chess competitions, the score was AlphaZero +17 =75 -8.
It is a delight to see how aggressive – swashbuckling, one might even say – AlphaZero’s playing style is. AlphaZero seems to value mobility and activity over everything else and pays far less regard to material gain than conventional programs. In that sense, it seems far more human, which I find rather heartening. As Garry Kasparov commented: “Because AlphaZero programs itself, I would say that its style reflects the truth.”
Back in June, DeepMind co-founder Demis Hassabis, himself a former chess prodigy, allowed Dominic Lawson, Chris Flowers and me to play AlphaZero, on condition that the games were kept under wraps until the peer-reviewed scientific paper on AlphaZero, now published, had appeared in Science. GM David Howell joined as our adviser.
We played two games and were incredibly lucky to draw the first. AlphaZero played the Berlin Defence – we went for the dull symmetrical variation, and the first 17 moves followed a game of David’s against Dmitry Andreikin that we had analysed beforehand. In the return game we were not so lucky: