18 December 2017

The Constellation of AlphaZero

Starting with a post to Improve Engine Software (October 2017), I've been running a weekly series 'to investigate what sort of engine setup I would need to improve my results'. Last week I wrote three consecutive posts on AlphaZero.

It sounds like AlphaZero might be the answer to my quest. What exactly would I need to install and run? The 'Giraffe' post included a link to the paper announcing AlphaZero: 'Mastering Chess and Shogi by Self-Play'. Pages two and three of that paper give a three-paragraph overview of AlphaZero's nuts and bolts. I'm copying it here as an image because it includes math symbols that I don't want to replicate yet.

Two phrases in that passage highlight the base technologies:-

  • 'Deep neural network', and
  • 'Monte-Carlo tree search (MCTS)'

That's a short list of the software underlying AlphaZero, but what about the hardware? Later, the same paper says,

We applied the AlphaZero algorithm to chess, shogi, and also Go. Unless otherwise specified, the same algorithm settings, network architecture, and hyper-parameters were used for all three games. We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks. Further details of the training procedure are provided in the Methods.
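The excerpt describes a loop: self-play games generate training examples, and mini-batches sampled from them update the network's parameters. The following Python sketch shows only the shape of that procedure; the 'games' are random stand-ins and the 'update' is a dummy, not anything from the paper.

```python
# Schematic of the training loop described above. Everything here is a
# stand-in: self_play_game() fabricates random (position, outcome) pairs,
# and the parameter 'update' is a toy calculation, not real gradient descent.
import random

def self_play_game():
    # stand-in: a 'game' yields ten (position, outcome) training examples
    return [(random.random(), random.choice([-1, 1])) for _ in range(10)]

def train(steps=100, batch_size=32):
    buffer, params = [], 0.0
    for step in range(steps):
        buffer.extend(self_play_game())      # workers generate self-play data
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        # stand-in update: nudge params toward the mean outcome in the batch
        params += 0.01 * sum(outcome for _, outcome in batch) / len(batch)
    return params

params = train(steps=50)
```

In the real system, the self-play workers (the 5,000 first-generation TPUs) and the trainers (the 64 second-generation TPUs) run this division of labor at enormous scale.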

I had never before heard of a TPU, an acronym for 'tensor processing unit'. Before going any further, I'd better learn a little more about the concepts I've just listed. As usual, I'll call on Wikipedia to explain the basics. The introduction to deep neural networks is on a page about Deep learning.

A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships. DNN architectures generate compositional models where the object is expressed as a layered composition of primitives. The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.

Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets. DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back.
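The 'layered composition' idea in the excerpt can be made concrete with a few lines of NumPy: each hidden layer builds features from the output of the layer below it, and data flows forward only. The layer sizes and random weights here are purely illustrative.

```python
# A minimal sketch of a feedforward deep network: 4 inputs, two hidden
# layers of 8 units, 2 outputs. Weights are random, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)   # a common non-linear activation

# weight matrices and biases for three layers: 4 -> 8 -> 8 -> 2
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    h1 = relu(x @ W1 + b1)   # first hidden layer: features of the raw input
    h2 = relu(h1 @ W2 + b2)  # second hidden layer: features built from h1
    return h2 @ W3 + b3      # output layer

y = forward(rng.normal(size=4))
print(y.shape)  # -> (2,)
```

The 'deep' part is simply the stack of hidden layers between input and output; nothing loops back, matching the feedforward description above.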

So a DNN is based on an ANN, but what's an ANN? It is explained in the preceding section of the same page.

Artificial neural networks (ANNs) or connectionist systems are computing systems inspired by the biological neural networks that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming. [...]

An ANN is based on a collection of connected units called artificial neurons (analogous to axons in a biological brain). Each connection (synapse) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal downstream neurons connected to it. [...]

Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.
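At the smallest scale, the picture in the excerpt is just this: a single artificial neuron takes a weighted sum of its incoming signals and passes it through an activation function before signalling onward. A toy version, with made-up numbers:

```python
# One artificial neuron: weighted sum of input signals, plus a bias,
# squashed by a sigmoid activation. All numbers are illustrative.
import math

def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # sigmoid activation

# two input signals feeding one neuron
out = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(round(out, 3))  # -> 0.574
```

An ANN is many of these wired together in layers; 'learning' means adjusting the weights so the network's outputs improve on the examples it is shown.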

Now here's an explanation of Monte Carlo tree search.

In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in game play. [...] The focus of Monte Carlo tree search is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games is based on many playouts. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.
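The select / expand / random-playout / backpropagate cycle the excerpt describes fits in a page of Python. This sketch applies MCTS (the common UCT variant) to a toy game of Nim rather than chess, just to show the mechanics; none of it comes from the AlphaZero paper, which replaces the random playouts with neural network evaluations.

```python
# Minimal MCTS (UCT variant) for toy Nim: players alternate taking 1-3
# stones from a pile, and whoever takes the last stone wins.
import math
import random

def moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def playout(stones):
    """Play random moves to the end; +1 if the player to move wins, else -1."""
    turn = 0
    while stones > 0:
        stones -= random.choice(moves(stones))
        turn ^= 1
    return 1 if turn == 1 else -1

class Node:
    def __init__(self, stones):
        self.stones = stones
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0      # total result, from this node's player-to-move view

def uct_select(node):
    """Pick the most promising child: exploit good win rates, explore rare ones."""
    unvisited = [c for c in node.children.values() if c.visits == 0]
    if unvisited:
        return random.choice(unvisited)
    return max(node.children.values(),
               key=lambda c: -c.wins / c.visits
                             + math.sqrt(2 * math.log(node.visits) / c.visits))

def simulate(node):
    """One MCTS iteration: select, expand, random playout, backpropagate.
    Returns the result from the perspective of the player to move at `node`."""
    if node.stones == 0:
        result = -1                       # opponent took the last stone
    elif node.visits == 0:
        result = playout(node.stones)     # first visit: finish at random
    else:
        if not node.children:             # expand on the second visit
            node.children = {m: Node(node.stones - m) for m in moves(node.stones)}
        result = -simulate(uct_select(node))  # child's result is the opponent's
    node.visits += 1
    node.wins += result
    return result

def best_move(stones, iterations=3000):
    """Run MCTS from a pile of `stones`; return the most-visited move."""
    root = Node(stones)
    for _ in range(iterations):
        simulate(root)
    return max(root.children, key=lambda m: root.children[m].visits)

print(best_move(3))  # taking all 3 stones wins immediately -> 3
```

The 'weighting' the excerpt mentions is the wins/visits bookkeeping: moves whose playouts end well get visited more, so the search concentrates on the most promising lines.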

TPUs are explained on a page about the Tensor processing unit.

A tensor processing unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning. Compared to a graphics processing unit, it is designed for a high volume of low precision computation (e.g. as little as 8-bit precision) with higher IOPS per watt, and lacks hardware for rasterisation/texture mapping. The chip has been specifically designed for Google's TensorFlow framework.
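The 'low precision' point is worth unpacking: instead of 32-bit floating point, a TPU does huge volumes of arithmetic on small integers. The sketch below shows one simple way to squeeze floats into 8-bit integers and back; the scaling scheme is my own illustration, not Google's actual quantization method.

```python
# Quantizing 32-bit floats to 8-bit unsigned integers and back.
# A simple linear scheme for illustration only, not Google's actual method.
import numpy as np

def quantize(x, bits=8):
    """Map floats in [x.min(), x.max()] onto unsigned `bits`-bit integers."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

x = np.array([0.0, 0.1, 0.5, 1.0], dtype=np.float32)
q, scale, lo = quantize(x)
print(q)                         # integers in 0..255
print(dequantize(q, scale, lo))  # close to x, with small rounding error
```

Each value now fits in one byte instead of four, at the cost of a small rounding error; neural network inference tolerates that error well, which is what makes the trade attractive in hardware.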

While I was poking around in various documents, I kept running into that word, 'TensorFlow'. It's explained on another page, TensorFlow.

TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and also used for machine learning applications such as neural networks. It is used for both research and production at Google, often replacing its closed-source predecessor, DistBelief. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open source license on November 9, 2015.
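'Dataflow programming' means you describe a graph of operations first and push values through it later. The toy classes below are a hand-rolled sketch of that idea, not the TensorFlow API itself.

```python
# A toy illustration of dataflow programming: operations form a graph,
# and values flow through it only when the graph is evaluated.
# Hand-rolled sketch; this is NOT the TensorFlow API.

class Op:
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
    def eval(self):
        # pull values from upstream nodes, then apply this node's function
        return self.fn(*(node.eval() for node in self.inputs))

class Const(Op):
    def __init__(self, value):
        super().__init__(lambda: value)

# build the graph for (2 + 3) * 4 first, evaluate it afterwards
a, b, c = Const(2), Const(3), Const(4)
total = Op(lambda x, y: x * y, Op(lambda x, y: x + y, a, b), c)
print(total.eval())  # -> 20
```

Separating graph construction from evaluation is what lets a framework like TensorFlow hand the whole computation to specialized hardware such as a TPU.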

The Wikipedia page on TPUs also says, 'Google's TPUs are proprietary and are not commercially available'. It looks like I won't be installing and running AlphaZero anytime soon. Basically, it's a neural network combined with a Monte Carlo tree search to evaluate chess positions. The idea isn't completely new, but Google's DeepMind was the first to make it work.

The excerpt I copied from the paper also talks about parameters. It ends, 'The updated parameters are used in subsequent games of self-play.' I wonder if I can find out more about those parameters. I'd also like to know how all of this was used in the match that crushed Stockfish.
