

To capture the intuitive aspect of the game, we needed a new approach.

We created AlphaGo, a computer program that combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as input and process it through a number of network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other, the “value network”, predicts the winner of the game.

We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo grew steadily stronger and better at learning and decision-making. This process of learning from its own games is known as reinforcement learning.

In October 2015, AlphaGo played its first match against the reigning three-time European Champion, Mr Fan Hui. It went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time.
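To make the division of labour between the two networks more concrete, here is a minimal sketch of how a tree search might combine them when choosing which move to explore next. It assumes a simplified PUCT-style selection rule: the policy network supplies a prior probability for each move, the value network's evaluations are averaged into a score per move, and the search favours moves that are either promising or under-explored. This is an illustration, not AlphaGo's actual code. The names `Edge`, `select_move`, `backup` and the constant `c_puct` are hypothetical, and the sketch ignores details such as alternating player perspectives.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one candidate move (hypothetical structure)."""
    prior: float            # P(s, a): probability the policy network gives this move
    visits: int = 0         # N(s, a): simulations that have passed through this move
    value_sum: float = 0.0  # W(s, a): sum of value-network evaluations backed up here

    @property
    def mean_value(self) -> float:
        # Q(s, a): average evaluation so far; 0.0 for a move never visited
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximising Q + U, a PUCT-style trade-off.

    U grows with the policy prior and shrinks as a move is visited more often,
    so the search balances strong-looking moves against unexplored ones.
    """
    total_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        exploration = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.mean_value + exploration

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: Edge, value: float) -> None:
    """Record one value-network evaluation for a move after a simulation."""
    edge.visits += 1
    edge.value_sum += value
```

For example, with `{"D4": Edge(0.5, 10, 2.0), "Q16": Edge(0.3, 2, 1.2), "C3": Edge(0.2)}` the rule picks `"Q16"`: its average evaluation is high and it has been visited only twice, so it outscores the heavily explored `"D4"` despite a lower prior.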
