We built a modified hybrid of AlphaGo and AlphaGo Zero for 9x9 Go using only general-purpose computational libraries (NumPy and PyTorch).
Below is a high-level overview of the main components of this project and the relevant files.
Note: Most code files provide example usage and all code files are well-documented, so implementation details will not be re-explained in this README.
Core Go Library {board.py, group.py, game_node.py}
This contains all the logic for the game of Go following the Tromp-Taylor rules for computer Go. The implementation relies heavily on NumPy to store board states and derived objects (such as stone groups) and to process them efficiently.
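As a rough, hypothetical sketch of this NumPy-backed representation (the names below are illustrative, not the actual board.py API), a 9x9 board can be stored as a small integer array and queried with vectorized operations:

```python
import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2

# Hypothetical sketch: a 9x9 board stored as a small NumPy integer array.
board = np.zeros((9, 9), dtype=np.int8)
board[2, 3] = BLACK
board[2, 4] = WHITE

# Vectorized queries are cheap, e.g. counting stones or listing empty points.
num_black = np.count_nonzero(board == BLACK)
empty_points = np.argwhere(board == EMPTY)  # (row, col) coordinates of empty intersections
```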
Data Processing & Compilation {data_preprocess.py, dataset.py, imported_game.py}
The preprocessing component converts game states into PyTorch tensors with a feature structure closely following that used in AlphaGo Zero. The dataset (compilation) component collects data from self-play games and from online games in SGF format; these datasets are easily converted to PyTorch DataLoaders.
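For intuition, here is a hedged sketch of an AlphaGo Zero-style encoding and the tensor-to-DataLoader step; the function and plane layout below are illustrative, not the exact data_preprocess.py / dataset.py interfaces:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical encoding: binary planes for each player's stones plus a
# constant plane for the side to move (AlphaGo Zero stacks several past
# positions; this sketch uses a single position for brevity).
def encode_position(own_stones, opp_stones, black_to_move):
    planes = np.stack([own_stones, opp_stones,
                       np.full((9, 9), float(black_to_move))]).astype(np.float32)
    return torch.from_numpy(planes)  # shape: (3, 9, 9)

# Collected (state, policy target, value target) examples become a DataLoader.
states = torch.stack([encode_position(np.zeros((9, 9)), np.zeros((9, 9)), True)])
policies = torch.zeros(1, 82)   # 81 board points + pass
values = torch.zeros(1, 1)
loader = DataLoader(TensorDataset(states, policies, values), batch_size=32, shuffle=True)
```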
Neural Network {network.py}
This contains the neural network's architecture, which closely follows that used in AlphaGo Zero, along with utilities for loading, saving, and computing the loss.
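For reference, the AlphaGo Zero-style loss combines a value-head mean-squared error with a policy-head cross-entropy (plus L2 regularization, usually handled by the optimizer's weight decay). A minimal sketch, not the exact code in network.py:

```python
import torch
import torch.nn.functional as F

# Sketch of the AlphaGo Zero-style combined loss: mean-squared error on the
# value head plus cross-entropy between the MCTS visit distribution (pi) and
# the policy head's log-probabilities. L2 regularization is typically folded
# into the optimizer's weight_decay rather than written out here.
def alphago_zero_loss(policy_logits, value, target_pi, target_z):
    value_loss = F.mse_loss(value.squeeze(-1), target_z)
    policy_loss = -(target_pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss
```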
Tree {game_node.py, tree_node.py, monte_carlo.py}
This contains the tree structure connecting nodes and the key Monte Carlo Tree Search (MCTS) operations (select, expand, evaluate, backpropagate). Note that, as in AlphaGo Zero, rollouts for estimating a state's value are replaced by the neural network's value estimate; given a large number of searches, the tree still builds progressively deeper lines of play.
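As an illustration of the select step, here is a hedged sketch of the PUCT rule; the node attributes visit_count, value_sum, and prior are hypothetical names, not necessarily those used in tree_node.py:

```python
import math

# Hypothetical sketch of the PUCT selection rule: each child is scored by its
# mean action value Q plus an exploration bonus weighted by the network's
# prior P and the parent's total visit count.
def select_child(children, c_puct=1.5):
    total_visits = sum(child.visit_count for child in children)
    def puct(child):
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return q + u
    return max(children, key=puct)
```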
Pre-training {bot.py}
This provides a unified interface for interacting with different types of bots (random, neural network only, neural network + tree).
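A minimal sketch of what such an interface could look like (class and method names are hypothetical, not the actual bot.py API):

```python
import random

# Hypothetical sketch of a unified bot interface; the real bot.py may differ.
class Bot:
    def select_move(self, game_state):
        raise NotImplementedError

class RandomBot(Bot):
    def select_move(self, game_state):
        # Assumes game_state exposes a legal_moves() method returning candidate moves.
        return random.choice(game_state.legal_moves())
```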
Training {supervised_learning.py, reinforcement_learning.py, train_alphago.py, mp_self_play.py, mp_train_alphago.py}
This contains the training scripts for supervised and reinforcement learning. Files with the mp prefix use multiprocessing for self-play to improve performance during the dataset (re)generation component of reinforcement learning.
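A hedged sketch of the multiprocessing pattern, assuming each worker plays an independent self-play game and returns its training examples (function names are illustrative, not the actual mp_self_play.py API):

```python
import multiprocessing as mp

# Hypothetical sketch: each worker plays one independent self-play game and
# returns its (state, policy, value) training examples.
def play_one_game(game_index):
    # ... run MCTS-guided self-play for a single game ...
    return []  # list of (state, policy, value) tuples

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        games = pool.map(play_one_game, range(32))
    examples = [example for game in games for example in game]
```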
Evaluation {elo_calculator.py, elo_graph.py}
This generates and visualizes performance data for different configurations of bots.
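For reference, a standard Elo update after a single game between two bots looks like this (a sketch, not necessarily the exact formula in elo_calculator.py):

```python
# Sketch of a standard Elo update after one game between two bots.
def update_elo(rating_a, rating_b, score_a, k=32):
    """score_a is 1.0 if bot A won, 0.5 for a draw, 0.0 if bot A lost."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```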
Visualization {web_vis.py}
This visualizes the policy, value, and tree states in an interactive web interface.
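A minimal sketch of how such an interface could be served, assuming a Flask-style endpoint returning the network's outputs as JSON (the actual web_vis.py may use a different framework and routes):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical endpoint: return the network's policy and value for the current
# position as JSON so a browser front end can render heatmaps and the search tree.
@app.route("/analysis")
def analysis():
    policy, value = [1.0 / 82] * 82, 0.0  # placeholder outputs
    return jsonify({"policy": policy, "value": value})

if __name__ == "__main__":
    app.run(port=8000)
```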
| Week | Links |
| --- | --- |
| 1 | Slides, Play Online Go, Computer Go Rules |
| 2 | Slides, NN Arch, PyTorch CNN example |
| 3 | Slides, PyTorch Modules |
| 4 | Slides, SL Description |
| - | Spring Break |
| 5 | Slides |
| 6 | Slides, AGZ MCTS Algorithm |
| 7 | Slides, AGZ MCTS Algorithm |
| 8 | Slides |
| 9 | Slides, AGZ Self-Play |
| 10 | Slides, AGZ Self-Play |
For a more detailed list of topics and resources, check the most recent "This Week in Mini-AlphaGo" email (released every Wednesday afternoon).
Leads
Jeffrey Lu - lujeff [at] umich [dot] edu
Onat Ozer - ozeronat [at] umich [dot] edu
Members
Ali Boussi, Adarsh Bharathwaj, Xin Ying Chew, Gabriel Koo, Layne Malek, Frank Sun, Selina Sun, Adam Wood, Max Wang
Code Contributions
Jeffrey wrote the core Go library, parallelized training scripts, web visualizer, and SGF parser.
Onat wrote the Elo rating evaluation system and standard training script.
All remaining components (preprocessing, MCTS, training pipelines, etc.) were written by members.
Content Contributions
The large majority of the technical visuals in the slides were designed from scratch by the leads.
Compute
This project would not have been possible without the compute resources provided by MIDAS, U-M ARC High Performance Computing, and Google Cloud.