This package started as Google Summer of Code 2021 project.
Here's a blogpost summarizing MuZero's summer journey.
This implementation is based on AlphaZero.jl, and is inspired by muzero-general.
To train MuZero on tic tac toe, clone this repo, change branch to MuZero,
git clone https://github.com/michelangelo21/MuZero.git
cd MuZero
git checkout MuZero
and run
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project ./MuZero/scripts/train_tictactoe.jl
then, to observe results, open tensorboard
in a different terminal:
tensorboard --logdir results
after some time curves should look like this:
This implementation wouldn't exist without Jonathan Laurent (project mentor, creator of AlphaZero.jl) and his valuable insights.