A source of 2️⃣5️⃣,5️⃣0️⃣0️⃣ checkpoints (and growing) of curated DQN, M-DQN and C51 agents trained on modern and classic protocols, matching or besting the performance reported in the literature.
We release these models in the hope that they will help advance research in:
- Reproducing DRL results
- Imitation Learning
- Batch/Offline RL
- Multi-task learning
Checkpoints for DQN, M-DQN and C51 agents across two or three training seeds, on `modern` or `classic` protocols.
An agent trained on 200M frames usually produces 200 checkpoints times the number of training seeds. In order not to make the download size overly large we only include 51 checkpoints per training run. These are sampled geometrically, with denser checkpoints towards the end of the training. This results in the last 20 checkpoints of the full 200 (last 10% of the training run) and then sparser checkpoints towards the beginning of the run, with only 10 out of 51 from the first half. It looks a bit like this:
Note that the best-performing checkpoint is not guaranteed to be included, since for some combinations of algorithms and games the peak performance occurs earlier in training. However, this sampling should characterize the performance of an agent fairly well most of the time.
❗✋ If there is demand we can provide the full list of checkpoints for a given agent.
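The exact sampling code is not part of this README. As a minimal sketch of what such a geometric schedule can look like, the snippet below keeps every checkpoint in a dense tail and then spaces the remaining picks geometrically back to the first checkpoint. The function name `sample_checkpoint_ids` and the `dense_tail` parameter are hypothetical, chosen only for illustration.

```python
def sample_checkpoint_ids(total=200, keep=51, dense_tail=20):
    """Return `keep` checkpoint numbers in 1..total, denser towards the end.

    Illustrative only: this is an assumed scheme, not the released one.
    """
    # Distances from the final checkpoint. 0 .. dense_tail - 1 keeps every
    # checkpoint in the last stretch of training.
    distances = list(range(dense_tail))

    # The remaining picks grow geometrically from the edge of the dense
    # tail back to the very first checkpoint.
    first, last = dense_tail - 1, total - 1
    steps = keep - dense_tail
    ratio = (last / first) ** (1 / steps)
    distances += [round(first * ratio**j) for j in range(1, steps + 1)]

    # Convert distances from the end into 1-based checkpoint numbers.
    return sorted(total - d for d in distances)


ids = sample_checkpoint_ids()
print(len(ids), ids[:5], ids[-5:])
```

With these assumed defaults the sketch keeps checkpoints 181 to 200 and ten checkpoints from the first half of training, in line with the counts quoted above, but the released selection may differ in the middle of the run.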
Agents have been trained using PyTorch and the models are stored as compressed state_dict pickle files. Since the networks used on ALE are fairly simple these could easily be converted for use in other deep learning frameworks.
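A minimal sketch of reading one of these files, assuming each `.gz` file is a gzip stream wrapping something `torch.load` can read (a `state_dict`, or a dictionary containing one). The helper name `load_checkpoint` is hypothetical, and the keys found inside a checkpoint are not documented here, so inspect them before use.

```python
import gzip

import torch


def load_checkpoint(path):
    """Load a gzip-compressed PyTorch checkpoint onto the CPU."""
    # gzip.open returns a seekable file-like object, which torch.load accepts.
    with gzip.open(path, "rb") as inflated:
        return torch.load(inflated, map_location="cpu")
```

Printing the keys of the returned object is a quick way to see the layer names and shapes before porting the weights to another framework.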
There are two common training and evaluation protocols encountered in the literature. We will call them `classic` and `modern` throughout this project:
- `classic`: it originates from the (Mnih, 2015)[1] Nature paper and it mostly appears in DeepMind papers.
- `modern`: it originates from (Machado, 2017)[2], and a variation of it was adopted by Dopamine[3]. Since then it has started to show up more and more often.
The two main differences between them are the way stochasticity is induced in the environment and how the loss of a life is treated.
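To make the first difference concrete: Machado et al. recommend sticky actions, where the environment repeats the previous action with probability 0.25, while the protocol of Mnih et al. typically relies on a random number of no-op actions at the start of an episode. The sketch below illustrates the sticky-action rule only. It is not the code used to train these agents; the class name `StickyActions` and the choice of 0 as the initial action are assumptions.

```python
import random


class StickyActions:
    """Repeat the previous action with probability `repeat_prob`."""

    def __init__(self, repeat_prob=0.25, seed=None):
        self.repeat_prob = repeat_prob
        self.rng = random.Random(seed)
        self.previous = 0  # assumption: action 0 is the no-op

    def __call__(self, action):
        # Accept the new action only when the random draw clears the
        # repeat probability; otherwise the previous action is reused.
        if self.rng.random() >= self.repeat_prob:
            self.previous = action
        return self.previous


sticky = StickyActions(seed=0)
executed = [sticky(a) for a in [1, 2, 3, 4, 5]]
print(executed)
```

In practice `ale-py` implements this inside the emulator, so a wrapper like this is only useful for building intuition.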
Note again that while we use Dopamine's protocol, and sometimes its hyperparameters, our agents are trained in PyTorch.
Check the table below for a summary.
Algorithm | Protocol | Games | Seeds | Observations |
---|---|---|---|---|
DQN | `modern` | 60 | 3 | DQN agent using the settings from Dopamine. It's optimised with Adam and uses MSE instead of Huber loss. A surprisingly strong agent on this protocol. |
M-DQN | `modern` | 60 | 3 | DQN above but using the Munchausen trick[4]. Even stronger performance. |
C51 | `classic` | 28/57 | 3 | Closely follows the original paper[5]. |
DQN Adam | `classic` | 28/57 | 2 | A DQN agent trained according to the Rainbow paper[6]. The exact settings and plots can be found in our paper[7]. |
Right off the bat you will notice that on the `classic` protocol there are only 28 games out of the usual 57. We trained the two agents on this protocol over one year ago using the now deprecated `atari-py` project, which officially provided the ALE Python bindings in OpenAI's Gym. Unfortunately, that package came with a large number of ROMs that are not supported by the current, official `ale-py` library. The agents trained on the `modern` protocol (as well as the code we provide for visualising agents) all use the new `ale-py`. We therefore decided against providing support for the older library, even if it meant dropping half of the trained models. A great resource for reading about this issue is Jesse Farebrother's ALE v0.7 release notes. Importantly, we found out about the issue while checking the performance of the trained models on the new `ale-py` back-end, and we provide plots showing that the remaining 28 agents perform as expected (C51_classic, DQN_classic).
⏬ Download ⏬ the saved models.
Using `gsutil` you can download all the models from the command line:
gsutil -m cp -R gs://bitdefender_ml_artifacts/atari ./
or select certain checkpoints like this:
gsutil -m cp -R gs://bitdefender_ml_artifacts/atari/[ALGORITHM]/[GAME]/[SEED]/model_50000000.gz ./
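As a small sketch of how the placeholders in the command above get filled in, the helper below builds the bucket path for a single checkpoint. The function name `checkpoint_uri` is hypothetical, and the example assumes the bucket uses the same folder names as the local `models` directory (for instance `DQN_modern`), which should be checked against the bucket listing.

```python
BUCKET = "gs://bitdefender_ml_artifacts/atari"


def checkpoint_uri(algorithm, game, seed, step):
    """Build the bucket path of one checkpoint from its four components."""
    return f"{BUCKET}/{algorithm}/{game}/{seed}/model_{step}.gz"


# Final checkpoint of every seed for one game (folder names are assumed).
for seed in range(3):
    print(checkpoint_uri("DQN_modern", "Zaxxon", seed, 50_000_000))
```

Since every seed stores its checkpoints under the same file names, copy each path into its own destination folder to avoid overwriting files.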
Install the `conda` environment using `conda env create -f environment.yml`. If this fails for some reason, the main requirements are:
pytorch 1.11.0
ale-py 0.7.4
opencv 4.5.2
An easy way to install `ale-py` and to download and install the ROMs is to just install `gym`:
pip install 'gym[atari,accept-rom-license]'
If for some reason the SDL support is not quite right, you might have better luck cloning ALE and installing it from source using `pip install .`. Just make sure to register the ROM files again afterwards:
ale-import-roms path/to/roms
See this excellent post about what's new in ALE 0.7 and how to install ROMs.
Just do:
python play.py models/AGENT/GAME/SEED/model_STEP.gz
Passing the `-r/--record` flag will create a `./movies` folder and save the screens and audio.
We also support the game modes and difficulty levels introduced by (Machado, 2017)[2]. You can use `-v` to activate an interactive mode for selecting game modes and difficulty levels:
python play.py models/AGENT/GAME/SEED/model_STEP.gz -v
`play.py` relies on conventions encoded in the folder structure: it configures the model and the environment from the name of the directory containing the checkpoints. For example, `DQN_modern` will configure a DQN network and evaluate it on the `modern` protocol, while `C51_classic` will configure a C51-style network and evaluate it on the `classic` protocol.
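The naming convention can be sketched as a small parser. This is not the code in `play.py`; it only illustrates how the algorithm, protocol, game, seed and step could be recovered from a path of the form `models/AGENT/GAME/SEED/model_STEP.gz`. The names `CheckpointInfo` and `parse_checkpoint_path` are hypothetical, and the sketch assumes the first two underscore-separated tokens of the agent folder are the algorithm and the protocol.

```python
import re
from pathlib import Path
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    algorithm: str
    protocol: str
    game: str
    seed: int
    step: int


def parse_checkpoint_path(path):
    """Split models/AGENT/GAME/SEED/model_STEP.gz into its components."""
    agent, game, seed, fname = Path(path).parts[-4:]
    # Any extra suffix, such as the `adam` in DQN_classic_adam, is ignored.
    algorithm, protocol = agent.split("_")[:2]
    match = re.fullmatch(r"model_(\d+)\.gz", fname)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {fname}")
    return CheckpointInfo(algorithm, protocol, game, int(seed), int(match.group(1)))


print(parse_checkpoint_path("models/DQN_modern/Zaxxon/2/model_50000000.gz"))
```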
You should end up with something like this after downloading all the agents:
.
├── ale_env.py
├── human_play.py
├── play.py
├── README.md
├── models
│   ├── C51_classic
│   │   └── ...
│   ├── DQN_classic_adam
│   │   └── ...
│   └── DQN_modern
│       ├── AirRaid
│       │   ├── 0
│       │   ├── 1
│       │   └── 2
│       ...
│       └── Zaxxon
│           ├── 0
│           ├── 1
│           └── 2
Our PyTorch implementation of DQN trained using Adam on the `modern` protocol compares favourably to the exact same agent trained using Dopamine. The plots below have been generated using the tools provided by rliable.
Some more comparisons can be found here.
A detailed discussion about the performance of DQN + Adam and C51 trained on the `classic` protocol can be found in our paper[7], where we used these checkpoints as baselines.
- Bitdefender, for providing all the material resources that made this project possible, and my colleagues in Bitdefender's Machine Learning & Crypto Research Unit for all their support.
- Kai Arulkumaran, for providing the `atari-py`/`ale-py` wrapper I used extensively in my research and who helped me many times in figuring out some of the more arcane details of the various training and evaluation protocols in DRL.
- Dopamine baselines and configs, which I used extensively for comparing the performance of our implementations and for figuring out various hyperparameters.
- Stable Baselines3 Zoo -- agents for seven Atari games.
- Kai Arulkumaran provides a number of ALE checkpoints together with his Rainbow implementation.
- Uber Research Atari Model Zoo -- a large number of agents trained with Dopamine and OpenAI Baselines. However, the availability of these agents is not clear at the moment.
If you use these checkpoints in your research and published work, please consider citing this project:
@misc{gogianu2022agents,
title = {Atari Agents},
author = {Florin Gogianu and Tudor Berariu and Lucian Bușoniu and Elena Burceanu},
year = {2022},
url = {https://github.com/floringogianu/atari-agents},
}