PopArt extension to TorchBeast, the PyTorch implementation of IMPALA.
The PopArt extension was used to train a multi-task agent for six Atari games (AirRaid, Carnival, DemonAttack, NameThisGame, Pong, and SpaceInvaders, all in the NoFrameskip-v4 variant). This agent was compared to the corresponding single-task agents and to a simpler multi-task agent without PopArt normalisation. More details on these experiments can be found in the report.
The different game plans learned by these three types of agent can be illustrated with saliency maps (here red is the policy saliency and green is the baseline saliency).
The following trained models can be downloaded from the models directory:
Name | Environments (NoFrameskip-v4) | Steps (millions) |
---|---|---|
AirRaid | AirRaid | 50 |
Carnival | Carnival | 50 |
DemonAttack | DemonAttack | 50 |
NameThisGame | NameThisGame | 50 |
Pong | Pong | 50 |
SpaceInvaders | SpaceInvaders | 50 |
MultiTask | AirRaid, Carnival, DemonAttack, NameThisGame, Pong, SpaceInvaders | 300 |
MultiTaskPopArt | AirRaid, Carnival, DemonAttack, NameThisGame, Pong, SpaceInvaders | 300 |
For our experiments we used the faster PolyBeast implementation of TorchBeast and refer the reader to the installation instructions in the original repository. Since we encountered problems getting this version to work, we also added multi-task training functionality and PopArt to the MonoBeast implementation of TorchBeast. Some of the testing functionality is not implemented for MonoBeast, but PolyBeast can be used for this if the imports for `nest` and `libtorchbeast` are commented out.
Since getting PolyBeast to run can be difficult, these are the platforms on which we managed to install and use it:
- Ubuntu 18.04
- macOS (CPU only)
- Google Cloud Platform (Standard machine with NVIDIA Tesla P100 GPUs)
python -m torchbeast.polybeast --mode train --xpid MultiTaskPopArt --env AirRaidNoFrameskip-v4,CarnivalNoFrameskip-v4,DemonAttackNoFrameskip-v4,NameThisGameNoFrameskip-v4,PongNoFrameskip-v4,SpaceInvadersNoFrameskip-v4 --total_steps 300000000 --use_popart
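The `--env` argument in the command above is a comma-separated list of environment ids, one per game. As a minimal sketch, the same command could be assembled from a list of game names as follows. The helper `build_train_command` is hypothetical and not part of TorchBeast; it only reproduces the flags that appear in the command above.

```python
GAMES = ["AirRaid", "Carnival", "DemonAttack", "NameThisGame", "Pong", "SpaceInvaders"]


def build_train_command(games, xpid, total_steps, use_popart=True,
                        module="torchbeast.polybeast"):
    """Assemble the training command as an argument list (hypothetical helper)."""
    # Every game uses the NoFrameskip-v4 variant, joined by commas for --env.
    env_ids = ",".join(f"{game}NoFrameskip-v4" for game in games)
    cmd = ["python", "-m", module,
           "--mode", "train",
           "--xpid", xpid,
           "--env", env_ids,
           "--total_steps", str(total_steps)]
    if use_popart:
        cmd.append("--use_popart")
    return cmd


command = build_train_command(GAMES, "MultiTaskPopArt", 300_000_000)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues when the list of games is generated programmatically.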
Compared to the original TorchBeast implementation, there are the following additional flags:
- `use_popart`, to enable the PopArt extension
- `save_model_every_nsteps`, to save intermediate models during training
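As background for the PopArt flag, the following is a minimal sketch of the PopArt update described by Hessel et al. (see the references), not the code used in this repository. It keeps per-task running statistics of the value targets and rescales the last layer so that unnormalised predictions are preserved when the statistics change. The class name `PopArtHead`, the scalar input feature, the value of `beta`, and the variance floor are simplifying assumptions made for illustration.

```python
import math


class PopArtHead:
    """Toy per-task value head with PopArt normalisation (illustrative only)."""

    def __init__(self, num_tasks, beta=3e-4):
        self.beta = beta                  # step size of the running statistics
        self.mu = [0.0] * num_tasks       # running mean of the targets, per task
        self.nu = [1.0] * num_tasks       # running second moment, per task
        self.w = [1.0] * num_tasks        # last-layer weight, per task
        self.b = [0.0] * num_tasks        # last-layer bias, per task

    def sigma(self, task):
        # Standard deviation from the two moments; the floor is an assumption
        # that keeps the square root well defined.
        return math.sqrt(max(self.nu[task] - self.mu[task] ** 2, 1e-8))

    def normalized(self, feature, task):
        return self.w[task] * feature + self.b[task]

    def unnormalized(self, feature, task):
        return self.sigma(task) * self.normalized(feature, task) + self.mu[task]

    def update(self, target, task):
        old_mu, old_sigma = self.mu[task], self.sigma(task)
        # "Art": adaptively rescale targets by updating the running moments.
        self.mu[task] = (1 - self.beta) * old_mu + self.beta * target
        self.nu[task] = (1 - self.beta) * self.nu[task] + self.beta * target ** 2
        new_sigma = self.sigma(task)
        # "Pop": preserve outputs precisely by compensating in the last layer.
        self.w[task] = self.w[task] * old_sigma / new_sigma
        self.b[task] = (old_sigma * self.b[task] + old_mu - self.mu[task]) / new_sigma


head = PopArtHead(num_tasks=6, beta=0.1)
before = head.unnormalized(0.5, task=0)
head.update(target=100.0, task=0)
after = head.unnormalized(0.5, task=0)
print(before, after)  # the two values agree up to floating-point error
```

Because each task has its own statistics, a game with large returns does not change the scale of the value targets seen by the other games.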
python -m torchbeast.monobeast --mode train --xpid MultiTaskPopArt --env AirRaidNoFrameskip-v4,CarnivalNoFrameskip-v4,DemonAttackNoFrameskip-v4,NameThisGameNoFrameskip-v4,PongNoFrameskip-v4,SpaceInvadersNoFrameskip-v4 --total_steps 300000000 --use_popart
In addition, MonoBeast can be used to run two other models: a small CNN (optionally with an LSTM) and an Attention-Augmented Agent (the model is selected with the flag `agent_type`). Unfortunately, we did not get the Attention-Augmented Agent to train properly, but for the sake of completeness and possible future reference, here are the additional flags that can be used with this model:
- `frame_height` and `frame_width`, which set the dimensions to which frames are rescaled (the original paper uses the original frame size rather than the rescaling done in TorchBeast)
- `aaa_input_format` (with choices `gray_stack`, `rgb_last`, `rgb_stack`), which decides how frames are formatted as input for the network (`rgb_last` only feeds one of every four frames in RGB, as is done in the original paper)
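To make the three input formats concrete, here is a rough sketch of one possible reading of them. It is an assumption based on the option names, not the repository's code: the agent sees a window of four frames, `gray_stack` stacks their grayscale versions, `rgb_last` keeps only the last frame in RGB, and `rgb_stack` stacks all four frames in RGB. The function names and the luminance weights used for the grayscale conversion are also illustrative choices.

```python
def to_gray(pixel):
    """Convert an (r, g, b) pixel to a single luminance value."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def split_channels(frame):
    """Turn an H x W frame of (r, g, b) tuples into three H x W planes."""
    return [[[pixel[c] for pixel in row] for row in frame] for c in range(3)]


def format_frames(frames, input_format):
    """Return a list of channel planes for a window of RGB frames."""
    if input_format == "gray_stack":
        # One grayscale plane per frame.
        return [[[to_gray(p) for p in row] for row in frame] for frame in frames]
    if input_format == "rgb_last":
        # Only the most recent frame, as three colour planes.
        return split_channels(frames[-1])
    if input_format == "rgb_stack":
        # Three colour planes for every frame in the window.
        return [plane for frame in frames for plane in split_channels(frame)]
    raise ValueError(f"unknown input format: {input_format}")


# Four tiny 2x2 RGB frames standing in for a window of Atari frames.
frames = [[[(i, i, i)] * 2] * 2 for i in range(4)]
for fmt in ("gray_stack", "rgb_last", "rgb_stack"):
    print(fmt, len(format_frames(frames, fmt)))  # 4, 3 and 12 channels
```

Under this reading, the choice of format mainly changes the number of input channels the first convolutional layer has to accept.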
python -m torchbeast.polybeast --mode test --xpid MultiTaskPopArt --env PongNoFrameskip-v4 --savedir=./models
python -m torchbeast.polybeast --mode test_render --xpid MultiTaskPopArt --env PongNoFrameskip-v4 --savedir=./models
python -m torchbeast.saliency --xpid MultiTaskPopArt --env PongNoFrameskip-v4 --first_frame 0 --num_frames 100 --savedir=./models
Note that, unlike the original saliency code, the extension does not produce a movie directly but saves the frames as individual images. Animated GIFs can subsequently be produced with a Jupyter notebook.
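For readers unfamiliar with the method, the saliency of Greydanus et al. (see the references) scores an image location by how much the network output changes when that location is perturbed: S(i, j) = 0.5 * ||f(I) - f(I')||^2, where f is either the policy or the baseline (value) output. The sketch below illustrates this score on a toy example. It is not the code run by the command above: the perturbation replaces a patch with the frame mean instead of the Gaussian blur used in the original method, and `toy_policy` is a hypothetical stand-in for a trained network.

```python
def perturb(frame, i, j, radius=1):
    """Replace the patch around (i, j) with the frame mean (stand-in for a blur)."""
    h, w = len(frame), len(frame[0])
    mean = sum(map(sum, frame)) / (h * w)
    out = [row[:] for row in frame]
    for y in range(max(0, i - radius), min(h, i + radius + 1)):
        for x in range(max(0, j - radius), min(w, j + radius + 1)):
            out[y][x] = mean
    return out


def saliency_map(frame, model, stride=1):
    """Score each location by the squared change in the model output."""
    base = model(frame)
    h, w = len(frame), len(frame[0])
    scores = [[0.0] * w for _ in range(h)]
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            out = model(perturb(frame, i, j))
            scores[i][j] = 0.5 * sum((a - b) ** 2 for a, b in zip(base, out))
    return scores


def toy_policy(frame):
    """Hypothetical model whose two 'logits' depend only on the top-left pixel."""
    return [frame[0][0], -frame[0][0]]


frame = [[0.0] * 4 for _ in range(4)]
frame[0][0] = 16.0
scores = saliency_map(frame, toy_policy)
print(scores[0][0], scores[3][3])  # non-zero near the top-left pixel, zero far away
```

Running the same procedure once with the policy output and once with the baseline output gives the two maps that are drawn in red and green, respectively.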
NOTE: it is assumed that a) intermediate model checkpoints have been saved (flag `save_model_every_nsteps`) and b) the results for all models are saved in the same parent directory and have the exact names used in our experiments (see the table above).
python -m torchbeast.analysis.analyze_resnet --model_load_path /path/to/directory --mode filter_comp --comp_num_models 10
The different comparisons presented in the report can be set with the flag `comp_between`. By default, the only comparisons done are between the vanilla multi-task model and the multi-task PopArt model, as well as between each of these models and all single-task models.
For plotting, the following command can be used (the figures are saved in the same directory from which the data generated by the previous command was loaded):
python -m torchbeast.analysis.analyze_resnet --load_path /path/to/directory --mode filter_comp_plot --save_figures
For more data generation and plotting options, consult the help texts of the respective commands.
TorchBeast
@article{torchbeast2019,
title={{TorchBeast: A PyTorch Platform for Distributed RL}},
author={Heinrich K\"{u}ttler and Nantas Nardelli and Thibaut Lavril and Marco Selvatici and Viswanath Sivakumar and Tim Rockt\"{a}schel and Edward Grefenstette},
year={2019},
journal={arXiv preprint arXiv:1910.03552},
url={https://github.com/facebookresearch/torchbeast},
}
PopArt
@inproceedings{hessel2019,
title={Multi-task deep reinforcement learning with popart},
author={Hessel, Matteo and Soyer, Hubert and Espeholt, Lasse and Czarnecki, Wojciech and Schmitt, Simon and van Hasselt, Hado},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={3796--3803},
year={2019}
}
Saliency
@article{greydanus2017visualizing,
title={Visualizing and Understanding Atari Agents},
author={Greydanus, Sam and Koul, Anurag and Dodge, Jonathan and Fern, Alan},
journal={arXiv preprint arXiv:1711.00138},
year={2017},
url={https://github.com/greydanus/visualize_atari},
}