Welcome to the Tonic RL library!
Please take a look at the paper for details and results.
The main design principles are:
- Modularity: Building blocks for creating RL agents, such as models, replays, or exploration strategies, are implemented as configurable modules.
- Readability: Agents are written in a simple way with an identical API, and logs are nicely displayed on the terminal with a progress bar.
- Fair comparison: The training pipeline is unique and compatible with all Tonic agents and environments. Agents are defined by their core ideas, while general tricks/improvements like non-terminal timeouts, observation normalization and action scaling are shared.
- Benchmarking: Benchmark data from the provided agents trained on 70 continuous control environments are included for direct comparison.
- Wrapped popular environments: Environments from OpenAI Gym, PyBullet and DeepMind Control Suite are made compatible with non-terminal timeouts and synchronous distributed training.
- Compatibility with different ML frameworks: Both TensorFlow 2 and PyTorch are currently supported. Simply import tonic.tensorflow or tonic.torch.
- Experimenting from the console: While launch scripts can be used, iterating over various configurations from a console is made possible using snippets of Python code directly (see the sketch after this list).
- Visualization of trained agents: Experiment configurations and checkpoints can be loaded to play with trained agents.
- Collection of trained models: To keep the main Tonic repository light, the full logs and trained models from the benchmark are available in the tonic_data repository.
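Because agents and environments are configured with ordinary Python expressions, the snippets passed to tonic.train below can also be evaluated directly in a Python console once Tonic is installed. A minimal sketch, reusing only expressions that appear verbatim in the examples below:

import tonic.environments
import tonic.torch

agent = tonic.torch.agents.PPO()  # a built-in agent assembled from default modules
environment = tonic.environments.Gym('BipedalWalker-v3')  # a wrapped Gym environment

The --agent and --environment arguments used below are exactly such expressions, evaluated after the --header import.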
Download and install Tonic:
git clone https://github.com/fabiopardo/tonic.git
pip install -e tonic/
Install TensorFlow or PyTorch, for example using:
pip install tensorflow torch
Train an agent with TensorFlow or PyTorch, for example:
python3 -m tonic.train \
--header 'import tonic.torch' \
--agent 'tonic.torch.agents.PPO()' \
--environment 'tonic.environments.Gym("BipedalWalker-v3")' \
--name PPO-X \
--seed 0
Snippets of Python code are used to configure the experiment directly. This is a very powerful feature: agents and environments can be configured with various arguments, and custom modules can even be loaded without adding them to the library. For example:
python3 -m tonic.train \
--header "import sys; sys.path.append('path/to/custom'); from custom import CustomAgent" \
--agent "CustomAgent()" \
--environment "tonic.environments.Bullet('AntBulletEnv-v0')" \
--seed 0
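The content of the custom module is up to the user. As a purely hypothetical illustration (the choice of base class is an assumption, not part of the library), path/to/custom/custom.py could simply reuse an existing Tonic agent as a base class, so that the CustomAgent name matches the snippet passed to --agent:

# custom.py -- hypothetical minimal example, not shipped with Tonic
import tonic.torch

class CustomAgent(tonic.torch.agents.PPO):
    # Inherits everything from the built-in PPO agent; methods can be
    # overridden here to experiment with custom behavior.
    pass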
By default, environments use non-terminal timeouts, which is particularly important for locomotion tasks. Alternatively, terminal timeouts can be restored and a time feature added to the observations to keep the MDP Markovian. See the Time Limits in RL paper for more details. For example:
python3 -m tonic.train \
--header 'import tonic.tensorflow' \
--agent 'tonic.tensorflow.agents.PPO()' \
--environment 'tonic.environments.Gym("Reacher-v2", terminal_timeouts=True, time_feature=True)' \
--seed 0
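To illustrate what a time feature means, the fraction of remaining time can be appended to each observation so that approaching timeouts are visible to the policy. The following is a generic conceptual sketch, not Tonic's implementation; the wrapper name and scaling are assumptions:

import numpy as np

class TimeFeatureWrapper:  # hypothetical name, for illustration only
    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        observation = self.env.reset()
        return np.append(observation, 1.0)  # full time budget remaining

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        self.steps += 1
        remaining = 1.0 - self.steps / self.max_episode_steps
        return np.append(observation, remaining), reward, done, info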
Distributed training can be used to accelerate learning. In Tonic, groups of sequential workers can be launched in parallel processes, for example 10 parallel groups of 100 sequential workers each:
python3 -m tonic.train \
--header "import tonic.tensorflow" \
--agent "tonic.tensorflow.agents.PPO()" \
--environment "tonic.environments.Gym('HalfCheetah-v3')" \
--parallel 10 --sequential 100 \
--seed 0
During training, the experiment configuration, logs and checkpoints are saved in environment/agent/seed/.
Results can be plotted with:
python3 -m tonic.plot --path BipedalWalker-v3/ --baselines all
Path patterns like BipedalWalker-v3/PPO-X/0, BipedalWalker-v3/{PPO*,DDPG*} or *Bullet* can be used to point to different sets of logs.
The --baselines argument can be used to load logs from the benchmark. For example, --baselines all uses all agents while --baselines A2C PPO TRPO will use logs from A2C, PPO and TRPO.
Different headers can be used for the x and y axes. For example, to compare the gain in wall clock time from distributed training, replace --parallel 10 with --parallel 5 in the last training example and plot the result with:
python3 -m tonic.plot --path HalfCheetah-v3/ --x_axis train/seconds --x_label Seconds
After some training time, checkpoints are generated and can be used to play with the trained agent:
python3 -m tonic.play --path BipedalWalker-v3/PPO-X/0
Environments are rendered using the appropriate framework. For example, when playing with DeepMind Control Suite environments, policies are loaded in a dm_control.viewer, where Space starts the interaction, Backspace starts a new episode, [ and ] switch cameras, and double-clicking on a body part followed by Ctrl + mouse clicks adds perturbations.
The tonic_data repository can be downloaded with:
git clone https://github.com/fabiopardo/tonic_data.git
The best seed for each agent is stored in environment/agent and can be reloaded using, for example:
python3 -m tonic.play --path tonic_data/tensorflow/humanoid-stand/TD3
The full benchmark plots are also available in the tonic_data repository. They can be generated with:
python3 -m tonic.plot \
--baselines all \
--backend agg --columns 7 --font_size 17 --legend_font_size 30 --legend_marker_size 20 \
--name benchmark
Or:
python3 -m tonic.plot \
--path tonic_data/tensorflow \
--backend agg --columns 7 --font_size 17 --legend_font_size 30 --legend_marker_size 20 \
--name benchmark
And a selection can be generated with:
python3 -m tonic.plot \
--path tonic_data/tensorflow/{AntBulletEnv-v0,BipedalWalker-v3,finger-turn_hard,fish-swim,HalfCheetah-v3,HopperBulletEnv-v0,Humanoid-v3,quadruped-walk,swimmer-swimmer15,Walker2d-v3} \
--backend agg --columns 5 --font_size 20 --legend_font_size 30 --legend_marker_size 20 \
--name selection
Tonic was inspired by a number of other deep RL codebases, in particular OpenAI Baselines, Spinning Up in Deep RL, and Acme.
If you use Tonic in your research, please cite the paper:
@article{pardo2020tonic,
title={Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking},
author={Pardo, Fabio},
journal={arXiv preprint arXiv:2011.07537},
year={2020}
}