The repository contains scripts for safe reinforcement learning in an autonomous driving environment. It includes an implementation of an MDP for the autonomous driving environment, a modifiable reward function specification, and support for empirical model learning.
The module contains implementations of two policy initialization algorithms:
- Behavioral Cloning (BC) (see the sketch after this list)
- Generative Adversarial Imitation Learning (GAIL) [1]
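For orientation, here is a minimal tabular sketch of the BC idea: supervised maximum likelihood over expert (state, action) pairs. The function name, trajectory format, and toy data below are illustrative assumptions, not the repository's API; `BC_agent.py` further down is the actual entry point.

```python
import numpy as np

def behavioral_cloning(trajectories, n_states, n_actions):
    """Fit a tabular policy by maximum likelihood over expert (state, action) pairs."""
    counts = np.zeros((n_states, n_actions))
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state, action] += 1.0
    counts[counts.sum(axis=1) == 0] = 1.0  # uniform fallback for unvisited states
    return counts / counts.sum(axis=1, keepdims=True)

# Toy demonstration data: lists of (state, action) pairs.
demos = [[(0, 1), (1, 0), (2, 1)], [(0, 1), (2, 1)]]
policy = behavioral_cloning(demos, n_states=3, n_actions=2)
print(policy[0])  # expert action distribution in state 0 -> [0. 1.]
```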
The module also contains implementations of two reinforcement learning algorithms:
- Q-learning with risk-directed exploration [2] (see the sketch after this list)
- Policy Gradient with variance constraint [3]
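A rough sketch of the risk-directed exploration idea, assuming a per-(state, action) risk estimate is available: action selection penalizes risky actions while the Q-update itself stays standard. The names, the risk measure, and the weighting here are illustrative; see [2] and `q_learning_agent.py` for the actual formulation.

```python
import numpy as np

def select_action(Q, risk, state, risk_weight=0.5):
    """Choose the action trading off estimated value against estimated risk.

    `risk` is assumed to hold a per-(state, action) risk estimate, e.g. a
    running variance of TD targets; the concrete measure in [2] differs.
    """
    return int(np.argmax(Q[state] - risk_weight * risk[state]))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update; only the exploration rule changes."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Toy usage: 3 states, 2 actions.
Q = np.zeros((3, 2))
risk = np.ones((3, 2))
a = select_action(Q, risk, state=0)
q_update(Q, s=0, a=a, r=1.0, s_next=1)
```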
The code is compatible with Python 3.6. Install the requirements:

```
pip install -r requirements.txt
```
Policy initialization algorithms need expert demonstrations to run. The repository includes 350 manually sampled trajectories, stored in the folder `policy_initialization/trajectories`.
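As a hedged illustration of how such demonstrations might be consumed, assuming one pickled trajectory (a list of (state, action) pairs) per file; the repository's actual on-disk format may differ:

```python
import os
import pickle

def load_trajectories(path="policy_initialization/trajectories"):
    """Load expert demonstrations from disk.

    Assumes one pickled trajectory (a list of (state, action) pairs) per
    file; the repository's actual serialization format may differ.
    """
    trajectories = []
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name), "rb") as f:
            trajectories.append(pickle.load(f))
    return trajectories
```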
To run the BC script, from the folder `policy_initialization/` run:

```
python BC_agent.py --tp trajectories
```
To run the GAIL script, from the folder `policy_initialization/` run:

```
python GAIL_agent.py --tp trajectories
```
Both scripts report performance metrics as plots and/or text output.
To run Q-learning in model-free mode*, from the folder `risk_aware_rl/` run:

```
python q_learning_agent.py
```
To run Policy Gradient in risk-neutral mode*, from the folder `risk_aware_rl/` run:

```
python policy_gradient_agent.py
```
Implementations are based on:
[1] Ho, Jonathan and Ermon, Stefano. "Generative Adversarial Imitation Learning". June 2016. arXiv: 1606.03476. URL: https://arxiv.org/pdf/1606.03476.
[2] Law, Edith L.M. "Risk-directed Exploration in Reinforcement Learning". MA thesis. Montreal, Quebec: McGill University, Feb. 2005.
[3] Di Castro, Dotan, Tamar, Aviv, and Mannor, Shie. "Policy Gradients with Variance Related Risk Criteria". June 2012. arXiv: 1206.6404. URL: https://arxiv.org/pdf/1206.6404.
* For more modes and parameters, see the script's help:

```
python <script_name> -h
```
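The `-h` output comes from Python's standard `argparse`. A minimal sketch of how the documented `--tp` flag might be wired; the description string and default value are assumptions, and any further flags are script-specific:

```python
import argparse

# Hypothetical sketch of the argument wiring behind the scripts' -h output;
# only --tp is documented above.
parser = argparse.ArgumentParser(description="Policy initialization agent")
parser.add_argument("--tp", default="trajectories",
                    help="path to the expert trajectory folder")
args = parser.parse_args()
print(args.tp)
```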