This project provides a quick and easy template for implementing and developing a wide variety of RL algorithms. When starting a new RL project, I would always find myself rewriting the same code and spending unnecessary time debugging code I had already written dozens of times before. This project provides a simple and adjustable implementation of policy gradient in both 1D and 2D environments. Environments are simulated over multiple processes using mpi4py, which makes training significantly more efficient.
First you need to install a working version of MPI. Many different versions exist, but I personally recommend OpenMPI; installation instructions can be found here.
The template has only been tested with Python 3.6; versions 3.6 and above should all work. Other versions may also work, but are not guaranteed to.
Clone the repository, then run a pip install on the requirements.txt file to get the necessary modules installed.
```
git clone https://github.com/ejmejm/RL-template.git
cd ./RL-template
pip install -r requirements.txt
```
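To confirm that MPI and mpi4py are installed correctly, a quick sanity check (the file name `check_mpi.py` below is just an example, not part of the repo) is to have each process print its rank:

```python
# check_mpi.py - quick sanity check that mpi4py can talk to your MPI install
from mpi4py import MPI

comm = MPI.COMM_WORLD
print('Process %d of %d is running' % (comm.Get_rank(), comm.Get_size()))
```

Running `mpiexec -n 4 python3 check_mpi.py` should print four lines, one per process.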
* It is recommended that you use TensorFlow with GPU support for faster training times, but the demos can also be run on just a CPU.
To get started, two demos are provided using OpenAI gym's CartPole (1D) and Breakout (2D) environments.
To run the CartPole demo on 4 cores, run `mpiexec -n 4 python3 run_cart_pole.py`.
To run the Breakout demo on 4 cores, run `mpiexec -n 4 python3 run_breakout.py`.
By changing the `-n` flag, you can adjust the number of cores used to run the program.
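The value passed to `-n` is the number of MPI processes launched, and each process simulates its own copy of the environment. The snippet below is only an illustration of that pattern with a random policy standing in for the network; it is not the template's actual worker code, and it assumes the classic gym step API:

```python
# Illustration only: every MPI process steps through its own CartPole copy.
import gym
from mpi4py import MPI

comm = MPI.COMM_WORLD
env = gym.make('CartPole-v1')  # each process creates an independent environment

obs = env.reset()
episode_reward = 0.0
done = False
while not done:
    # Random actions stand in for the policy network here (classic gym 4-tuple API)
    obs, reward, done, _ = env.step(env.action_space.sample())
    episode_reward += reward

# The main process (rank 0) collects one episode reward from every worker
rewards = comm.gather(episode_reward, root=0)
if comm.Get_rank() == 0:
    print('Episode rewards per process:', rewards)
```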
The implementation is meant to be simple and easily understandable while still being efficient. Simulation of environments is done over multiple processes in parallel using mpi4py. To achieve this, the main network is copied to all processes, and weights are synced between all networks after each training epoch.
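As a minimal sketch of what that weight sync amounts to (the arrays below are placeholders rather than the template's real parameters), rank 0 broadcasts its weights after training and every other process overwrites its local copy:

```python
# Minimal sketch of MPI weight syncing: rank 0 owns the freshly trained weights
# and broadcasts them so every process's copy of the network matches.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder "network parameters" - in practice these come from the trained model
weights = None
if rank == 0:
    weights = [np.random.randn(4, 32), np.random.randn(32, 2)]

# After each training epoch the main process's weights replace every local copy
weights = comm.bcast(weights, root=0)
print('Rank %d now holds %d weight arrays' % (rank, len(weights)))
```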
Below is a list of the individual files and their corresponding functions.

- `models.py` - Implements a `BaseModel` class that can be used as a basis to implement most policy and value based RL algorithms and to sync weights with MPI. Implements a `OneDimModel` that extends `BaseModel` and implements vanilla policy gradient in a 1D environment with a discrete action space. Implements a `TwoDimModel` that extends `BaseModel` and implements vanilla policy gradient in a 2D environment with a discrete action space.
- `utils.py` - Defines utility functions for logging, filtering 2D observations, and calculating GAEs + discounted rewards (see the sketch after this list).
- `run_cart_pole.py` and `run_breakout.py` - Both implement worker functions that define how an environment is simulated. They then run a main loop that consists of running environments in parallel, gathering + formatting the data, using the data to train the main network, and then propagating the weights from the main network to all copied networks.
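The discounted-reward and GAE calculations referenced above follow the standard definitions. A self-contained sketch (function names and signatures here are illustrative, not the ones actually used in utils.py) might look like this:

```python
# Standard discounted returns and GAE, computed backwards over one finished episode.
# Names and signatures are illustrative, not those of the functions in utils.py.
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for each timestep."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def calc_gaes(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single finished episode."""
    values = np.append(values, 0.0)  # V(s_T) = 0 at episode end
    gaes = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        gaes[t] = running
    return gaes

if __name__ == '__main__':
    rewards = np.array([1.0, 1.0, 1.0])
    values = np.array([0.5, 0.5, 0.5])
    print(discount_rewards(rewards))
    print(calc_gaes(rewards, values))
```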
- Edan Meyer - GitHub Profile
This project is licensed under the MIT License - see the LICENSE.md file for details