In this project, the agent is a double-jointed robotic arm that must learn to reach a goal position, represented by a green sphere. To solve the task, the agent is trained with Deep Deterministic Policy Gradient (DDPG) combined with Prioritized Experience Replay.
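The core idea of Prioritized Experience Replay is to sample transitions in proportion to their TD error rather than uniformly, while importance-sampling weights correct the resulting bias. The sketch below is illustrative only and is not the implementation used in this repository; all names and default values are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)] ** self.alpha
        probs = prios / prios.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-5):
        # eps keeps every transition's sampling probability non-zero.
        self.priorities[indices] = np.abs(td_errors) + eps
```

After each learning step, the agent would call `update_priorities` with the absolute TD errors of the sampled batch so that surprising transitions are replayed more often.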
The agent receives a reward for every time step in which its hand is at the goal location.
An observation of the environment consists of a 33-element vector representing the arm's position, rotation, linear velocity, and angular velocity. To interact with the environment, the agent applies torque to each of its joints; each torque component must lie between -1 and 1.
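Because the valid torque range is [-1, 1], raw policy outputs are typically clamped before being passed to the environment. A minimal sketch, assuming a 4-dimensional action vector (two torque components per joint, which matches the single-agent Reacher environment):

```python
import numpy as np

ACTION_SIZE = 4  # assumed layout: two torque components per joint

def clipped_action(raw_action):
    """Clamp each torque component into the valid [-1, 1] range."""
    return np.clip(raw_action, -1.0, 1.0)

# Example: a raw policy output with out-of-range components.
raw = np.array([0.3, -1.7, 2.4, -0.5])
action = clipped_action(raw)  # components outside the range are clamped to +/-1
```

This is also where exploration noise (e.g. Ornstein-Uhlenbeck or Gaussian noise, as commonly used with DDPG) would be added before clipping.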
This is an episodic task with a continuous action space. An episode ends after 1002 time steps. It could easily be turned into a continuing task by ignoring the terminal states and instead collecting a fixed number of time steps per episode.
The environment is considered solved when the agent achieves an average reward of 30.0 over 100 consecutive episodes. In this implementation, the best result obtained was an average reward of 37.36 after 412 episodes.
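The solved criterion is a rolling average over a 100-episode window. A small helper like the following (a hypothetical sketch, not code from this repository) captures the check:

```python
import numpy as np

def is_solved(scores, window=100, target=30.0):
    """Return True once the mean of the last `window` episode scores meets `target`."""
    if len(scores) < window:
        return False  # not enough episodes yet for a full window
    return float(np.mean(scores[-window:])) >= target
```

In a training loop, the episode score would be appended after every episode and `is_solved(scores)` checked to decide when to stop (or save a checkpoint).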
This project is part of the Udacity Deep Reinforcement Learning Nanodegree. The environment is provided by Udacity. It depends on the following packages:
- Python 3.6
- Numpy
- PyTorch
- Unity ML-Agents Beta v0.4
- Install Python 3.6 (later versions are not compatible with the Unity ML-Agents version required by this environment)
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.6-full
- (Optional) Create a virtual environment for this project
cd <parent folder of venv>
python3.6 -m venv <name of the env>
source <path to venv>/bin/activate
- Install the python dependencies
python3 -m pip install numpy torch
- Download the Unity ML-Agents release file for version Beta v0.4. Then, unzip it at a folder of your choosing
- Build Unity ML-Agents
cd <path ml-agents>/python
python3 -m pip install .
- Clone this repository, then download the environment created by Udacity and unzip it into the `world` folder
git clone https://github.com/jhonasiv/reacher-udacity
cd reacher-udacity
mkdir world
wget https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux.zip
unzip Reacher_Linux.zip -d world
After training the agent until a rolling average reward of 35.0 over 100 episodes was reached, this is how it looks.
Agent trained with an average score of 35.0
- Execute the main.py file
python3 src/main.py
- For more information on the available command line arguments, use:
python3 src/main.py --help
- Some notable CLI arguments:
  - `--eval`: runs the application in evaluation mode, skipping the training step; `model_path` must be set
  - `--buffer_size`: maximum size of the experience buffer
  - `--a_lr`: learning rate for the actor
  - `--c_lr`: learning rate for the critic
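Flags like these are typically wired up with `argparse`. The sketch below uses the flag names listed above, but the default values and the `--model_path` flag name are assumptions, not the repository's actual defaults:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="DDPG agent for the Reacher environment")
    parser.add_argument("--eval", action="store_true",
                        help="run in evaluation mode, skipping the training step")
    parser.add_argument("--model_path", type=str, default=None,
                        help="path to a saved model checkpoint (required with --eval)")
    parser.add_argument("--buffer_size", type=int, default=100_000,  # default is an assumption
                        help="maximum size of the experience buffer")
    parser.add_argument("--a_lr", type=float, default=1e-4,  # default is an assumption
                        help="learning rate for the actor")
    parser.add_argument("--c_lr", type=float, default=1e-3,  # default is an assumption
                        help="learning rate for the critic")
    return parser

# Example: parsing an evaluation-mode invocation.
args = build_parser().parse_args(["--eval", "--model_path", "checkpoint.pt"])
```

Running `python3 src/main.py --help` prints the authoritative list of arguments and their real defaults.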