Disclaimer: Udacity provided some starter code, but the implementations of these concepts are my own. Please contact [email protected] with any questions.
Note: Please refer to the instructions on how to download the dependencies for these projects here.
https://confirm.udacity.com/XLGDCKNX
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms can take in very large inputs (e.g. every pixel rendered to the screen in a video game) and decide what actions to perform to optimize an objective (e.g. maximizing the game score). Deep reinforcement learning has been used for a diverse set of applications including, but not limited to, robotics, video games, natural language processing, computer vision, education, transportation, finance and healthcare [1].
The following projects focus on model-free reinforcement learning, where the agent does not use a model of how its current action will affect its next state. In more technical terms, this family of algorithms does not use the transition probability distribution associated with the Markov decision process.
In this segment, I was introduced to concepts such as Markov decision processes, Monte Carlo methods, temporal-difference methods and RL in continuous spaces. I had the chance to solve some OpenAI Gym environments, such as the Cliff Walking and Taxi-v2 environments.
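As a flavour of the temporal-difference methods covered here, the sketch below shows one-step Q-learning on Taxi-v2 using the classic Gym API. The hyperparameters and episode count are illustrative, not the values used in the exercises.

```python
import gym
import numpy as np

# Minimal sketch: tabular one-step Q-learning (a temporal-difference method) on Taxi-v2.
env = gym.make("Taxi-v2")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.rand() < eps:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state])
        next_state, reward, done, _ = env.step(action)
        # TD(0) update: move Q(s, a) toward the bootstrapped target
        td_target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state
```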
This project focuses on the use of Deep Q-Learning (DQN) to train an agent to collect yellow bananas while avoiding the purple ones. More information on the training algorithm and project instructions can be found here.
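The core of DQN is a learning step that regresses a local Q-network toward a bootstrapped target computed by a slowly updated target network. The sketch below assumes the Banana environment's sizes (37-dimensional state, 4 actions) and simple fully connected networks; names and hyperparameters are illustrative rather than the repository's own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Simple fully connected Q-network (illustrative architecture)."""
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size))

    def forward(self, state):
        return self.net(state)

def dqn_update(q_local, q_target, optimizer, batch, gamma=0.99, tau=1e-3):
    """One learning step from a minibatch sampled from the replay buffer."""
    states, actions, rewards, next_states, dones = batch  # actions: LongTensor (N, 1)
    # Bootstrapped target uses the target network to stabilise learning
    with torch.no_grad():
        q_next = q_target(next_states).max(dim=1, keepdim=True)[0]
        targets = rewards + gamma * q_next * (1 - dones)
    q_expected = q_local(states).gather(1, actions)
    loss = F.mse_loss(q_expected, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Soft-update the target network toward the local network
    for t_param, l_param in zip(q_target.parameters(), q_local.parameters()):
        t_param.data.copy_(tau * l_param.data + (1 - tau) * t_param.data)
```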
This project focuses on the use of Deep Deterministic Policy Gradient (DDPG) to train a robotic arm to keep its end effector on a moving target position. More information on the training algorithm and project instructions can be found here.
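DDPG pairs a deterministic actor with a Q-value critic and trains both from replayed experience. The sketch below shows one learning step under the assumption that the actor/critic networks, their target copies, and their optimizers already exist; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, actor_target, critic, critic_target,
                actor_opt, critic_opt, batch, gamma=0.99, tau=1e-3):
    states, actions, rewards, next_states, dones = batch

    # Critic update: regress Q(s, a) toward the bootstrapped target
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + gamma * critic_target(next_states, next_actions) * (1 - dones)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: ascend the critic's estimate of Q(s, actor(s))
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update both target networks
    for target, local in ((actor_target, actor), (critic_target, critic)):
        for t_param, l_param in zip(target.parameters(), local.parameters()):
            t_param.data.copy_(tau * l_param.data + (1 - tau) * t_param.data)
```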
In addition, the Distributed Distributional Deterministic Policy Gradients (D4PG) method was applied in a multi-arm simulation environment. D4PG combines distributional value estimation, n-step returns, prioritized experience replay (PER) and distributed K-actor exploration for fast and stable learning. The PER component is omitted here, as the original paper suggests it does not improve training speed or stability.
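One of D4PG's ingredients, the n-step return, is easy to isolate: the critic target is built from n consecutive rewards before bootstrapping from the target networks. The helper below is a minimal sketch with illustrative names; it does not cover the distributional (categorical) value projection.

```python
def n_step_return(rewards, gamma=0.99, n=5):
    """Discounted sum of up to n rewards, plus the discount to apply to the
    bootstrapped value at the n-th next state.

    The caller completes the target as:
        G + (gamma ** n) * Q_target(s_{t+n}, actor_target(s_{t+n}))
    """
    n = min(n, len(rewards))
    g = sum((gamma ** i) * r for i, r in enumerate(rewards[:n]))
    return g, gamma ** n
```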
Finally, the Proximal Policy Optimization (PPO) method was applied in a multi-crawler simulation environment. This implementation of PPO combines a clipped surrogate actor loss, a critic loss and an entropy loss. Generalized Advantage Estimation (GAE) was used to stabilize training by balancing bias and variance. Twelve agents learn concurrently in the same environment, sharing experiences and network weights to achieve stable training.
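The combined PPO objective can be sketched as follows, assuming the advantages and returns have already been computed (e.g. with GAE) and that the policy exposes log-probabilities and an entropy term. Function and coefficient names are illustrative, not the repository's own.

```python
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns,
             entropy, clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    # Probability ratio between the current policy and the policy that collected the data
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective keeps each policy update within a trust region
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    actor_loss = -surrogate.mean()
    # Critic loss: regress value estimates toward the empirical returns
    critic_loss = (returns - values).pow(2).mean()
    # Entropy bonus encourages exploration
    return actor_loss + value_coef * critic_loss - entropy_coef * entropy.mean()
```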
This project focuses on the use of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to train two tennis rackets to cooperate with each other, keeping the ball in the air for as long as possible. More information on the training algorithm and project instructions can be found here.
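The key MADDPG idea for this two-agent task is that each agent keeps its own decentralized actor while its critic is centralized, conditioning on the observations and actions of both agents. The sketch below shows only the centralized critic update for one agent; the class layout and names are illustrative assumptions, not the repository's structure.

```python
import torch
import torch.nn.functional as F

def maddpg_critic_update(agent_idx, agents, batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch  # lists indexed by agent
    agent = agents[agent_idx]

    # Target actions for every agent, produced by their target actors
    with torch.no_grad():
        next_actions = [a.actor_target(next_obs[i]) for i, a in enumerate(agents)]
        q_next = agent.critic_target(torch.cat(next_obs, dim=1),
                                     torch.cat(next_actions, dim=1))
        q_target = rewards[agent_idx] + gamma * q_next * (1 - dones[agent_idx])

    # Centralized critic evaluates the joint observation-action pair
    q_expected = agent.critic(torch.cat(obs, dim=1), torch.cat(actions, dim=1))
    critic_loss = F.mse_loss(q_expected, q_target)
    agent.critic_opt.zero_grad()
    critic_loss.backward()
    agent.critic_opt.step()
```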
Additionally, the Proximal Policy Optimization (PPO) method was applied in a soccer simulation environment, with each team comprising a striker and a goalie. This implementation of PPO combines a clipped surrogate actor loss, a critic loss and an entropy loss. Generalized Advantage Estimation (GAE) was used to stabilize training by balancing bias and variance. By the end of training, both agents had specialized in their individual roles, winning 100% of matches against opponents taking random actions.
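The GAE step used in both PPO projects can be sketched as a backward pass over a rollout, accumulating discounted TD residuals; the lambda parameter trades bias against variance. Tensor shapes and names below are illustrative assumptions.

```python
import torch

def compute_gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """rewards, values, dones: per-timestep tensors for one rollout."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual at step t
        delta = rewards[t] + gamma * next_value * (1 - dones[t]) - values[t]
        # Exponentially weighted sum of residuals balances bias and variance
        gae = delta + gamma * lam * (1 - dones[t]) * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values
    return advantages, returns
```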