This is a mini research project to learn, apply, and demonstrate the use of reinforcement learning to create a controller for a simulated quadrotor navigating a 3D environment. The project uses the MuJoCo physics engine to simulate the quadrotor and trains the controller with Proximal Policy Optimization (PPO), based on the CleanRL implementation.
Before we get into the details, let's take a look at some of the results. Currently, the model is trained with a fixed start and goal location, and the quadrotor is able to navigate to the goal and hover over it.
- Custom Gymnasium environment based on the MuJoCo Skydio X2 model.
- Parallel environments for higher training throughput
- Train using PPO
- Obstacle avoidance based on optimization approach (in progress)
- Formulation and implementation of the Euler-Lagrange quadrotor dynamics in canonical (control-affine) form `x_dot = f(x) + g(x)u`
- Dynamic environment generation during training episode rollouts using MjSpec (see the sketch below)
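A minimal sketch of what per-episode scene generation with MjSpec could look like, assuming a recent MuJoCo release (3.2+) where `MjSpec.from_file` is available. The file path, obstacle count, and placement ranges below are placeholders, not the project's actual values:

```python
import mujoco
import numpy as np

def build_episode_model(base_xml_path, n_obstacles=5, rng=None):
    """Hypothetical helper: load the base quadrotor scene and add randomly
    placed box obstacles, then compile a fresh MjModel for this episode."""
    rng = rng if rng is not None else np.random.default_rng()
    spec = mujoco.MjSpec.from_file(base_xml_path)
    for i in range(n_obstacles):
        body = spec.worldbody.add_body(
            name=f"obstacle_{i}",
            pos=rng.uniform(low=[-3.0, -3.0, 0.5], high=[3.0, 3.0, 2.0]).tolist(),
        )
        body.add_geom(
            type=mujoco.mjtGeom.mjGEOM_BOX,
            size=[0.2, 0.2, 0.5],
            rgba=[0.8, 0.2, 0.2, 1.0],
        )
    return spec.compile()
```

In the environment's `reset()`, the newly compiled model can replace the previous one (along with a fresh `MjData`), so each episode sees a different obstacle layout.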
- The navigation problem is formulated as a finite-horizon (episodic), continuous-time MDP
- The quadrotor state is defined as x = [p, q, v, w], where p is the position, q is the orientation quaternion, v is the linear velocity, and w is the angular velocity.
- Reward shaping with wandb hyperparameter sweeps to determine the optimal reward function
- Termination conditions tuned to support efficient data collection
- Supports start/goal location randomization during training to improve generalization; this requires goal-conditioned training, which is not yet implemented.
- Dynamic generation of obstacles using the MjSpec API. Can generate new obstacle configurations for each episode during training.
- Reset noise to improve generalization
- Wrapped the Gymnasium observation normalization wrapper to save normalization statistics learned during training. During evaluation on a new environment, it is important to load these normalization statistics and not calculate new ones.
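A minimal sketch of how such a wrapper might look, assuming a Gymnasium version where `NormalizeObservation` exposes its running statistics via `obs_rms`. The class name `SaveableNormalizeObservation` and the file format are illustrative placeholders, not the project's actual implementation:

```python
import numpy as np
from gymnasium.wrappers import NormalizeObservation

class SaveableNormalizeObservation(NormalizeObservation):
    """Hypothetical wrapper: persists the running mean/var learned during
    training so evaluation reuses them instead of recomputing from scratch."""

    def save_stats(self, path: str) -> None:
        np.savez(path, mean=self.obs_rms.mean, var=self.obs_rms.var,
                 count=self.obs_rms.count)

    def load_stats(self, path: str) -> None:
        stats = np.load(path)
        self.obs_rms.mean = stats["mean"]
        self.obs_rms.var = stats["var"]
        self.obs_rms.count = float(stats["count"])
```

During evaluation, `load_stats()` is called right after constructing the wrapper; recent Gymnasium versions also expose an `update_running_mean` flag that can freeze the statistics, if available in the installed release.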
- Reinforcement learning algorithms estimate the solution to an unconstrained optimization problem, and as such, do not provide mathematical guarantees for safety
- I've used a Control Barrier Function (CBF) formulation based on this paper to ensure safety. Determining the control input that satisfies the CBF constraints requires solving a Quadratic Program (QP) at each time step.
- The safety layer acts as a compensator on the potentially unsafe RL control input, so it may modify the RL action before it is applied. If the policy and value networks are updated using the original (unmodified) actions while the environment executes the modified ones, the mismatch can make the learning process difficult.
- The authors recommend using a differentiable QP solver. To that end, I use the qpth differentiable QP solver library to define a QP layer and integrate it with the actor network (a sketch is shown after this list).
- The CBF formulation should make the training process independent of obstacle locations. This is a work in progress.
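A sketch of what the QP layer could look like with qpth, assuming the CBF constraints have already been assembled into the affine form G u <= h at the current state. The function name and shapes are illustrative, not the exact implementation:

```python
import torch
from qpth.qp import QPFunction

def cbf_qp_layer(u_rl, G, h):
    """Hypothetical CBF-QP safety layer (sketch).
    Projects the RL action u_rl onto the safe set {u : G u <= h}, where G and h
    encode the CBF constraints at the current state, and stays differentiable.
    u_rl: (batch, m) raw actions from the actor network
    G:    (batch, k, m) constraint matrix,  h: (batch, k) constraint bounds
    """
    batch, m = u_rl.shape
    # Objective: min_u 0.5 u^T Q u + p^T u with Q = 2I, p = -2 u_rl,
    # i.e. minimize ||u - u_rl||^2 up to a constant (minimal modification).
    Q = 2.0 * torch.eye(m, device=u_rl.device).expand(batch, m, m)
    p = -2.0 * u_rl
    e = torch.empty(0, device=u_rl.device)  # no equality constraints
    # Note: qpth tends to be more robust in double precision.
    u_safe = QPFunction(verbose=False)(Q, p, G, h, e, e)
    return u_safe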
- Quadrotors are underactuated systems with nonlinear dynamics. They have 6 degrees of freedom (3 position, 3 orientation), and only 4 direct control inputs. This makes control design challenging.
- I've implemented the dynamics in canonical form in order to compute the CBF constraints. The implementation supports batched inputs for parallel environments (see the sketch below).
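For illustration, a minimal batched sketch of the control-affine terms f(x) and g(x), using scalar-first quaternions. The mass and inertia values are placeholders rather than the Skydio X2 parameters, and the actual implementation may differ:

```python
import torch

def quat_mul(q1, q2):
    # Hamilton product of scalar-first quaternions, batched (batch, 4)
    w1, x1, y1, z1 = q1.unbind(-1)
    w2, x2, y2, z2 = q2.unbind(-1)
    return torch.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2], dim=-1)

def body_z_axis(q):
    # Third column of the rotation matrix R(q): the body z-axis in the world frame
    w, x, y, z = q.unbind(-1)
    return torch.stack([2 * (x * z + w * y),
                        2 * (y * z - w * x),
                        1 - 2 * (x * x + y * y)], dim=-1)

def quad_f_g(x, mass=1.0, J=None, grav=9.81):
    """Control-affine quadrotor dynamics x_dot = f(x) + g(x) u, batched.
    x: (batch, 13) = [p(3), q(4), v(3), w(3)]; u = [thrust, tau_x, tau_y, tau_z].
    mass and J are placeholder values, not the Skydio X2 parameters."""
    if J is None:
        J = torch.diag(torch.tensor([0.01, 0.01, 0.02]))
    J_inv = torch.inverse(J)
    batch = x.shape[0]
    q, v, w = x[:, 3:7], x[:, 7:10], x[:, 10:13]

    # Drift term f(x): kinematics, gravity, and the gyroscopic torque (u = 0)
    f = torch.zeros(batch, 13)
    f[:, 0:3] = v                                              # p_dot = v
    omega = torch.cat([torch.zeros(batch, 1), w], dim=1)
    f[:, 3:7] = 0.5 * quat_mul(q, omega)                       # q_dot = 0.5 q (x) [0, w]
    f[:, 9] = -grav                                            # gravity on v_z
    f[:, 10:13] = -torch.cross(w, w @ J.T, dim=1) @ J_inv.T    # -J^-1 (w x Jw)

    # Control matrix g(x): thrust along the body z-axis, torques through J^-1
    g = torch.zeros(batch, 13, 4)
    g[:, 7:10, 0] = body_z_axis(q) / mass
    g[:, 10:13, 1:4] = J_inv.expand(batch, 3, 3)
    return f, g
```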