Proposal to Add PPO + Mamba SSM to CleanRL
Issue Type
- [ ] Bug Report
- [x] Feature Request / Algorithm Implementation
- [ ] Documentation
- [ ] Question
Overview
This issue proposes adding a new algorithm implementation: PPO integrated with the Mamba State-Space Model (SSM). Mamba is a recent neural network architecture designed for efficient sequence modeling, and it has demonstrated significant improvements in computational speed and memory usage over traditional recurrent models (LSTM, GRU) and Transformer-based models such as the existing ppo_trxl implementation.
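To make the idea concrete, below is a minimal sketch (not the proposed implementation) of how a CleanRL-style agent could swap its LSTM memory for a Mamba block. It assumes the `mamba_ssm` package from state-spaces/mamba; the class name `MambaAgent`, the `layer_init` helper, and all layer sizes are illustrative placeholders in the spirit of CleanRL's existing PPO scripts.

```python
# Sketch only: a CleanRL-style agent whose memory module is a Mamba block.
# Assumes the `mamba_ssm` package (pip install mamba-ssm, CUDA required);
# names and sizes are illustrative, not the final implementation.
import torch
import torch.nn as nn
from torch.distributions import Categorical
from mamba_ssm import Mamba


def layer_init(layer, std=2**0.5, bias_const=0.0):
    # Same orthogonal-initialization helper used across CleanRL's PPO scripts.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer


class MambaAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=128):
        super().__init__()
        self.encoder = layer_init(nn.Linear(obs_dim, d_model))
        # Mamba consumes a (batch, seq_len, d_model) tensor, like a Transformer block.
        self.memory = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.actor = layer_init(nn.Linear(d_model, n_actions), std=0.01)
        self.critic = layer_init(nn.Linear(d_model, 1), std=1.0)

    def get_action_and_value(self, obs_seq, action=None):
        # obs_seq: (batch, seq_len, obs_dim) -- a window of past observations,
        # analogous to the memory window used by ppo_trxl.
        hidden = self.memory(torch.relu(self.encoder(obs_seq)))[:, -1]
        dist = Categorical(logits=self.actor(hidden))
        if action is None:
            action = dist.sample()
        return action, dist.log_prob(action), dist.entropy(), self.critic(hidden)
```

Like ppo_trxl, such an agent would consume a sliding window of observations rather than a single step, which is where Mamba's linear-time sequence scan pays off relative to Transformer-XL's attention.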
Motivation
- Faster training than recurrent (LSTM/GRU) and Transformer-XL baselines
- Lower GPU memory usage compared to Transformer-XL
- Good performance on POMDPs and other memory-based tasks
Environments Tested
- ProofOfMemory Environments
- MiniGrid (Memory, DoorKey)
- Classic control with masked observations to induce partial observability: LunarLander, CartPole (see the wrapper sketch after this list)
- MuJoCo continuous control tasks: HalfCheetah, Walker2d, Hopper
- Atari games: Breakout, Pong
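As a concrete illustration of the masked classic-control setup mentioned above, here is one common way to mask observations: a wrapper that zeroes out selected components (for CartPole, hiding the velocities forces the agent to rely on memory). The wrapper and the masked indices below are illustrative, not a description of the exact experimental setup.

```python
# Illustrative observation-masking wrapper for classic-control POMDP experiments.
import gymnasium as gym
import numpy as np


class MaskObservationWrapper(gym.ObservationWrapper):
    """Zero out selected components of the observation vector."""

    def __init__(self, env, masked_indices):
        super().__init__(env)
        self.masked_indices = masked_indices

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32)
        obs[self.masked_indices] = 0.0
        return obs


# CartPole-v1 observation = [position, velocity, angle, angular velocity];
# masking indices 1 and 3 hides the velocity information.
env = MaskObservationWrapper(gym.make("CartPole-v1"), masked_indices=[1, 3])
```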
Current Behavior
- No current implementation of PPO + Mamba SSM in CleanRL
Expected Behavior
- Implementation of PPO integrated with Mamba SSM
- Performance benchmarks showing training efficiency and final performance
- Documentation on usage and hyperparameter tuning
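To give a sense of what the usage and hyperparameter documentation could cover, here is a hedged sketch of the argument dataclass a hypothetical `ppo_mamba.py` script might expose, following the tyro-based `Args` pattern of recent CleanRL scripts such as ppo_trxl. All names and defaults are illustrative, not final hyperparameters.

```python
# Hypothetical argument layout for a ppo_mamba.py script; values are placeholders.
from dataclasses import dataclass


@dataclass
class Args:
    env_id: str = "MiniGrid-MemoryS13-v0"
    total_timesteps: int = 10_000_000
    learning_rate: float = 2.5e-4
    num_envs: int = 16
    num_steps: int = 128          # rollout length per environment
    # Mamba-specific knobs (assumed, not final):
    d_model: int = 128            # width of the Mamba block
    d_state: int = 16             # SSM state dimension
    mamba_version: int = 2        # 1 = Mamba, 2 = Mamba-2
```

The Mamba-specific fields would be the main additions relative to a standard PPO configuration; everything else follows the usual CleanRL PPO arguments.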
Planned Contribution Steps
- Complete comprehensive benchmarks following CleanRL's contribution guidelines
- Utilize the RLops utility for rigorous performance and regression analysis
- Update the CleanRL documentation to clearly present benchmark results and learning curves
- Submit PR with implementation and documentation
I already have experimental results in these environments, along with a comparison of efficiency metrics; I tested both the Mamba and Mamba-2 models.
@vwxyzjn, I plan to follow the approach used to integrate ppo_trxl. What do you think?