Proposal to Add PPO + Mamba SSM to CleanRL #510

@Tornadosky

Issue Type

  • Bug Report
  • Feature Request / Algorithm Implementation
  • Documentation
  • Question

Overview

This issue proposes adding a new algorithm implementation: PPO integrated with the Mamba state-space model (SSM). Mamba is a recent neural network architecture designed for efficient sequence modeling. It is reported to be faster and more memory-efficient than traditional recurrent models (LSTM, GRU) and than Transformer-based models such as the one used in the existing ppo_trxl implementation.
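For readers less familiar with state-space layers, the recurrence they build on can be sketched roughly as below. This is only a plain diagonal linear SSM written in NumPy, not Mamba's selective scan and not the proposed implementation; the function name `ssm_rollout`, the parameters `a`, `b`, `c`, and the convention that `done[t]` marks the first step of a new episode are all assumptions made for illustration.

```python
import numpy as np


def ssm_rollout(x, done, a, b, c, h0=None):
    """Run a diagonal linear SSM over one rollout (illustrative only).

    x:    (T, D) inputs for T timesteps
    done: (T,) booleans; done[t] == True means step t starts a new episode
    a, b, c: (D,) per-channel parameters of the recurrence
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    Returns the outputs y with shape (T, D) and the final hidden state.
    """
    T, D = x.shape
    h = np.zeros(D) if h0 is None else np.array(h0, dtype=float)
    ys = np.empty((T, D))
    for t in range(T):
        if done[t]:
            # Reset memory at episode boundaries so that no information
            # leaks from the previous episode into the new one.
            h = np.zeros(D)
        h = a * h + b * x[t]
        ys[t] = c * h
    return ys, h


# Tiny example: one channel, constant input, episode boundary at t = 2.
x = np.ones((4, 1))
done = np.array([False, False, True, False])
ys, h_final = ssm_rollout(x, done, a=np.array([0.5]), b=np.array([1.0]), c=np.array([1.0]))
```

The reset on `done` is the part that matters most for PPO: recurrent PPO code needs some equivalent step so that hidden state collected during a rollout does not carry over between episodes.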

Motivation

  • Faster training speeds
  • Lower GPU memory usage compared to Transformer-XL
  • Good performance in POMDP and memory-based tasks

Environments Tested

  • ProofOfMemory Environments
  • MiniGrid (Memory, DoorKey)
  • Classical Control (masked): LunarLander, CartPole
  • MuJoCo continuous control tasks: HalfCheetah, Walker2d, Hopper
  • Atari games: Breakout, Pong
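The issue does not say how the "masked" classical control variants are built. One common way to turn such tasks into memory tasks is to zero out the velocity entries of the observation, so the agent has to infer them from history. The sketch below assumes that approach; the helper `mask_observation` and the constant `CARTPOLE_VELOCITY_IDX` are hypothetical names, not part of CleanRL or of the proposed implementation.

```python
import numpy as np

# CartPole observations are ordered as
# [cart position, cart velocity, pole angle, pole angular velocity],
# so indices 1 and 3 hold the velocities (assumed to be the masked entries).
CARTPOLE_VELOCITY_IDX = (1, 3)


def mask_observation(obs, masked_idx=CARTPOLE_VELOCITY_IDX):
    """Return a copy of obs with the selected entries set to zero.

    Works on a single observation of shape (D,) or a batch of shape (N, D).
    """
    masked = np.array(obs, dtype=np.float32, copy=True)
    masked[..., list(masked_idx)] = 0.0
    return masked


obs = np.array([0.1, -0.5, 0.02, 0.3], dtype=np.float32)
masked_obs = mask_observation(obs)
```

In a real training script the same function would typically be applied inside an observation wrapper around the environment, so that the policy never sees the unmasked values.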

Current Behavior

  • No current implementation of PPO + Mamba SSM in CleanRL

Expected Behavior

  • Implementation of PPO integrated with Mamba SSM
  • Performance benchmarks showing training efficiency and final performance
  • Documentation on usage and hyperparameter tuning

Planned Contribution Steps

  1. Complete comprehensive benchmarks following CleanRL's contribution guidelines
  2. Utilize the RLops utility for rigorous performance and regression analysis
  3. Update the CleanRL documentation to clearly present benchmark results and learning curves
  4. Submit PR with implementation and documentation

I have already run experiments in these environments and compared efficiency metrics. I tested both Mamba and Mamba-2 models.

@vwxyzjn, I plan to follow the way ppo_trxl was integrated. What do you think?
