@araffin commented Jul 25, 2025

Description

See Stable-Baselines-Team/stable-baselines3-contrib#297

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@araffin requested a review from Copilot July 25, 2025 08:53

Copilot AI left a comment

Pull Request Overview

This PR adds n-step return support to the sbx reinforcement learning library (see Stable-Baselines-Team/stable-baselines3-contrib#297 for the corresponding change) by introducing an n_steps parameter for the off-policy algorithms. The main goal is to enable multi-step temporal-difference learning, which can improve sample efficiency and learning stability.

  • Adds n_steps parameter to all off-policy algorithms (SAC, TD3, TQC, CrossQ, DQN, DDPG)
  • Updates training logic to handle discount factors from n-step returns instead of fixed gamma values
  • Refactors PPO logging to include separate policy and entropy loss tracking
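
As a rough illustration of the second bullet, here is a minimal sketch of how an n-step return and its matching discount factor could be computed. It is not the code from this PR: `n_step_return` and its arguments are hypothetical names, and the truncation at episode end is an assumption about how such a buffer would typically behave.

```python
def n_step_return(rewards, dones, gamma, n_steps):
    """Accumulate up to n_steps discounted rewards from the start of the lists.

    Hypothetical helper, for illustration only.
    Returns (reward_sum, discount, done), where discount is gamma**k for the
    k steps actually accumulated and done says whether the episode ended.
    """
    reward_sum = 0.0
    discount = 1.0
    done = False
    for reward, step_done in zip(rewards[:n_steps], dones[:n_steps]):
        reward_sum += discount * reward
        discount *= gamma
        if step_done:
            # Stop at the episode boundary: later rewards belong to another episode.
            done = True
            break
    return reward_sum, discount, done


# Three steps, no termination: the bootstrap discount is gamma**3.
full = n_step_return([1.0, 1.0, 1.0], [False, False, False], gamma=0.5, n_steps=3)

# Episode ends on the second step: only two rewards are accumulated.
truncated = n_step_return([1.0, 1.0, 1.0], [False, True, False], gamma=0.5, n_steps=3)
```

With `n_steps=1` the helper returns the single reward and `gamma` itself, which is why a per-sample discount can stand in for a fixed gamma in the update.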

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File                                Description
sbx/common/off_policy_algorithm.py  Adds n_steps parameter and NStepReplayBuffer integration
sbx/common/type_aliases.py          Extends ReplayBufferSamplesNp to include discounts field
sbx/sac/sac.py                      Updates SAC algorithm to support n-step returns with discount handling
sbx/td3/td3.py                      Updates TD3 algorithm to support n-step returns with discount handling
sbx/tqc/tqc.py                      Updates TQC algorithm to support n-step returns with discount handling
sbx/crossq/crossq.py                Updates CrossQ algorithm to support n-step returns with discount handling
sbx/dqn/dqn.py                      Updates DQN algorithm to support n-step returns with discount handling
sbx/ddpg/ddpg.py                    Updates DDPG algorithm to support n-step returns
sbx/ppo/ppo.py                      Refactors logging to separate policy and entropy losses
sbx/ppo/policies.py                 Updates import statements and type annotations
setup.py                            Updates dependency versions for stable_baselines3, jax, and black
sbx/version.txt                     Bumps version from 0.21.0 to 0.22.0
tests/test_run.py                   Adds n_steps parameter to test configuration
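
The "discount handling" mentioned for the algorithm files can be pictured with the sketch below, where each sampled transition carries its own discount and the TD target uses it in place of a fixed gamma. This is an assumption-laden illustration, not the PR's implementation: `Samples` and `td_targets` are hypothetical names, and only the idea of a `discounts` field comes from the summary above.

```python
from typing import List, NamedTuple


class Samples(NamedTuple):
    """Hypothetical, simplified stand-in for a replay buffer batch."""

    rewards: List[float]    # n-step reward sums
    dones: List[float]      # 1.0 if the episode terminated within the n steps
    discounts: List[float]  # per-sample discount, e.g. gamma**k


def td_targets(samples: Samples, next_q_values: List[float]) -> List[float]:
    """Compute r + (1 - done) * discount * Q(s', a') for each sample."""
    return [
        reward + (1.0 - done) * discount * next_q
        for reward, done, discount, next_q in zip(
            samples.rewards, samples.dones, samples.discounts, next_q_values
        )
    ]


batch = Samples(rewards=[1.75, 1.5], dones=[0.0, 1.0], discounts=[0.125, 0.25])
targets = td_targets(batch, next_q_values=[8.0, 8.0])
```

The second sample terminated, so its target is just the accumulated reward and the bootstrap term is dropped.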
Comments suppressed due to low confidence (3)

setup.py:45

  • The JAX version constraint <0.7.0 may be incorrect. JAX has not released version 0.7.0 as of January 2025. The latest stable JAX versions are in the 0.4.x series. Consider using a more realistic upper bound like <0.5.0 or removing the upper bound entirely.
        "jax>=0.4.24,<0.7.0",  # tf probability not compatible yet with latest jax version

setup.py:65

  • Black version 25.1.0 does not exist. As of January 2025, Black's latest versions are in the 24.x series. Consider using a realistic version constraint like "black>=24.2.0,<25".
            "black>=25.1.0,<26",

sbx/ppo/ppo.py:183

  • Removing this line drops the ent_key variable that was previously generated. If ent_key is still used elsewhere in the code, this will raise a NameError; otherwise it may indicate that some entropy-related functionality has been inadvertently removed.
            )

@araffin merged commit 1e5e433 into master Jul 25, 2025
4 checks passed
@araffin deleted the feat/n-steps branch July 25, 2025 09:36