Add n-step return support with `n_steps` parameter #74

araffin · 2025-07-25T08:44:29Z

Description

See Stable-Baselines-Team/stable-baselines3-contrib#297

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

Copilot

Pull Request Overview

This PR adds n-step return support to the sbx reinforcement learning library by introducing an `n_steps` parameter across its off-policy algorithms. The main goal is to enable multi-step temporal difference learning, which can improve sample efficiency and learning stability.

- Adds n_steps parameter to all off-policy algorithms (SAC, TD3, TQC, CrossQ, DQN, DDPG)
- Updates training logic to handle discount factors from n-step returns instead of fixed gamma values
- Refactors PPO logging to include separate policy and entropy loss tracking
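
The second point can be illustrated with a minimal sketch, assuming the standard n-step TD target. The function names `n_step_return` and `td_target` are hypothetical and are not taken from this PR; the sketch only shows why a per-sample discount is needed once rewards are accumulated over several steps.

```python
from typing import Sequence, Tuple


def n_step_return(
    rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float,
    n_steps: int,
) -> Tuple[float, float, bool]:
    """Accumulate up to n_steps rewards, starting from the first transition.

    Returns the discounted reward sum, the discount to apply to the
    bootstrap value, and whether the episode ended inside the window.
    Illustrative helper only, not the library's implementation.
    """
    total = 0.0
    discount = 1.0
    for k in range(min(n_steps, len(rewards))):
        total += discount * rewards[k]
        discount *= gamma
        if dones[k]:
            # Episode ended inside the window: stop accumulating.
            return total, discount, True
    return total, discount, False


def td_target(n_step_reward: float, discount: float, done: bool, next_q: float) -> float:
    """Bootstrapped target where the per-sample discount replaces a fixed gamma."""
    return n_step_reward + (0.0 if done else discount * next_q)


# Full three-step window: the bootstrap discount is gamma**3.
ret, disc, done = n_step_return([1.0, 1.0, 1.0], [False, False, False], gamma=0.5, n_steps=3)
target = td_target(ret, disc, done, next_q=8.0)

# Window cut short by termination at the second step: no bootstrapping.
ret_t, disc_t, done_t = n_step_return([1.0, 1.0, 1.0], [False, True, False], gamma=0.5, n_steps=3)
target_t = td_target(ret_t, disc_t, done_t, next_q=8.0)
```

With `n_steps=1` the returned discount is simply `gamma`, so the usual one-step target is recovered as a special case.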

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| sbx/common/off_policy_algorithm.py | Adds n_steps parameter and NStepReplayBuffer integration |
| sbx/common/type_aliases.py | Extends ReplayBufferSamplesNp to include discounts field |
| sbx/sac/sac.py | Updates SAC algorithm to support n-step returns with discount handling |
| sbx/td3/td3.py | Updates TD3 algorithm to support n-step returns with discount handling |
| sbx/tqc/tqc.py | Updates TQC algorithm to support n-step returns with discount handling |
| sbx/crossq/crossq.py | Updates CrossQ algorithm to support n-step returns with discount handling |
| sbx/dqn/dqn.py | Updates DQN algorithm to support n-step returns with discount handling |
| sbx/ddpg/ddpg.py | Updates DDPG algorithm to support n-step returns |
| sbx/ppo/ppo.py | Refactors logging to separate policy and entropy losses |
| sbx/ppo/policies.py | Updates import statements and type annotations |
| setup.py | Updates dependency versions for stable_baselines3, jax, and black |
| sbx/version.txt | Bumps version from 0.21.0 to 0.22.0 |
| tests/test_run.py | Adds n_steps parameter to test configuration |
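
As a rough sketch of what a `discounts` field on the sampled batch makes possible, the snippet below uses a hypothetical `SamplesWithDiscounts` tuple. It is a stand-in written for illustration, not the actual `ReplayBufferSamplesNp` definition, and the field layout is an assumption.

```python
from typing import List, NamedTuple


class SamplesWithDiscounts(NamedTuple):
    """Hypothetical replay batch; a stand-in, not the sbx type."""

    rewards: List[float]    # n-step discounted reward sums
    dones: List[float]      # 1.0 if the episode ended inside the window
    discounts: List[float]  # per-sample gamma**k, k = steps actually accumulated


def batch_targets(batch: SamplesWithDiscounts, next_q: List[float]) -> List[float]:
    """Compute TD targets using each sample's own discount instead of one gamma."""
    return [
        r + (1.0 - d) * g * q
        for r, d, g, q in zip(batch.rewards, batch.dones, batch.discounts, next_q)
    ]


# Two samples with gamma = 0.5: a full 3-step window and one that terminated after 2 steps.
batch = SamplesWithDiscounts(rewards=[1.75, 1.5], dones=[0.0, 1.0], discounts=[0.125, 0.25])
computed = batch_targets(batch, next_q=[8.0, 8.0])
```

Carrying the discount with each sample lets windows of different effective length share one batch, which is presumably why the per-algorithm updates take it from the batch rather than from a constant.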

Comments suppressed due to low confidence (3)

setup.py:45

The JAX version constraint <0.7.0 may be incorrect. JAX has not released version 0.7.0 as of January 2025. The latest stable JAX versions are in the 0.4.x series. Consider using a more realistic upper bound like <0.5.0 or removing the upper bound entirely.

        "jax>=0.4.24,<0.7.0",  # tf probability not compatible yet with latest jax version

setup.py:65

Black version 25.1.0 does not exist. As of January 2025, Black's latest versions are in the 24.x series. Consider using a realistic version constraint like "black>=24.2.0,<25".

            "black>=25.1.0,<26",

sbx/ppo/ppo.py:183

The removal of this line creates an unused variable ent_key that was being generated but is now missing. This could cause a NameError if ent_key is used elsewhere in the code, or it might indicate that some entropy-related functionality has been inadvertently removed.

araffin added 9 commits July 25, 2025 10:35

Add support for n-step returns

3c181eb

Add type hint

614f0f5

Cleanup ppo code

c94cfbd

Log policy and entropy loss separately

3eda806

Cleanup vf init

3fe75be

Update version

ac56a96

Add test for n steps

fae0db6

Cap Jax version

eb99830

Reformat

9f4b05f

araffin requested a review from Copilot July 25, 2025 08:53

Copilot AI reviewed Jul 25, 2025


araffin merged commit 1e5e433 into master Jul 25, 2025
4 checks passed

araffin deleted the feat/n-steps branch July 25, 2025 09:36
