Document gSDE inference behavior for PPO, SAC, and A2C #2167

Copilot · 2025-08-27T09:37:10Z

This PR adds documentation clarifying the behavior of gSDE (Generalized State-Dependent Exploration) during inference for models trained with use_sde=True.

Problem

When using models trained with gSDE, users may expect that setting deterministic=False in model.predict() gives stochastic behavior similar to training. However, the automatic noise resetting controlled by sde_sample_freq during training does not happen during inference. The exploration noise is never resampled, so the policy behaves deterministically (the same observation always yields the same action) even when deterministic=False.
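A toy model makes the cause easier to see. In gSDE the exploration noise is a linear function of the policy features, roughly ε(s) = θ_ε · z(s), where the matrix θ_ε is drawn from a Gaussian only when the noise is reset. The sketch below is a pure-Python stand-in, not Stable-Baselines3 code: `ToyGSDEPolicy`, `reset_noise` and `sample_action` are hypothetical names, and the feature vector is passed in directly instead of being computed by a network. It shows that as long as θ_ε is not resampled, a "stochastic" sample is still a fixed function of the state.

```python
import random


class ToyGSDEPolicy:
    """Hypothetical stand-in for a gSDE action head (not the SB3 implementation)."""

    def __init__(self, feature_dim, action_dim, std=0.5, seed=0):
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.std = std
        self.rng = random.Random(seed)
        self.reset_noise()

    def reset_noise(self):
        # Draw a new exploration matrix theta_eps of shape (feature_dim, action_dim)
        self.theta_eps = [
            [self.rng.gauss(0.0, self.std) for _ in range(self.action_dim)]
            for _ in range(self.feature_dim)
        ]

    def sample_action(self, mean_action, features):
        # noise_j = sum_i features_i * theta_eps[i][j]: it depends only on the
        # features and the current theta_eps, so it is fixed until the next reset
        noise = [
            sum(f * row[j] for f, row in zip(features, self.theta_eps))
            for j in range(self.action_dim)
        ]
        return [m + n for m, n in zip(mean_action, noise)]


policy = ToyGSDEPolicy(feature_dim=3, action_dim=1)
features, mean = [0.2, -1.0, 0.5], [0.0]

a1 = policy.sample_action(mean, features)
a2 = policy.sample_action(mean, features)
print(a1 == a2)  # True: same features and same exploration matrix

policy.reset_noise()
a3 = policy.sample_action(mean, features)
print(a1 == a3)  # False (almost surely): a new matrix gives new noise
```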

Solution

Added notes to the documentation of PPO, SAC, and A2C explaining:

Expected behavior: gSDE is primarily designed for training; deterministic behavior is recommended for inference.
Current limitation: model.predict() does not automatically reset the noise, so the policy behaves deterministically even with deterministic=False.
Workaround: if stochastic behavior is needed during inference, users must manually call model.policy.reset_noise(env.num_envs) at appropriate intervals.
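The workaround can be packaged as a small wrapper that resets the noise every `sample_freq` calls to `predict`, roughly mirroring how sde_sample_freq is applied during training (a reset whenever `step % sde_sample_freq == 0`). This is a sketch under assumptions: `PeriodicNoiseResetPredictor` is a hypothetical helper, not part of Stable-Baselines3, and it only relies on the `model.policy.reset_noise(n_envs)` and `model.predict(obs, deterministic=...)` calls named in the workaround.

```python
class PeriodicNoiseResetPredictor:
    """Hypothetical helper: resamples gSDE noise every `sample_freq` predictions."""

    def __init__(self, model, n_envs, sample_freq):
        if sample_freq <= 0:
            raise ValueError("sample_freq must be a positive number of steps")
        self.model = model
        self.n_envs = n_envs
        self.sample_freq = sample_freq
        self.step_count = 0

    def predict(self, obs):
        # Resample the exploration matrix on steps 0, sample_freq, 2 * sample_freq, ...
        if self.step_count % self.sample_freq == 0:
            self.model.policy.reset_noise(self.n_envs)
        self.step_count += 1
        # deterministic=False so that the gSDE noise is actually added to the action
        return self.model.predict(obs, deterministic=False)
```

Usage would look like `predictor = PeriodicNoiseResetPredictor(model, vec_env.num_envs, sample_freq=4)` followed by `action, _ = predictor.predict(obs)` inside the inference loop. Keeping the counter in the wrapper avoids having to thread a step count through user code.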

Example of the documented behavior

from stable_baselines3 import PPO

# Training with gSDE
model = PPO("MlpPolicy", "Pendulum-v1", use_sde=True, sde_sample_freq=4)
model.learn(total_timesteps=10000)

vec_env = model.get_env()
obs = vec_env.reset()
# During inference: deterministic behavior (recommended)
action, _ = model.predict(obs, deterministic=True)

# If stochastic behavior is needed during inference,
# the noise must be reset manually
n_steps = 1000
for step_count in range(n_steps):
    if step_count % 4 == 0:  # Reset every 4 steps (matching sde_sample_freq)
        model.policy.reset_noise(vec_env.num_envs)

    action, _ = model.predict(obs, deterministic=False)
    # VecEnv automatically resets environments whose episode has ended
    obs, rewards, dones, infos = vec_env.step(action)
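To pick the reset interval for a given model, it helps to spell out which steps get fresh noise. The helper below is hypothetical and based on our reading of the `sde_sample_freq` parameter: a positive value resamples every `sde_sample_freq` steps, while the default `-1` only samples at the beginning of a rollout. Inference has no rollout boundary, so treating any non-positive value as "sample once at the start" is an assumption.

```python
def noise_reset_steps(n_steps, sde_sample_freq):
    """Return the step indices at which gSDE noise would be resampled (hypothetical helper)."""
    if sde_sample_freq > 0:
        # Same modulo rule as the example loop: reset on steps 0, freq, 2 * freq, ...
        return [step for step in range(n_steps) if step % sde_sample_freq == 0]
    # Assumption: a non-positive value (default -1) means the noise is drawn only once
    return [0] if n_steps > 0 else []


print(noise_reset_steps(10, 4))   # [0, 4, 8]
print(noise_reset_steps(10, -1))  # [0]
```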

The documentation changes are minimal and focused: they add clear guidance without modifying any code behavior, answer the question raised in issue #1767, and explain how to use gSDE correctly during inference.

Co-authored-by: araffin <[email protected]>

araffin · 2025-08-27T09:49:34Z

@copilot please update the changelog too

Co-authored-by: araffin <[email protected]>

Copilot · 2025-08-27T09:54:15Z

@copilot please update the changelog too

Updated the changelog with a documentation entry for the gSDE inference behavior clarification (commit 5a3bf4a).

Initial plan

58049db

Copilot AI assigned Copilot and araffin Aug 27, 2025

Copilot started work on behalf of araffin August 27, 2025 09:37

Add gSDE inference behavior documentation to PPO, SAC, and A2C

ae34108

Co-authored-by: araffin <[email protected]>

Copilot AI changed the title ~~[WIP] I would like to fix issue #1767 by updating the documentation~~ Document gSDE inference behavior for PPO, SAC, and A2C Aug 27, 2025

Copilot AI requested a review from araffin August 27, 2025 09:47

Copilot finished work on behalf of araffin August 27, 2025 09:47

Copilot started work on behalf of araffin August 27, 2025 09:49

Update changelog with gSDE documentation entry

5a3bf4a

Co-authored-by: araffin <[email protected]>

Copilot finished work on behalf of araffin August 27, 2025 09:54

araffin marked this pull request as ready for review August 27, 2025 10:13

Add link to PR

5f3ed27

araffin approved these changes Aug 27, 2025

araffin merged commit 3b249c0 into master Aug 27, 2025
4 checks passed

araffin deleted the copilot/fix-2d05eea8-03ea-4b3d-a808-4bb4c5a804d4 branch August 27, 2025 10:21

Copilot AI commented Aug 27, 2025 • edited by araffin


Uh oh!

araffin commented Aug 27, 2025

Uh oh!

Copilot AI commented Aug 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants