
Commit ae34108

Copilot and araffin committed
Add gSDE inference behavior documentation to PPO, SAC, and A2C
Co-authored-by: araffin <[email protected]>
1 parent 58049db commit ae34108

File tree

3 files changed (+25, −0 lines)


docs/modules/a2c.rst

Lines changed: 8 additions & 0 deletions
@@ -91,6 +91,14 @@ Train an A2C agent on ``CartPole-v1`` using 4 environments.
 
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using A2C models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
 
 Results
 -------
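
For reference, a minimal inference loop that restores stochastic gSDE sampling might look like the sketch below. It is not part of this commit: the environment id, the saved-model path ``a2c_pendulum``, and the reset interval are illustrative and should match your own training setup.

    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env

    # Hypothetical environment and saved model, for illustration only
    env = make_vec_env("Pendulum-v1", n_envs=4)
    model = A2C.load("a2c_pendulum", env=env)

    sde_sample_freq = 4  # should match the value used during training
    obs = env.reset()
    for step in range(1_000):
        # model.predict() never resamples the gSDE exploration matrix,
        # so resample it manually every sde_sample_freq steps
        if step % sde_sample_freq == 0:
            model.policy.reset_noise(env.num_envs)
        action, _states = model.predict(obs, deterministic=False)
        obs, rewards, dones, infos = env.step(action)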

docs/modules/ppo.rst

Lines changed: 8 additions & 0 deletions
@@ -105,6 +105,14 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.
 
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245#issuecomment-1435766949>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using PPO models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
 Results
 -------
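
The behavior the note describes can be observed directly: without a manual reset, repeated "stochastic" predictions on the same observation return the same action. A small sketch (untrained model and environment id chosen only for illustration):

    import numpy as np

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env("Pendulum-v1", n_envs=1)
    model = PPO("MlpPolicy", env, use_sde=True, sde_sample_freq=4)

    obs = env.reset()
    # The gSDE exploration matrix stays fixed, so both calls
    # yield the same action for the same observation
    action_1, _ = model.predict(obs, deterministic=False)
    action_2, _ = model.predict(obs, deterministic=False)
    assert np.allclose(action_1, action_2)

    # After resampling the exploration matrix, the sampled action changes
    model.policy.reset_noise(env.num_envs)
    action_3, _ = model.predict(obs, deterministic=False)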

docs/modules/sac.rst

Lines changed: 9 additions & 0 deletions
@@ -93,6 +93,15 @@ This example is only to demonstrate the use of the library and its functions, an
 obs, info = env.reset()
 
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using SAC models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
+
 Results
 -------
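
Applied to a single-environment setup like the SAC example above, stochastic inference would look roughly like this sketch (training omitted for brevity; the reset interval is illustrative and should follow your ``sde_sample_freq``):

    import gymnasium as gym

    from stable_baselines3 import SAC

    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, use_sde=True)
    # model.learn(...) omitted for brevity

    obs, info = env.reset()
    for step in range(200):
        # With a single environment, the default batch size of 1 is enough
        if step % 4 == 0:  # illustrative interval
            model.policy.reset_noise()
        action, _states = model.predict(obs, deterministic=False)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()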
