
Commit ae34108

Copilot and araffin committed
Add gSDE inference behavior documentation to PPO, SAC, and A2C
Co-authored-by: araffin <[email protected]>
1 parent 58049db commit ae34108

File tree

3 files changed (+25, −0 lines)


docs/modules/a2c.rst

Lines changed: 8 additions & 0 deletions
@@ -91,6 +91,14 @@ Train an A2C agent on ``CartPole-v1`` using 4 environments.
 
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using A2C models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
 
 Results
 -------
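
For reference, a minimal inference loop that restores stochastic gSDE sampling might look like the sketch below. It is not part of this commit: the environment id, the saved-model path ``a2c_pendulum``, and the reset interval are illustrative and should match your own training setup.

    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env

    # Hypothetical environment and saved model, for illustration only
    env = make_vec_env("Pendulum-v1", n_envs=4)
    model = A2C.load("a2c_pendulum", env=env)

    sde_sample_freq = 4  # should match the value used during training
    obs = env.reset()
    for step in range(1_000):
        # model.predict() never resamples the gSDE exploration matrix,
        # so resample it manually every sde_sample_freq steps
        if step % sde_sample_freq == 0:
            model.policy.reset_noise(env.num_envs)
        action, _states = model.predict(obs, deterministic=False)
        obs, rewards, dones, infos = env.step(action)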

docs/modules/ppo.rst

Lines changed: 8 additions & 0 deletions
@@ -105,6 +105,14 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.
 
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245#issuecomment-1435766949>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using PPO models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
 Results
 -------
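
The behavior the note describes can be observed directly: without a manual reset, repeated "stochastic" predictions on the same observation return the same action. A small sketch (untrained model and environment id chosen only for illustration):

    import numpy as np

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env("Pendulum-v1", n_envs=1)
    model = PPO("MlpPolicy", env, use_sde=True, sde_sample_freq=4)

    obs = env.reset()
    # The gSDE exploration matrix stays fixed, so both calls
    # yield the same action for the same observation
    action_1, _ = model.predict(obs, deterministic=False)
    action_2, _ = model.predict(obs, deterministic=False)
    assert np.allclose(action_1, action_2)

    # After resampling the exploration matrix, the sampled action changes
    model.policy.reset_noise(env.num_envs)
    action_3, _ = model.predict(obs, deterministic=False)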

docs/modules/sac.rst

Lines changed: 9 additions & 0 deletions
@@ -93,6 +93,15 @@ This example is only to demonstrate the use of the library and its functions, an
 obs, info = env.reset()
 
 
+.. note::
+
+    **Using gSDE (Generalized State-Dependent Exploration) during inference:**
+
+    When using SAC models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
+
+    For continuous control tasks, it is recommended to use deterministic behavior during inference (``deterministic=True``). If you need stochastic behavior during inference, you must manually reset the noise by calling ``model.policy.reset_noise(env.num_envs)`` at appropriate intervals based on your desired ``sde_sample_freq``.
+
+
 Results
 -------
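
Applied to a single-environment setup like the SAC example above, stochastic inference would look roughly like this sketch (training omitted for brevity; the reset interval is illustrative and should follow your ``sde_sample_freq``):

    import gymnasium as gym

    from stable_baselines3 import SAC

    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, use_sde=True)
    # model.learn(...) omitted for brevity

    obs, info = env.reset()
    for step in range(200):
        # With a single environment, the default batch size of 1 is enough
        if step % 4 == 0:  # illustrative interval
            model.policy.reset_noise()
        action, _states = model.predict(obs, deterministic=False)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()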
