Commit 5f3ed27

Add link to PR
1 parent 5a3bf4a commit 5f3ed27

3 files changed

+3
-3
lines changed

docs/modules/a2c.rst

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ Train a A2C agent on ``CartPole-v1`` using 4 environments.

 .. note::

-**Using gSDE (Generalized State-Dependent Exploration) during inference:**
+Using gSDE (Generalized State-Dependent Exploration) during inference (see `PR #1767 <https://github.com/DLR-RM/stable-baselines3/pull/1767>`_):

 When using A2C models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.

docs/modules/ppo.rst

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.

 .. note::

-**Using gSDE (Generalized State-Dependent Exploration) during inference:**
+Using gSDE (Generalized State-Dependent Exploration) during inference (see `PR #1767 <https://github.com/DLR-RM/stable-baselines3/pull/1767>`_):

 When using PPO models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.

docs/modules/sac.rst

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ This example is only to demonstrate the use of the library and its functions, an

 .. note::

-**Using gSDE (Generalized State-Dependent Exploration) during inference:**
+Using gSDE (Generalized State-Dependent Exploration) during inference (see `PR #1767 <https://github.com/DLR-RM/stable-baselines3/pull/1767>`_):

 When using SAC models trained with ``use_sde=True``, the automatic noise resetting that occurs during training (controlled by ``sde_sample_freq``) does not happen when using ``model.predict()`` for inference. This results in deterministic behavior even when ``deterministic=False``.
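
The behavior described in the A2C, PPO, and SAC notes can be illustrated with a toy model of state-dependent exploration. The sketch below is not Stable-Baselines3 code: ``ToyGSDEPolicy`` and its attributes are hypothetical names, and the linear policy is an assumption chosen for brevity. It only shows the general idea that gSDE noise is a function of the observation and of exploration weights that stay fixed until they are resampled, so repeated predictions on the same observation are identical until the noise is reset.

```python
import random


class ToyGSDEPolicy:
    """Toy stand-in for a gSDE policy (hypothetical, not the SB3 class)."""

    def __init__(self, obs_dim, seed=0):
        self._rng = random.Random(seed)
        # Fixed "learned" weights of a linear policy (assumed for illustration).
        self.mean_weights = [0.5] * obs_dim
        self.noise_weights = [0.0] * obs_dim
        self.reset_noise()

    def reset_noise(self):
        # Resample the exploration weights; they stay fixed until the next call.
        self.noise_weights = [self._rng.gauss(0.0, 1.0) for _ in self.mean_weights]

    def predict(self, obs, deterministic=False):
        mean = sum(w * o for w, o in zip(self.mean_weights, obs))
        if deterministic:
            return mean
        # State-dependent noise: a function of obs and the current noise weights.
        noise = sum(w * o for w, o in zip(self.noise_weights, obs))
        return mean + noise


policy = ToyGSDEPolicy(obs_dim=3)
obs = [1.0, -0.5, 2.0]

# Without a reset, the "stochastic" action is the same on every call.
first = policy.predict(obs)
second = policy.predict(obs)
same_without_reset = first == second

# After resampling the exploration weights, the action changes.
policy.reset_noise()
third = policy.predict(obs)
changed_after_reset = third != first
```

In this toy setting, calling ``reset_noise()`` every few steps plays the role that ``sde_sample_freq`` plays during training. Whether and how to do the same with a real model depends on the library API and is not covered by this commit.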
