Commit 03d72d5

Fix missing references, update changelog
1 parent b1b247b commit 03d72d5

3 files changed, +10 -5 lines changed

docs/misc/changelog.rst

Lines changed: 2 additions & 1 deletion
@@ -38,6 +38,7 @@ Documentation:
 ^^^^^^^^^^^^^^
 - Added Decisions and Dragons to resources. (@jmacglashan)
 - Updated PyBullet example, now compatible with Gymnasium
+- Added link to policies for ``policy_kwargs`` parameter (@kplers)
 
 Release 2.4.0 (2024-11-18)
 --------------------------
@@ -1738,4 +1739,4 @@ And all the contributors:
 @DavyMorgan @luizapozzobon @Bonifatius94 @theSquaredError @harveybellini @DavyMorgan @FieteO @jonasreiher @npit @WeberSamuel @troiganto
 @lutogniew @lbergmann1 @lukashass @BertrandDecoster @pseudo-rnd-thoughts @stefanbschneider @kyle-he @PatrickHelm @corentinlger
 @marekm4 @stagoverflow @rushitnshah @markscsmith @NickLucche @cschindlbeck @peteole @jak3122 @will-maclean
-@brn-dev @jmacglashan
+@brn-dev @jmacglashan @kplers

docs/modules/a2c.rst

Lines changed: 4 additions & 2 deletions
@@ -78,7 +78,7 @@ Train a A2C agent on ``CartPole-v1`` using 4 environments.
 
 A2C is meant to be run primarily on the CPU, especially when you are not using a CNN. To improve CPU utilization, try turning off the GPU and using ``SubprocVecEnv`` instead of the default ``DummyVecEnv``:
 
-.. code-block::
+.. code-block:: python
 
     from stable_baselines3 import A2C
     from stable_baselines3.common.env_util import make_vec_env
@@ -88,7 +88,7 @@ Train a A2C agent on ``CartPole-v1`` using 4 environments.
     env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
     model = A2C("MlpPolicy", env, device="cpu")
     model.learn(total_timesteps=25_000)
-
+
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
 
@@ -165,6 +165,8 @@ Parameters
     :inherited-members:
 
 
+.. _a2c_policies:
+
 A2C Policies
 -------------
 

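For context, a self-contained version of the multiprocessing snippet touched by this hunk is sketched below. The ``SubprocVecEnv`` import and the ``if __name__ == "__main__":`` guard are not visible in the hunk and are added here as assumptions; the guard matters because ``SubprocVecEnv`` starts worker processes, which re-import the main module on platforms that use spawn.

.. code-block:: python

    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import SubprocVecEnv  # import not shown in the hunk

    if __name__ == "__main__":  # guard assumed; protects top-level code when workers are spawned
        # 8 CartPole environments, each running in its own process
        env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
        # keep the model on CPU, as the docs recommend for non-CNN policies
        model = A2C("MlpPolicy", env, device="cpu")
        model.learn(total_timesteps=25_000)
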
docs/modules/ppo.rst

Lines changed: 4 additions & 2 deletions
@@ -92,7 +92,7 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.
 
 PPO is meant to be run primarily on the CPU, especially when you are not using a CNN. To improve CPU utilization, try turning off the GPU and using ``SubprocVecEnv`` instead of the default ``DummyVecEnv``:
 
-.. code-block::
+.. code-block:: python
 
     from stable_baselines3 import PPO
     from stable_baselines3.common.env_util import make_vec_env
@@ -102,7 +102,7 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.
     env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
     model = PPO("MlpPolicy", env, device="cpu")
    model.learn(total_timesteps=25_000)
-
+
 For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245#issuecomment-1435766949>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
 
 Results
@@ -178,6 +178,8 @@ Parameters
     :inherited-members:
 
 
+.. _ppo_policies:
+
 PPO Policies
 -------------
 

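The ``.. _ppo_policies:`` label added here (like ``a2c_policies`` above) gives the ``policy_kwargs`` parameter documentation a target to link to, per the changelog entry. As a rough illustration of what that parameter does, not taken from this commit, ``policy_kwargs`` forwards keyword arguments to the policy class; the ``net_arch`` value below is an arbitrary example, not a recommendation.

.. code-block:: python

    from stable_baselines3 import PPO

    # policy_kwargs is forwarded to the policy class (see the PPO Policies section)
    model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=dict(net_arch=[64, 64]))
    model.learn(total_timesteps=10_000)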