Commit 6af0601

j0m0k0 and araffin authored
Update LunarLander and LunarLanderContinuous Environments from v2 to v3 in the Documentation (#2143)
* Update environment versions in the examples documentation: version 2 of the LunarLander and LunarLanderContinuous environments is deprecated by gymnasium.
* Update integrations.rst to support version 3 of LunarLander
* Update changelog.rst to include documentation changes for the LunarLander and LunarLanderContinuous env versions
* Update docs/misc/changelog.rst
* Fix for newer mypy version
* Downgrade ale-py for gymnasium<1

Co-authored-by: Antonin RAFFIN <[email protected]>
1 parent ef03d33 commit 6af0601
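
For context on the motivation: recent gymnasium releases register the v3 LunarLander IDs, which is why the docs move away from v2. A quick, illustrative check (not part of this commit; assumes gymnasium is installed, and the exact release in which v2 disappears may vary):

    import gymnasium as gym

    # Recent gymnasium releases register the v3 ID; the v2 ID is deprecated/removed,
    # so gym.make("LunarLander-v2") warns or fails there.
    print("LunarLander-v3" in gym.registry)  # True on recent gymnasium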

File tree: 5 files changed (+10, -8 lines)


.github/workflows/ci.yml

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,8 @@ jobs:
         run: |
           uv pip install --system gymnasium==${{ matrix.gymnasium-version }}
           uv pip install --system "numpy<2"
-        # Only run for python 3.10, downgrade gym to 0.29.1, numpy<2
+          uv pip install --system "ale-py==0.10.1"
+        # Only run for python 3.10, downgrade gym to 0.29.1, numpy<2, ale-py==0.10.1
         if: matrix.gymnasium-version != '1.0.0'
       - name: Lint with ruff
         run: |

docs/guide/examples.rst

Lines changed: 3 additions & 3 deletions
@@ -71,7 +71,7 @@ In the following example, we will train, save and load a DQN model on the Lunar


 # Create environment
-env = gym.make("LunarLander-v2", render_mode="rgb_array")
+env = gym.make("LunarLander-v3", render_mode="rgb_array")

 # Instantiate the agent
 model = DQN("MlpPolicy", env, verbose=1)
@@ -289,7 +289,7 @@ If your callback returns False, training is aborted early.
 os.makedirs(log_dir, exist_ok=True)

 # Create and wrap the environment
-env = gym.make("LunarLanderContinuous-v2")
+env = gym.make("LunarLanderContinuous-v3")
 env = Monitor(env, log_dir)

 # Add some action noise for exploration
@@ -816,7 +816,7 @@ Bonus: Make a GIF of a Trained Agent

 from stable_baselines3 import A2C

-model = A2C("MlpPolicy", "LunarLander-v2").learn(100_000)
+model = A2C("MlpPolicy", "LunarLander-v3").learn(100_000)

 images = []
 obs = model.env.reset()
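
The first hunk above belongs to the DQN quickstart example. As a minimal sketch of the updated train/save/load flow (assuming gymnasium[box2d] and stable-baselines3 are installed; the timestep count is illustrative, not taken from the docs):

    import gymnasium as gym
    from stable_baselines3 import DQN

    # Create the v3 environment and train briefly
    env = gym.make("LunarLander-v3", render_mode="rgb_array")
    model = DQN("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)
    model.save("dqn_lunar")

    # Reload from disk and query the policy for a single observation
    del model
    model = DQN.load("dqn_lunar", env=env)
    obs, info = env.reset()
    action, _states = model.predict(obs, deterministic=True)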

docs/guide/integrations.rst

Lines changed: 3 additions & 3 deletions
@@ -73,11 +73,11 @@ Installation

 # Download model and save it into the logs/ folder
 # Only use TRUST_REMOTE_CODE=True with HF models that can be trusted (here the SB3 organization)
-TRUST_REMOTE_CODE=True python -m rl_zoo3.load_from_hub --algo a2c --env LunarLander-v2 -orga sb3 -f logs/
+TRUST_REMOTE_CODE=True python -m rl_zoo3.load_from_hub --algo a2c --env LunarLander-v3 -orga sb3 -f logs/
 # Test the agent
-python -m rl_zoo3.enjoy --algo a2c --env LunarLander-v2 -f logs/
+python -m rl_zoo3.enjoy --algo a2c --env LunarLander-v3 -f logs/
 # Push model, config and hyperparameters to the hub
-python -m rl_zoo3.push_to_hub --algo a2c --env LunarLander-v2 -f logs/ -orga sb3 -m "Initial commit"
+python -m rl_zoo3.push_to_hub --algo a2c --env LunarLander-v3 -f logs/ -orga sb3 -m "Initial commit"

docs/misc/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Documentation:
 ^^^^^^^^^^^^^^
 - Clarify ``evaluate_policy`` documentation
 - Added doc about training exceeding the `total_timesteps` parameter
+- Updated ``LunarLander`` and ``LunarLanderContinuous`` environment versions to v3 (@j0m0k0)


 Release 2.6.0 (2025-03-24)

stable_baselines3/common/policies.py

Lines changed: 1 addition & 1 deletion
@@ -381,7 +381,7 @@ def predict(
         # Remove batch dimension if needed
         if not vectorized_env:
             assert isinstance(actions, np.ndarray)
-            actions = actions.squeeze(axis=0)
+            actions = actions.squeeze(axis=0)  # type: ignore[assignment]

         return actions, state  # type: ignore[return-value]
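
The policies.py hunk only adds a type-ignore comment: per the commit message it silences a complaint from a newer mypy release, and the runtime behaviour is unchanged. As a small standalone illustration of what the changed line does (not taken from the commit):

    import numpy as np

    # predict() may receive a single observation that was given a leading batch axis;
    # squeeze(axis=0) drops that axis again before the action is returned.
    batched_action = np.array([[0.25, -0.5]])       # shape (1, 2)
    single_action = batched_action.squeeze(axis=0)  # shape (2,)
    print(single_action.shape)                      # (2,)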
