Commit 8cd8c62

Copilot and araffin authored
Document Atari wrapper reset behavior (#2170)
* Initial plan
* Document Atari wrapper reset behavior and workaround for issue #666
  Co-authored-by: araffin <[email protected]>
* Move note to examples
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: araffin <[email protected]>
Co-authored-by: Antonin RAFFIN <[email protected]>
1 parent f7a89e1 commit 8cd8c62

4 files changed: +35 -2 lines changed

docs/guide/examples.rst

Lines changed: 19 additions & 1 deletion
@@ -368,7 +368,25 @@ and multiprocessing for you. To install the Atari environments, run the command
 
 .. image:: ../_static/img/colab-badge.svg
    :target: https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/atari_games.ipynb
-..
+
+.. note::
+
+  When working with Atari environments, be aware that the default ``terminal_on_life_loss=True`` behavior
+  can cause ``env.reset()`` to perform a no-op step instead of truly resetting the environment when
+  the episode ends due to a life loss (not game over, see `issue #666 <https://github.com/DLR-RM/stable-baselines3/issues/666>`_).
+  To ensure ``reset()`` always resets the environment, use:
+
+  .. code-block:: python
+
+    from stable_baselines3.common.env_util import make_atari_env
+
+    import ale_py
+
+    env = make_atari_env(
+        "BreakoutNoFrameskip-v4",
+        n_envs=1,
+        wrapper_kwargs=dict(terminal_on_life_loss=False)
+    )
 
 .. code-block:: python

docs/misc/changelog.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Documentation:
 ^^^^^^^^^^^^^^
 - Added plotting documentation and examples
 - Added documentation clarifying gSDE (Generalized State-Dependent Exploration) inference behavior for PPO, SAC, and A2C algorithms
+- Documented Atari wrapper reset behavior where ``env.reset()`` may perform a no-op step instead of truly resetting when ``terminal_on_life_loss=True`` (default), and how to avoid this behavior by setting ``terminal_on_life_loss=False``
 - Clarified comment in ``_sample_action()`` method to better explain action scaling behavior for off-policy algorithms (@copilot)
 

stable_baselines3/common/atari_wrappers.py

Lines changed: 9 additions & 1 deletion
@@ -100,6 +100,14 @@ class EpisodicLifeEnv(gym.Wrapper[np.ndarray, int, np.ndarray, int]):
     Make end-of-life == end-of-episode, but only reset on true game over.
     Done by DeepMind for the DQN and co. since it helps value estimation.
 
+    .. note::
+        This wrapper changes the behavior of ``env.reset()``. When the environment
+        terminates due to a loss of life (but not game over), calling ``reset()`` will
+        perform a no-op step instead of truly resetting the environment. This can be
+        confusing when evaluating or testing agents. To avoid this behavior and ensure ``reset()``
+        always resets to the env, set ``terminal_on_life_loss=False`` when
+        using ``make_atari_env()``.
+
     :param env: Environment to wrap
     """

@@ -273,7 +281,7 @@ class AtariWrapper(gym.Wrapper[np.ndarray, int, np.ndarray, int]):
     :param frame_skip: Frequency at which the agent experiences the game.
         This correspond to repeating the action ``frame_skip`` times.
     :param screen_size: Resize Atari frame
-    :param terminal_on_life_loss: If True, then step() returns done=True whenever a life is lost.
+    :param terminal_on_life_loss: If True, then step() returns terminated=True whenever a life is lost.
     :param clip_reward: If True (default), the reward is clip to {-1, 0, 1} depending on its sign.
     :param action_repeat_probability: Probability of repeating the last action
     """

stable_baselines3/common/env_util.py

Lines changed: 6 additions & 0 deletions
@@ -144,6 +144,12 @@ def make_atari_env(
     Create a wrapped, monitored VecEnv for Atari.
     It is a wrapper around ``make_vec_env`` that includes common preprocessing for Atari games.
 
+    .. note::
+        By default, the ``AtariWrapper`` uses ``terminal_on_life_loss=True``, which causes
+        ``env.reset()`` to perform a no-op step instead of truly resetting when the environment
+        terminates due to a loss of life (but not game over). To ensure ``reset()`` always
+        resets the env, pass ``wrapper_kwargs=dict(terminal_on_life_loss=False)``.
+
     :param env_id: either the env ID, the env class or a callable returning an env
     :param n_envs: the number of environments you wish to have in parallel
     :param seed: the initial seed for the random number generator
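
The reset behavior documented in this commit can be illustrated with a small pure-Python model. This is only a sketch: ``ToyLivesGame`` and ``ToyEpisodicLife`` are hypothetical names invented for illustration, they are not part of Stable-Baselines3, and the step signature is simplified to ``(obs, reward, terminated)``. The model assumes the wrapper tracks whether the last termination was a real game over and, if it was not, advances the game with a no-op action on ``reset()``.

```python
class ToyLivesGame:
    """Hypothetical stand-in for an Atari game with several lives (not SB3 code)."""

    def __init__(self, lives=3):
        self.start_lives = lives
        self.lives = lives
        self.frame = 0

    def reset(self):
        # True reset: restore all lives and go back to frame 0.
        self.lives = self.start_lives
        self.frame = 0
        return self.frame

    def step(self, action):
        # Every step advances one frame; action 1 costs a life, action 0 is a no-op.
        self.frame += 1
        if action == 1:
            self.lives -= 1
        game_over = self.lives == 0
        return self.frame, 0.0, game_over


class ToyEpisodicLife:
    """Simplified model of the life-loss logic described in the notes above."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, game_over = self.env.step(action)
        self.was_real_done = game_over
        lost_life = 0 < self.env.lives < self.lives
        self.lives = self.env.lives
        # Report termination on a life loss as well as on a real game over.
        return obs, reward, game_over or lost_life

    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()  # real game over: truly reset
        else:
            obs, _, _ = self.env.step(0)  # life loss only: no-op step, game continues
        self.lives = self.env.lives
        return obs


env = ToyEpisodicLife(ToyLivesGame(lives=3))
env.reset()                      # true reset: frame 0, 3 lives
_, _, terminated = env.step(1)   # lose one life, reported as terminated
obs = env.reset()                # no-op step: the game is not restarted
print(terminated, obs, env.env.lives)  # True 2 2
```

In this model the second ``reset()`` returns frame 2 with 2 lives remaining rather than frame 0 with 3 lives, which is the surprise the added notes warn about during evaluation. Setting ``terminal_on_life_loss=False`` corresponds to not applying a wrapper like ``ToyEpisodicLife`` at all.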
