
Releases: DLR-RM/stable-baselines3

v2.7.0: n-step returns for all off-policy algorithms via the `n_steps` argument

25 Jul 09:55
bf51a62

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

New Features:

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
from stable_baselines3 import SAC

# SAC with n-step returns
model = SAC("MlpPolicy", "Pendulum-v1", n_steps=3, verbose=1)
model.learn(10_000)
  • Added NStepReplayBuffer, which allows computing n-step returns without additional memory requirements (and without for loops)
  • Added Gymnasium v1.2 support

Bug Fixes:

  • Fixed docker GPU image (PyTorch GPU was not installed)
  • Fixed segmentation faults caused by non-portable schedules during model loading (@akanto)

SB3-Contrib

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Use the FloatSchedule and LinearSchedule classes instead of lambdas in the ARS, PPO, and QRDQN implementations to improve model portability across different operating systems

RL Zoo

  • linear_schedule now returns a SimpleLinearSchedule object for better portability
  • Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters
  • Renamed CarRacing-v2 to CarRacing-v3 in hyperparameters
  • Docker GPU images are now working again
  • Use ConstantSchedule and SimpleLinearSchedule instead of constant_fn and linear_schedule
  • Fixed CarRacing-v3 hyperparameters for newer Gymnasium version

SBX (SB3 + Jax)

  • Added support for n-step returns for off-policy algorithms via the n_steps parameter
  • Added KL Adaptive LR for PPO and LR schedule for SAC/TQC

Deprecations:

  • get_schedule_fn(), get_linear_fn(), and constant_fn() are deprecated; please use FloatSchedule(), LinearSchedule(), and ConstantSchedule() instead (see the example below)
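
A minimal usage sketch of the replacement schedule classes (assumptions, not confirmed by this changelog: the stable_baselines3.common.utils import path and a LinearSchedule signature mirroring get_linear_fn(start, end, end_fraction)):

from stable_baselines3 import PPO
# Assumed import path for the new schedule classes
from stable_baselines3.common.utils import ConstantSchedule, LinearSchedule

# Before (deprecated): learning_rate=get_linear_fn(3e-4, 3e-5, end_fraction=1.0)
# After: picklable schedule objects, portable across operating systems
model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    learning_rate=LinearSchedule(3e-4, 3e-5, end_fraction=1.0),  # assumed signature
    clip_range=ConstantSchedule(0.2),  # assumed replacement for constant_fn(0.2)
)
model.learn(10_000)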

Documentation:

  • Clarify evaluate_policy documentation
  • Added doc about training exceeding the total_timesteps parameter
  • Updated LunarLander and LunarLanderContinuous environment versions to v3 (@j0m0k0)
  • Added sb3-extra-buffers to the project page (@Trenza1ore)

Full Changelog: v2.6.0...v2.7.0

v2.6.0: New `LogEveryNTimesteps` callback and `has_attr` method, refactored hyperparameter optimization

24 Mar 15:01
ea913a8

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

New Features:

  • Added has_attr method for VecEnv to check if an attribute exists
  • Added LogEveryNTimesteps callback to dump logs every N timesteps (note: you need to pass log_interval=None to avoid any interference); see the example below
  • Added Gymnasium v1.1 support
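
A minimal sketch combining both new features (assumptions: the stable_baselines3.common.callbacks import path and the n_steps keyword of LogEveryNTimesteps, mirroring the existing EveryNTimesteps callback):

from stable_baselines3 import PPO
# Assumed import path, mirroring the other built-in callbacks
from stable_baselines3.common.callbacks import LogEveryNTimesteps

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
# Dump logs every 5000 timesteps; log_interval=None avoids interference
# with the default episode-based logging
model.learn(50_000, callback=LogEveryNTimesteps(n_steps=5_000), log_interval=None)

# has_attr(): check whether the wrapped envs expose a given attribute
print(model.get_env().has_attr("render_mode"))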

Bug fixes:

  • SubprocVecEnv will now exit gracefully (without a big traceback) on KeyboardInterrupt

SB3-Contrib

  • Renamed _dump_logs() to dump_logs()
  • Fixed issues with SubprocVecEnv and MaskablePPO by using vec_env.has_attr() (pickling issues, mask function not present)

RL Zoo

  • Refactored hyperparameter optimization. The Optuna Journal storage backend is now supported (recommended default) and you can easily load tuned hyperparameters via the new --trial-id argument of train.py.
  • Save the exact command line used to launch a training
  • Added support for special vectorized envs (e.g., Brax, IsaacSim) by allowing the VecEnv class used to instantiate the env to be overridden in the ExperimentManager
  • Allow disabling auto-logging by passing --log-interval -2 (useful when logging things manually)
  • Added Gymnasium v1.1 support
  • Fixed use of old HF api in get_hf_trained_models()

SBX (SB3 + Jax)

  • Updated PPO to support net_arch, and additional fixes
  • Fixed the entropy coefficient being wrongly logged for SAC and derivatives
  • Fixed PPO predict() for envs that were not normalized (action spaces with limits != [-1, 1])
  • PPO now logs the standard deviation

Deprecations:

  • algo._dump_logs() is deprecated in favor of algo.dump_logs() and will be removed in SB3 v2.7.0

Others:

  • Updated black from v24 to v25
  • Improved error messages when checking Box space equality (loading VecNormalize)
  • Updated test to reflect how set_wrapper_attr should be used now

Documentation:

  • Clarify the use of Gym wrappers with make_vec_env in the section on Vectorized Environments (@pstahlhofen)
  • Updated callback doc for EveryNTimesteps
  • Added doc on how to set env attributes via VecEnv calls
  • Added ONNX export example for MultiInputPolicy (@darkopetrovic)

Full Changelog: v2.5.0...v2.6.0

v2.5.0: New algorithm (SimBa in SBX) and NumPy 2.0 support

27 Jan 12:30
ee8a77d

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

Breaking Changes:

  • Increased minimum required version of PyTorch to 2.3.0
  • Removed support for Python 3.8

New Features:

  • Added support for NumPy v2.0: VecNormalize now casts normalized rewards to float32; the bit flipping env was also updated to avoid overflow issues
  • Added official support for Python 3.12

SBX (SB3 + Jax)

  • Added SimBa Policy: Simplicity Bias for Scaling Up Parameters in DRL
  • Added support for parameter resets

Others:

  • Updated Dockerfile

Documentation:

  • Added Decisions and Dragons to resources. (@jmacglashan)
  • Updated PyBullet example, now compatible with Gymnasium
  • Added link to policies for policy_kwargs parameter (@kplers)
  • Add FootstepNet Envs to the project page (@cgaspard3333)
  • Added FRASA to the project page (@MarcDcls)
  • Fixed atari example (@chrisgao99)
  • Add a note about Discrete action spaces with start!=0
  • Update doc for massively parallel simulators (Isaac Lab, Brax, ...)
  • Add dm_control example

Full Changelog: v2.4.0...v2.5.0

Stable-Baselines3 v2.4.1: Fix for `VecVideoRecorder`

07 Jan 13:25

Bug Fixes

  • Fixed a bug introduced in v2.4.0 where the VecVideoRecorder would override videos

Full Changelog: v2.4.0...v2.4.1

Stable-Baselines3 v2.4.0: New algorithm (CrossQ in SB3-Contrib) and Gymnasium v1.0 support

18 Nov 10:33
020ee42

Warning

Stable-Baselines3 (SB3) v2.4.0 will be the last one supporting Python 3.8 (end of life in October 2024)
and PyTorch < 2.3.
We highly recommend upgrading to Python >= 3.9 and PyTorch >= 2.3 (compatible with NumPy v2).

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

Note

DQN (and QR-DQN) models saved with SB3 < 2.4.0 will show a warning about truncation of optimizer state when loaded with SB3 >= 2.4.0.
To suppress the warning, simply save the model again.
You can find more info in PR #1963

Breaking Changes:

  • Increased minimum required version of Gymnasium to 0.29.1

New Features:

  • Added support for pre_linear_modules and post_linear_modules in create_mlp (useful for adding normalization layers, like in DroQ or CrossQ); see the example below
  • Enabled np.ndarray logging for TensorBoardOutputFormat as histogram (see GH#1634) (@iwishwasaneagle)
  • Updated env checker to warn users when using multi-dim array to define MultiDiscrete spaces
  • Added support for Gymnasium v1.0
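
A minimal sketch of the new create_mlp hooks (assumption: pre_linear_modules/post_linear_modules take lists of module classes that are instantiated for each linear layer):

import torch.nn as nn
from stable_baselines3.common.torch_layers import create_mlp

# Insert a LayerNorm after every linear layer of a 2x256 MLP
layers = create_mlp(
    input_dim=8,
    output_dim=2,
    net_arch=[256, 256],
    post_linear_modules=[nn.LayerNorm],  # assumed: module classes, one instance per layer
)
mlp = nn.Sequential(*layers)  # create_mlp returns a list of layers
print(mlp)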

Bug Fixes:

  • Fixed memory leak when loading learner from storage, set_parameters() does not try to load the object data anymore
    and only loads the PyTorch parameters (@peteole)
  • Cast type in compute gae method to avoid error when using torch compile (@amjames)
  • CallbackList now sets the .parent attribute of child callbacks to its own .parent (@will-maclean)
  • Fixed error when loading a model that has net_arch manually set to None (@jak3122)
  • Set requirement numpy<2.0 until PyTorch is compatible (pytorch/pytorch#107302)
  • Updated DQN optimizer input to only include q_network parameters, removing the target_q_network ones (@corentinlger)
  • Fixed test_buffers.py::test_device which was not actually checking the device of tensors (@rhaps0dy)

SB3-Contrib

  • Added CrossQ algorithm, from "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
  • Added BatchRenorm PyTorch layer used in CrossQ (@danielpalen)
  • Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
  • Fixed bug where loading QRDQN changed target_update_interval (@jak3122)

RL Zoo

  • Updated default hyperparameters for TQC/SAC for Swimmer-v4 (decrease gamma for more consistent results)

SBX (SB3 + Jax)

  • Added CNN support for DQN
  • Bug fix for SAC and related algorithms, optimize log of ent coeff to be consistent with SB3

Others:

  • Fixed various typos (@cschindlbeck)
  • Remove unnecessary SDE noise resampling in PPO update (@brn-dev)
  • Updated PyTorch version on CI to 2.3.1
  • Added a warning to recommend using CPU with on policy algorithms (A2C/PPO) and MlpPolicy
  • Switched to uv to download packages faster on GitHub CI
  • Updated dependencies for read the doc
  • Removed unnecessary copy_obs_dict method for SubprocVecEnv, remove the use of ordered dict and rename flatten_obs to stack_obs

Documentation:

  • Updated PPO doc to recommend using CPU with MlpPolicy
  • Clarified documentation about planned features and citing software
  • Added a note about the fact we are optimizing log of ent coeff for SAC

Full Changelog: v2.3.2...v2.4.0

Stable-Baselines3 v2.3.2: Hotfix for PyTorch 1.13

27 Apr 13:11
285e01f

Bug fixes

  • Reverted torch.load() to be called with weights_only=False, as it caused loading issues with older versions of PyTorch. #1913
  • Cast learning_rate to float lambda for pickle safety when doing model.load by @markscsmith in #1901

Full Changelog: v2.3.0...v2.3.2

Stable-Baselines3 v2.3.0: New default hyperparameters for DDPG, TD3 and DQN

31 Mar 18:33
429be93

Warning

Because of weights_only=True, this release breaks loading of policies when using PyTorch 1.13.
Please upgrade to PyTorch >= 2.0 or upgrade your SB3 version (we reverted the change in SB3 2.3.2)

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes:

  • The default hyperparameters of TD3 and DDPG have been changed to be more consistent with SAC
  # SB3 < 2.3.0 default hyperparameters
  # model = TD3("MlpPolicy", env, train_freq=(1, "episode"), gradient_steps=-1, batch_size=100)
  # SB3 >= 2.3.0:
  model = TD3("MlpPolicy", env, train_freq=1, gradient_steps=1, batch_size=256)

Note

Two inconsistencies remain: the default network architecture for TD3/DDPG is [400, 300] instead of [256, 256] for SAC (for backward compatibility reasons; see the report on the influence of the network size) and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons; see the W&B report on the influence of the learning rate)

  • The default learning_starts parameter of DQN has been changed to be consistent with the other off-policy algorithms
  # SB3 < 2.3.0 default hyperparameters, 50_000 corresponded to Atari defaults hyperparameters
  # model = DQN("MlpPolicy", env, learning_starts=50_000)
  # SB3 >= 2.3.0:
  model = DQN("MlpPolicy", env, learning_starts=100)
  • For safety, torch.load() is now called with weights_only=True when loading torch tensors,
    policy load() still uses weights_only=False as gymnasium imports are required for it to work
  • When using huggingface_sb3, you will now need to set TRUST_REMOTE_CODE=True when downloading models from the hub, as pickle.load is not safe (see the example below).
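
A hedged sketch of downloading a model from the hub under the new requirement (assumptions: TRUST_REMOTE_CODE is read from an environment variable, and the repo id/filename below are only illustrative):

import os

# Opt in explicitly, since unpickling remote files is not safe
os.environ["TRUST_REMOTE_CODE"] = "True"  # assumed: flag read from the environment

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Illustrative repo id and filename
checkpoint = load_from_hub(repo_id="sb3/ppo-CartPole-v1", filename="ppo-CartPole-v1.zip")
model = PPO.load(checkpoint)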

New Features:

  • Log the success rate rollout/success_rate when available for on-policy algorithms (@corentinlger)

Bug Fixes:

  • Fixed the monitor_wrapper argument that was not passed to the parent class, and the dones argument that wasn't passed to _update_info_buffer (@corentinlger)

SB3-Contrib

  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
  • Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
  • Fixed sb3_contrib/common/maskable/*.py type annotations
  • Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
  • Fixed sb3_contrib/common/vec_env/async_eval.py type annotations
  • Add some additional notes about MaskablePPO (evaluation and multi-process) (@icheered)

RL Zoo

  • Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
  • Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
  • Added test dependencies to setup.py (@power-edge)
  • Simplify dependencies of requirements.txt (remove duplicates from setup.py)

SBX (SB3 + Jax)

  • Added support for MultiDiscrete and MultiBinary action spaces to PPO
  • Added support for large values for gradient_steps to SAC, TD3, and TQC
  • Fix train() signature and update type hints
  • Fix replay buffer device at load time
  • Added flatten layer
  • Added CrossQ

Others:

  • Updated black from v23 to v24
  • Updated ruff to >= v0.3.1
  • Updated env checker for (multi)discrete spaces with non-zero start.

Documentation:

  • Added a paragraph on modifying vectorized environment parameters via setters (@fracapuano)
  • Updated callback code example
  • Updated export to ONNX documentation, it is now much simpler to export SB3 models with newer ONNX Opset!
  • Added video link to "Practical Tips for Reliable Reinforcement Learning" video
  • Added render_mode="human" in the README example (@marekm4)
  • Fixed docstring signature for sum_independent_dims (@StagOverflow)
  • Updated docstring description for log_interval in the base class (@rushitnshah).

Full Changelog: v2.2.1...v2.3.0

Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages

17 Nov 23:35
e3dea4b

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Note

Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751.
Please use SB3 v2.2.1 and not v2.2.0.

Breaking Changes:

  • Switched to ruff for sorting imports (isort is no longer needed); black and ruff dependencies now have a minimum required version
  • Dropped x is False in favor of not x, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)

New Features:

  • Improved error message of the env_checker for envs wrongly detected as GoalEnv (compute_reward() is defined)
  • Improved error message when mixing Gym API with VecEnv API (see GH#1694)
  • Add support for setting options at reset with VecEnv via the set_options() method. As with seeds, options are reset at the end of an episode (@ReHoss); see the example below
  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to on-policy algorithms (A2C and PPO)
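
A minimal sketch of the new set_options() API (assumptions: a single dict is broadcast to all sub-envs and forwarded to env.reset(options=...); the option keys below are hypothetical and env-specific):

from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=2)
# Hypothetical, env-specific option keys
vec_env.set_options({"difficulty": 1})
obs = vec_env.reset()  # like seeds, options only take effect at the next reset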

Bug Fixes:

  • Prevents using squash_output and not use_sde in ActorCriticPolicy (@PatrickHelm)
  • Performs unscaling of actions in collect_rollout in OnPolicyAlgorithm (@PatrickHelm)
  • Moves VectorizedActionNoise into _setup_learn() in OffPolicyAlgorithm (@PatrickHelm)
  • Prevents out of bound error on Windows if no seed is passed (@PatrickHelm)
  • Calls callback.update_locals() before callback.on_rollout_end() in OnPolicyAlgorithm (@PatrickHelm)
  • Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
  • Fixed render_mode which was not properly loaded when using VecNormalize.load()
  • Fixed success reward dtype in SimpleMultiObsEnv (@NixGD)
  • Fixed check_env for Sequence observation space (@corentinlger)
  • Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
  • Fixed ResourceWarning when loading and saving models (files were not closed); please note that only paths are closed automatically,
    the behavior stays the same for tempfiles (they need to be closed manually),
    and the behavior is now consistent when loading/saving replay buffers

SB3-Contrib

  • Added set_options for AsyncEval
  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to TRPO

RL Zoo

  • Removed gym dependency, the package is still required for some pretrained agents.
  • Added --eval-env-kwargs to train.py (@Quentin18)
  • Added ppo_lstm to hyperparams_opt.py (@technocrat13)
  • Upgraded to pybullet_envs_gymnasium>=0.4.0
  • Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
  • Updated docker image, removed support for X server
  • Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)

SBX (SB3 + Jax)

  • Added DDPG and TD3 algorithms

Others:

  • Fixed stable_baselines3/common/callbacks.py type hints
  • Fixed stable_baselines3/common/utils.py type hints
  • Fixed stable_baselines3/common/vec_envs/vec_transpose.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_video_recorder.py type hints
  • Fixed stable_baselines3/common/save_util.py type hints
  • Updated docker images to Ubuntu Jammy using micromamba 1.5
  • Fixed stable_baselines3/common/buffers.py type hints
  • Fixed stable_baselines3/her/her_replay_buffer.py type hints
  • Buffers do not call an additional .copy() when storing new transitions
  • Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
  • Update dependencies (accept newer Shimmy/Sphinx version and remove sphinx_autodoc_typehints)
  • Fixed stable_baselines3/common/off_policy_algorithm.py type hints
  • Fixed stable_baselines3/common/distributions.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_normalize.py type hints
  • Fixed stable_baselines3/common/vec_env/__init__.py type hints
  • Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
  • Fixed stable_baselines3/common/policies.py type hints
  • Switched to mypy only for checking types
  • Added tests to check consistency when saving/loading files

Documentation:

  • Updated RL Tips and Tricks (include recommendation for evaluation, added links to DroQ, ARS and SBX).
  • Fixed various typos and grammar mistakes

Full changelog: v2.1.0...v2.2.1

Stable-Baselines3 v2.1.0: Float64 actions, Gymnasium 0.29 support and bug fixes

20 Aug 12:13
f4ec0f6

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes:

  • Removed Python 3.7 support
  • SB3 now requires PyTorch >= 1.13

SB3-Contrib

  • Fixed MaskablePPO ignoring stats_window_size argument
  • Added Python 3.11 support

RL Zoo

  • Upgraded to Huggingface-SB3 >= 2.3
  • Added Python 3.11 support

Bug Fixes:

  • Relaxed a check in the logger that was causing issues on Windows with colorama
  • Fixed off-policy algorithms with continuous float64 actions (see #1145) (@tobirohrer)
  • Fixed env_checker.py warning messages for out of bounds in complex observation spaces (@Gabo-Tor)

Others:

  • Updated GitHub issue templates
  • Fix typo in gym patch error message (@lukashass)
  • Refactor test_spaces.py tests

Full Changelog: v2.0.0...v2.1.0

Stable-Baselines3 v2.0.0: Gymnasium Support

23 Jun 13:00
472ff8e

Warning

Stable-Baselines3 (SB3) v2.0 will be the last one supporting Python 3.7 (end of life in June 2023).
We highly recommend upgrading to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes:

  • Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the shimmy package (@carlosluis, @arjun-kg, @tlpss)
  • The deprecated online_sampling argument of HerReplayBuffer was removed
  • Removed deprecated stack_observation_space method of StackedObservations
  • Renamed environment output observations in evaluate_policy to prevent shadowing the input observations during callbacks (@npit)
  • Upgraded wrappers and custom environment to Gymnasium
  • Refined the HumanOutputFormat file check: now it verifies if the object is an instance of io.TextIOBase instead of only checking for the presence of a write method.
  • Because of the new Gym API (0.26+), the random seed passed to vec_env.seed(seed=seed) will only be effective after the env.reset() call.

New Features:

  • Added Gymnasium support (Gym 0.21 and 0.26 are supported via the shimmy package)
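
For reference, a minimal end-to-end example with a Gymnasium environment:

import gymnasium as gym
from stable_baselines3 import PPO

# Gymnasium envs work directly with SB3 v2.0 (Gym 0.21/0.26 go through shimmy)
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(10_000)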

SB3-Contrib

  • Fixed QRDQN update interval for multi envs

RL Zoo

  • Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
  • Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
  • Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
  • Fixed record_video steps (before it was stepping in a closed env)
  • Dropped Gym 0.21 support

Bug Fixes:

  • Fixed VecExtractDictObs does not handle terminal observation (@WeberSamuel)
  • Set NumPy version to >=1.20 due to use of numpy.typing (@troiganto)
  • Fixed bug where loading DQN changed target_update_interval (@tobirohrer)
  • Fixed env checker to properly reset the env before calling step() when checking
    for Inf and NaN (@lutogniew)
  • Fixed HER truncate_last_trajectory() (@lbergmann1)
  • Fixed HER desired and achieved goal order in reward computation (@JonathanKuelz)

Others:

  • Fixed stable_baselines3/a2c/*.py type hints
  • Fixed stable_baselines3/ppo/*.py type hints
  • Fixed stable_baselines3/sac/*.py type hints
  • Fixed stable_baselines3/td3/*.py type hints
  • Fixed stable_baselines3/common/base_class.py type hints
  • Fixed stable_baselines3/common/logger.py type hints
  • Fixed stable_baselines3/common/envs/*.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py type hints
  • Fixed stable_baselines3/common/vec_env/base_vec_env.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_frame_stack.py type hints
  • Fixed stable_baselines3/common/vec_env/dummy_vec_env.py type hints
  • Fixed stable_baselines3/common/vec_env/subproc_vec_env.py type hints
  • Upgraded docker images to use mamba/micromamba and CUDA 11.7
  • Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
  • Improve type annotation of wrappers
  • Tests envs are now checked too
  • Added render test for VecEnv and VecEnvWrapper
  • Update issue templates and env info saved with the model
  • Changed seed() method return type from List to Sequence
  • Updated env checker doc and requirements for tuple spaces/goal envs

Documentation:

  • Added Deep RL Course link to the Deep RL Resources page
  • Added documentation about VecEnv API vs Gym API
  • Upgraded tutorials to Gymnasium API
  • Make it more explicit when using VecEnv vs Gym env
  • Added UAV_Navigation_DRL_AirSim to the project page (@heleidsn)
  • Added EvalCallback example (@sidney-tio)
  • Update custom env documentation
  • Added pink-noise-rl to projects page
  • Fix custom policy example, ortho_init was ignored
  • Added SBX page

Full Changelog: v1.8.0...v2.0.0