Merge branch 'master' into copilot/fix-2779f837-79a9-4b3c-b3a3-48a6ab7d437f

araffin · web-flow · commit 9c314ef39360 · 2025-11-14T18:04:15.000+01:00
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -33,7 +33,7 @@ cd stable-baselines3/
 2. Install Stable-Baselines3 in develop mode, with support for building the docs and running tests:
 
 ```bash
-pip install -e .[docs,tests,extra]
+pip install -e '.[docs,tests,extra]'
 ```
 
 ## Codestyle
diff --git a/README.md b/README.md
@@ -210,7 +210,7 @@ Actions `gymnasium.spaces`:
 ## Testing the installation
 ### Install dependencies
 ```sh
-pip install -e .[docs,tests,extra]
+pip install -e '.[docs,tests,extra]'
 ```
 ### Run tests
 All unit tests in stable baselines3 can be run using `pytest` runner:
diff --git a/docs/guide/export.rst b/docs/guide/export.rst
@@ -29,16 +29,15 @@ to do inference in another framework.
 
 
 Export to ONNX
------------------
+--------------
 
 
 If you are using PyTorch 2.0+ and ONNX Opset 14+, you can easily export SB3 policies using the following code:
 
 
 .. warning::
 
-  The following returns normalized actions and doesn't include the `post-processing <https://github.com/DLR-RM/stable-baselines3/blob/a9273f968eaf8c6e04302a07d803eebfca6e7e86/stable_baselines3/common/policies.py#L370-L377>`_ step that is done with continuous actions
-  (clip or unscale the action to the correct space).
+  The following returns normalized actions and doesn't include the `post-processing <https://github.com/DLR-RM/stable-baselines3/blob/a9273f968eaf8c6e04302a07d803eebfca6e7e86/stable_baselines3/common/policies.py#L370-L377>`_ step that is done with continuous actions (clip or unscale the action to the correct space).
 
 
 .. code-block:: python
@@ -192,11 +191,159 @@ There is a draft PR in the RL Zoo about C++ export: https://github.com/DLR-RM/rl
   action_jit = loaded_module(dummy_input)
 
 
-Export to tensorflowjs / ONNX-JS
---------------------------------
+Export to ONNX-JS / ONNX Runtime Web
+------------------------------------
 
-TODO: contributors help is welcomed!
-Probably a good starting point: https://github.com/elliotwaite/pytorch-to-javascript-with-onnx-js
+Official documentation: https://onnxruntime.ai/docs/tutorials/web/build-web-app.html
+
+Full example code: https://github.com/JonathanColetti/CarDodgingGym
+
+Demo: https://jonathancoletti.github.io/CarDodgingGym
+
+The code linked above is a complete example (using a car-dodging environment) that:
+
+1. Creates/Trains a PPO model
+2. Exports the model to ONNX along with normalization stats in JSON
+3. Runs in the browser with normalization using onnxruntime-web to achieve similar results
+
+Below is a simple example that converts a model to ONNX and then runs inference with onnxruntime-web, without post-processing:
+
+.. code-block:: python
+
+  import torch as th
+
+  from stable_baselines3 import SAC
+
+
+  class OnnxablePolicy(th.nn.Module):
+      def __init__(self, actor: th.nn.Module):
+          super().__init__()
+          self.actor = actor
+
+      def forward(self, observation: th.Tensor) -> th.Tensor:
+          # NOTE: You may have to postprocess (unnormalize or renormalize)
+          return self.actor(observation, deterministic=True)
+
+
+  # Example: model = SAC("MlpPolicy", "Pendulum-v1")
+  SAC("MlpPolicy", "Pendulum-v1").save("PathToTrainedModel.zip")
+  model = SAC.load("PathToTrainedModel.zip", device="cpu")
+  onnxable_model = OnnxablePolicy(model.policy.actor)
+
+  observation_size = model.observation_space.shape
+  dummy_input = th.randn(1, *observation_size)
+  th.onnx.export(
+      onnxable_model,
+      dummy_input,
+      "my_sac_actor.onnx",
+      opset_version=17,
+      input_names=["input"],
+  )
+
+.. code-block:: javascript
+
+  // Install using `npm install onnxruntime-web` (tested with version 1.19) or load it from a CDN
+  import * as ort from 'onnxruntime-web';
+
+  async function runInference() {
+    const session = await ort.InferenceSession.create('my_sac_actor.onnx');
+
+    // The observation_size = 3 (for Pendulum-v1)
+    const inputData = Float32Array.from([0.1, -0.2, 0.3]);
+
+    const inputTensor = new ort.Tensor('float32', inputData, [1, 3]);
+
+    const results = await session.run({ input: inputTensor });
+
+    const outputName = session.outputNames[0];
+    const action = results[outputName].data;
+
+    console.log('Predicted action=', action);
+  }
+
+  runInference();
+
+
+Export to TensorFlow.js
+-----------------------
+
+.. warning::
+
+  As of November 2025, `onnx2tf <https://github.com/PINTO0309/onnx2tf>`_ does not support TensorFlow.js. Therefore, `tfjs-converter <https://github.com/tensorflow/tfjs-converter>`_ is used instead. However, tfjs-converter is not currently maintained and requires older opsets and TensorFlow versions.
+
+
+In order for this to work, you have to do multiple conversions: SB3 => ONNX => TensorFlow => TensorFlow.js.
+
+The opset version needs to be changed for the conversion (``opset_version=14`` is currently required). Please refer to the code above for more stable usage with a higher opset.
+
+The following is a simple example that showcases the full conversion + inference.
+
+Please refer to the previous sections for the first step (SB3 => ONNX).
+The main difference is that you need to specify ``opset_version=14``.
+
+.. code-block:: python
+
+  # Tested with Python 3.10
+  # First, install these dependencies in a fresh env
+  """
+  pip install --use-deprecated=legacy-resolver tensorflow==2.13.0 keras==2.13.1 onnx==1.16.0 onnx-tf==1.9.0 tensorflow-probability==0.21.0 tensorflowjs==4.15.0 jax==0.4.26 jaxlib==0.4.26
+  """
+  # Then run this codeblock
+  # If there are no errors (the folder is structured correctly), then run
+  """
+  tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model tf_model tfjs_model
+  """
+
+  # If you get an error exporting using `tensorflowjs_converter`, then upgrade tensorflow
+  """
+  pip install --upgrade tensorflow tensorflow-decision-forests tensorflowjs
+  """
+  # And retry the conversion; it should then work (do not rerun this codeblock)
+  """
+  tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model tf_model tfjs_model
+  """
+
+  import onnx
+  import onnx_tf.backend
+  import tensorflow as tf
+
+  ONNX_FILE_PATH = "my_sac_actor.onnx"
+  MODEL_PATH = "tf_model"
+
+  onnx_model = onnx.load(ONNX_FILE_PATH)
+  onnx.checker.check_model(onnx_model)
+  print(onnx.helper.printable_graph(onnx_model.graph))
+
+  print('Converting ONNX to TF...')
+  tf_rep = onnx_tf.backend.prepare(onnx_model)
+  tf_rep.export_graph(MODEL_PATH)
+  # After this do not forget to use `tensorflowjs_converter`
+
+
+.. code-block:: javascript
+
+  import * as tf from 'https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@4.15.0/+esm';
+  // Post processing not included
+  async function runInference() {
+      const MODEL_URL = './tfjs_model/model.json';
+
+      const model = await tf.loadGraphModel(MODEL_URL);
+
+      // Observation_size is 3 for Pendulum-v1
+      const inputData = [1.0, 0.0, 0.0];
+      const inputTensor = tf.tensor2d([inputData], [1, 3]);
+
+      const resultTensor = model.execute(inputTensor);
+
+      const action = await resultTensor.data();
+
+      console.log('Predicted action=', action);
+
+      inputTensor.dispose();
+      resultTensor.dispose();
+  }
+
+  runInference();
 
 
 Export to TFLite / Coral (Edge TPU)
diff --git a/docs/guide/integrations.rst b/docs/guide/integrations.rst
@@ -9,7 +9,7 @@ Weights & Biases
 
 Weights & Biases provides a callback for experiment tracking that allows to visualize and share results.
 
-The full documentation is available here: https://docs.wandb.ai/guides/integrations/other/stable-baselines-3
+The full documentation is available here: https://docs.wandb.ai/models/integrations/stable-baselines-3
 
 .. code-block:: python
 
diff --git a/docs/misc/changelog.rst b/docs/misc/changelog.rst
@@ -19,6 +19,7 @@ Bug Fixes:
 - Fixed env checker to properly handle ``Sequence`` observation spaces when nested inside composite spaces (``Dict``, ``Tuple``, ``OneOf``) (@copilot)
 - Update env checker to warn users when using Graph space (@dhruvmalik007).
 - Fixed memory leak in ``VecVideoRecorder`` where ``recorded_frames`` stayed in memory due to reference in the moviepy clip (@copilot)
+- Removed double space in ``StopTrainingOnRewardThreshold`` callback message (@sea-bass)
 
 `SB3-Contrib`_
 ^^^^^^^^^^^^^^
@@ -42,6 +43,11 @@ Documentation:
 - Documented Atari wrapper reset behavior where ``env.reset()`` may perform a no-op step instead of truly resetting when ``terminal_on_life_loss=True`` (default), and how to avoid this behavior by setting ``terminal_on_life_loss=False``
 - Clarified comment in ``_sample_action()`` method to better explain action scaling behavior for off-policy algorithms (@copilot)
 - Added sb3-plus to projects page
+- Added example usage of ONNX JS
+- Updated link to paper of community project DeepNetSlice (@AlexPasqua)
+- Added example usage of TensorFlow.js
+- Included exact versions in ONNX JS and example project
+- Made step 2 (``pip install``) of ``CONTRIBUTING.md`` more robust
 
 
 Release 2.7.0 (2025-07-25)
@@ -1898,4 +1904,4 @@ And all the contributors:
 @DavyMorgan @luizapozzobon @Bonifatius94 @theSquaredError @harveybellini @DavyMorgan @FieteO @jonasreiher @npit @WeberSamuel @troiganto
 @lutogniew @lbergmann1 @lukashass @BertrandDecoster @pseudo-rnd-thoughts @stefanbschneider @kyle-he @PatrickHelm @corentinlger
 @marekm4 @stagoverflow @rushitnshah @markscsmith @NickLucche @cschindlbeck @peteole @jak3122 @will-maclean
-@brn-dev @jmacglashan @kplers @MarcDcls @chrisgao99 @pstahlhofen @akanto @Trenza1ore
+@brn-dev @jmacglashan @kplers @MarcDcls @chrisgao99 @pstahlhofen @akanto @Trenza1ore @JonathanColetti
diff --git a/docs/misc/projects.rst b/docs/misc/projects.rst
@@ -228,7 +228,8 @@ intelligent agents to perform network slice placement.
 
 | Author: Alex Pasquali
 | Github: https://github.com/AlexPasqua/DeepNetSlice
-| Paper: **under review** (citation instructions on the project's README.md) -> see this Master's Thesis for the moment: https://etd.adm.unipi.it/theses/available/etd-01182023-110038/unrestricted/Tesi_magistrale_Pasquali_Alex.pdf
+| Paper: https://ieeexplore.ieee.org/document/10625023
+| Associated Master's Thesis: https://etd.adm.unipi.it/theses/available/etd-01182023-110038/unrestricted/Tesi_magistrale_Pasquali_Alex.pdf
 
 
 PokemonRedExperiments
diff --git a/stable_baselines3/common/callbacks.py b/stable_baselines3/common/callbacks.py
@@ -565,7 +565,7 @@ def _on_step(self) -> bool:
         if self.verbose >= 1 and not continue_training:
             print(
                 f"Stopping training because the mean reward {self.parent.best_mean_reward:.2f} "
-                f" is above the threshold {self.reward_threshold}"
+                f"is above the threshold {self.reward_threshold}"
             )
         return continue_training
 
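Note on the `callbacks.py` hunk above: the message is built from two adjacent f-string literals, which Python joins with no separator. The first literal already ends with a space, so the leading space in the second one produced a double space. A minimal sketch of the before/after behavior, using made-up values in place of `self.parent.best_mean_reward` and `self.reward_threshold`:

```python
# Hypothetical values standing in for self.parent.best_mean_reward
# and self.reward_threshold inside the callback.
best_mean_reward = 201.5
reward_threshold = 200

# Before the fix: the second literal starts with a space,
# and the first one already ends with a space.
old_message = (
    f"Stopping training because the mean reward {best_mean_reward:.2f} "
    f" is above the threshold {reward_threshold}"
)

# After the fix: only the trailing space of the first literal remains.
new_message = (
    f"Stopping training because the mean reward {best_mean_reward:.2f} "
    f"is above the threshold {reward_threshold}"
)

print(repr(old_message))
print(repr(new_message))
```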

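Note on the warning in `docs/guide/export.rst`: the exported policy returns normalized actions and skips the clip/unscale step. The sketch below shows what that step could look like in NumPy. It rests on assumptions not stated in the diff: a squashed policy whose outputs lie in [-1, 1], a `Box` action space, and the Pendulum-v1 bounds of -2 and 2 hard-coded for illustration. The helper names are illustrative and nothing is imported from SB3.

```python
import numpy as np


def unscale_action(scaled_action, low, high):
    """Linearly map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


def clip_action(action, low, high):
    """Clip an unsquashed action to the bounds of the action space."""
    return np.clip(action, low, high)


# Assumed bounds (Pendulum-v1 torque limits), hard-coded for illustration.
low = np.array([-2.0], dtype=np.float32)
high = np.array([2.0], dtype=np.float32)

# Stand-in for the output of the exported actor (batch of one action).
raw_action = np.array([[0.5]], dtype=np.float32)

env_action = unscale_action(raw_action, low, high)
print(env_action)
```

Which of the two helpers applies depends on whether the policy squashes its output: unscale for squashed outputs, clip otherwise. The same arithmetic carries over directly to the JavaScript inference code.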