.. _plotting:

========
Plotting
========


Stable Baselines3 provides utilities for plotting training results to monitor and visualize your agent's learning progress.
The plotting functionality is provided by the ``results_plotter`` module, which can load monitor files created during training and generate various plots.

.. note::

   For plotting, we recommend using the
   `RL Baselines3 Zoo plotting scripts <https://rl-baselines3-zoo.readthedocs.io/en/master/guide/plot.html>`_,
   which provide confidence intervals and publication-ready visualizations.


Recommended Approach: RL Baselines3 Zoo Plotting
================================================

For good plotting capabilities, including:

- Comparing results across different environments
- Publication-ready plots with confidence intervals
- Evaluation plots with error bars

we recommend using the plotting scripts from `RL Baselines3 Zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_:

- `plot_train.py <https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/plots/plot_train.py>`_: For training plots
- `all_plots.py <https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/plots/all_plots.py>`_: For evaluation plots, to post-process the results
- `plot_from_file.py <https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/rl_zoo3/plots/plot_from_file.py>`_: For more advanced plotting from post-processed results

These scripts provide additional features not available in the basic SB3 plotting utilities.

Installation
------------

First, install RL Baselines3 Zoo:

.. code-block:: bash

    pip install rl_zoo3[plots]

Basic Training Plot Examples
----------------------------

.. code-block:: bash

    # Train an agent
    python -m rl_zoo3.train --algo ppo --env CartPole-v1 -f logs/

    # Plot training results for a single algorithm
    python -m rl_zoo3.plots.plot_train --algo ppo --env CartPole-v1 --exp-folder logs/

Evaluation and Comparison Plots
-------------------------------

.. code-block:: bash

    # Generate evaluation plots and save post-processed results
    # in `logs/demo_plots.pkl` in order to use `plot_from_file`
    python -m rl_zoo3.plots.all_plots --algo ppo sac -e Pendulum-v1 -f logs/ -o logs/demo_plots

    # More advanced plotting from post-processed results (with confidence intervals)
    python -m rl_zoo3.plots.plot_from_file -i logs/demo_plots.pkl --rliable --ci-size 0.95


For more examples, please read the
`RL Baselines3 Zoo plotting guide <https://rl-baselines3-zoo.readthedocs.io/en/master/guide/plot.html>`_.


Real-Time Monitoring
====================

For real-time monitoring during training, consider using the plotting functions within callbacks
(see the `Callbacks guide <callbacks.html>`_) or integrating with tools like `Tensorboard <tensorboard.html>`_ or Weights & Biases
(see the `Integrations guide <integrations.html>`_).

Monitor File Format
===================

The ``Monitor`` wrapper saves training data in CSV format with the following columns:

- ``r``: Episode reward (sum of rewards over the episode)
- ``l``: Episode length (number of steps)
- ``t``: Elapsed wall-clock time since the start of training (in seconds)

Additional columns may be present if you log custom metrics from the environment's info dict.

.. note::

   The plotting functions automatically handle multiple monitor files from the same directory,
   which occurs when using vectorized environments. The episodes are loaded and sorted by timestamp
   to maintain proper chronological order.

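For reference, each monitor file begins with a JSON metadata line prefixed by ``#``, followed by a regular CSV table. The standalone sketch below (with made-up numbers) parses such a file using only the standard library:

```python
import csv
import io
import json

# Hypothetical contents of a monitor.csv file: a "#"-prefixed JSON
# header line, then a normal CSV table with r, l, t columns.
raw = """#{"t_start": 1700000000.0, "env_id": "CartPole-v1"}
r,l,t
18.0,18,0.51
45.0,45,1.32
200.0,200,4.87
"""

lines = raw.splitlines()
metadata = json.loads(lines[0][1:])  # strip the leading "#"
rows = list(csv.DictReader(io.StringIO("\n".join(lines[1:]))))

rewards = [float(row["r"]) for row in rows]
print(metadata["env_id"])           # CartPole-v1
print(sum(rewards) / len(rewards))  # mean episode reward
```

This is essentially what ``load_results`` does for you (across all monitor files in a directory), returning a pandas DataFrame instead.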
Basic SB3 Plotting (Simple Use Cases)
=====================================

Basic Plotting: Single Training Run
-----------------------------------

The simplest way to plot training results is to use the ``plot_results`` function after training an agent.
This function reads monitor files created by the ``Monitor`` wrapper and plots the episode rewards over time.

.. code-block:: python

    import os

    import gymnasium as gym
    import matplotlib.pyplot as plt

    from stable_baselines3 import PPO
    from stable_baselines3.common import results_plotter
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.results_plotter import plot_results

    # Create log directory
    log_dir = "tmp/"
    os.makedirs(log_dir, exist_ok=True)

    # Create and wrap the environment with Monitor
    env = gym.make("CartPole-v1")
    env = Monitor(env, log_dir)

    # Train the agent
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=20_000)

    # Plot the results
    plot_results([log_dir], 20_000, results_plotter.X_TIMESTEPS, "PPO CartPole")
    plt.show()


Different Plotting Modes
------------------------

The plotting functions support three different x-axis modes:

- ``X_TIMESTEPS``: Plot rewards vs. timesteps (default)
- ``X_EPISODES``: Plot rewards vs. episode number
- ``X_WALLTIME``: Plot rewards vs. wall-clock time in hours

.. code-block:: python

    import matplotlib.pyplot as plt

    from stable_baselines3.common import results_plotter
    from stable_baselines3.common.results_plotter import plot_results

    # log_dir is the Monitor log folder from the previous example

    # Plot by timesteps (shows sample efficiency)
    # plot_results([log_dir], None, results_plotter.X_TIMESTEPS, "Rewards vs Timesteps")

    # Plot by episodes
    plot_results([log_dir], None, results_plotter.X_EPISODES, "Rewards vs Episodes")

    # Plot by wall-clock time
    # plot_results([log_dir], None, results_plotter.X_WALLTIME, "Rewards vs Time")

    plt.tight_layout()
    plt.show()


Advanced Plotting with Manual Data Processing
---------------------------------------------

For more control over the plotting, you can use the underlying functions to process the data manually:

.. code-block:: python

    import matplotlib.pyplot as plt
    import numpy as np

    from stable_baselines3.common.monitor import load_results
    from stable_baselines3.common.results_plotter import ts2xy, window_func

    # Load the results (log_dir is the Monitor log folder from the first example)
    df = load_results(log_dir)

    # Convert the dataframe to arrays (x=timesteps, y=episodic return)
    x, y = ts2xy(df, "timesteps")

    # Plot raw data
    plt.figure(figsize=(10, 6))
    plt.subplot(2, 1, 1)
    plt.scatter(x, y, s=2, alpha=0.6)
    plt.xlabel("Timesteps")
    plt.ylabel("Episode Reward")
    plt.title("Raw Episode Rewards")

    # Plot smoothed data with a custom window
    plt.subplot(2, 1, 2)
    if len(x) >= 50:  # Only smooth if we have enough data
        x_smooth, y_smooth = window_func(x, y, 50, np.mean)
        plt.plot(x_smooth, y_smooth, linewidth=2)
    plt.xlabel("Timesteps")
    plt.ylabel("Average Episode Reward (50-episode window)")
    plt.title("Smoothed Episode Rewards")

    plt.tight_layout()
    plt.show()
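``window_func`` applies a reduction (here ``np.mean``) over a sliding window of the most recent episodes. As a standalone sketch of the same idea, assuming a simple uniform window:

```python
import numpy as np


def rolling_mean(y: np.ndarray, window: int) -> np.ndarray:
    """Mean over a sliding window, keeping only positions with a full window."""
    kernel = np.ones(window) / window
    # "valid" mode drops the first (window - 1) positions
    return np.convolve(y, kernel, mode="valid")


rewards = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(rolling_mean(rewards, 3))  # [2. 3. 4.]
```

Dropping the first incomplete windows is why the smoothed curve above starts later than the raw one.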