.. _crossq:

.. automodule:: sb3_contrib.crossq


CrossQ
======

Implementation of CrossQ proposed in:

`Bhatt A.* & Palenicek D.* et al. Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity. ICLR 2024.`

CrossQ is an algorithm that uses batch normalization to improve the sample efficiency of off-policy deep reinforcement learning algorithms.
It carefully introduces batch normalization layers into the critic network and removes the target networks.
The result is a simpler, more sample-efficient algorithm that does not require high update-to-data ratios.
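
The core idea can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the sb3-contrib code (which is built on PyTorch): a toy critic with one batch-normalized hidden layer evaluates the current pairs ``(s, a)`` and the next pairs ``(s', a')`` in a single forward pass, so both share the same batch statistics, and the TD target bootstraps from that same critic instead of from a target network. The helper names and the weights ``w1``/``w2`` are hypothetical, ``next_actions`` are assumed to be sampled from the current policy, and the SAC entropy term, the twin critics and gradient stopping are omitted.

.. code-block:: python

    import numpy as np


    def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
        """Normalize each feature with the mean/variance of the whole batch (training mode)."""
        return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


    def critic(obs_act: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
        """Hypothetical toy critic: Linear -> BatchNorm -> ReLU -> Linear."""
        hidden = np.maximum(batch_norm(obs_act @ w1), 0.0)
        return (hidden @ w2).squeeze(-1)


    def crossq_td_error(obs, actions, rewards, next_obs, next_actions, dones, w1, w2, gamma=0.99):
        """TD error with a joint forward pass and no target network (simplified sketch)."""
        batch_size = obs.shape[0]
        current = np.concatenate([obs, actions], axis=1)
        following = np.concatenate([next_obs, next_actions], axis=1)
        # Joint forward pass: (s, a) and (s', a') are normalized with one set of batch statistics
        q_joint = critic(np.concatenate([current, following], axis=0), w1, w2)
        q_values, next_q_values = q_joint[:batch_size], q_joint[batch_size:]
        # No target network: bootstrap from the same critic
        # (an autodiff implementation would stop gradients through next_q_values)
        targets = rewards + gamma * (1.0 - dones) * next_q_values
        return q_values - targets


    rng = np.random.default_rng(0)
    obs_dim, act_dim, hidden_dim, batch_size = 3, 2, 8, 4
    w1 = rng.normal(size=(obs_dim + act_dim, hidden_dim))
    w2 = rng.normal(size=(hidden_dim, 1))
    td_errors = crossq_td_error(
        obs=rng.normal(size=(batch_size, obs_dim)),
        actions=rng.normal(size=(batch_size, act_dim)),
        rewards=rng.normal(size=batch_size),
        next_obs=rng.normal(size=(batch_size, obs_dim)),
        next_actions=rng.normal(size=(batch_size, act_dim)),
        dones=np.zeros(batch_size),
        w1=w1,
        w2=w2,
    )
    print(td_errors.shape)  # (4,)

Evaluating both batches together matters because, with separate passes, the critic would normalize ``(s, a)`` and ``(s', a')`` with different statistics, which is a mismatch that the joint pass avoids.
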

.. rubric:: Available Policies

.. autosummary::
    :nosignatures:

    MlpPolicy

.. note::

    Compared to the original implementation, the default network architecture for the Q-value function is ``[1024, 1024]``
    instead of ``[2048, 2048]``, as it offers a good trade-off between training speed and performance.

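
To get back the wider critic of the original paper, the ``policy_kwargs`` could be set as in the minimal sketch below. It assumes that, as for SB3's SAC, ``net_arch`` accepts a dict with separate ``pi`` (actor) and ``qf`` (critic) layer sizes, and that the actor default is ``[256, 256]``; check the ``CrossQPolicy`` parameters documented below before relying on it.

.. code-block:: python

    # Assumption: net_arch takes separate actor ("pi") and critic ("qf") sizes, as in SB3's SAC
    policy_kwargs = dict(
        net_arch=dict(
            pi=[256, 256],  # actor hidden layers (assumed to be the default)
            qf=[2048, 2048],  # critic hidden layers used in the original paper
        )
    )

The dict would then be passed as ``CrossQ("MlpPolicy", "Walker2d-v4", policy_kwargs=policy_kwargs)``; expect each gradient step to be slower than with the ``[1024, 1024]`` default.
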
.. note::

    There is currently no ``CnnPolicy`` for using CrossQ with images. We welcome help from contributors to add this feature.


Notes
-----

- Original paper: https://openreview.net/pdf?id=PczQtTsTIX
- Original implementation: https://github.com/adityab/CrossQ
- SBX (SB3 Jax) implementation: https://github.com/araffin/sbx


Can I use?
----------

- Recurrent policies: ❌
- Multi processing: ✔️
- Gym spaces:


============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ❌      ✔️
Box           ✔️     ✔️
MultiDiscrete ❌      ✔️
MultiBinary   ❌      ✔️
Dict          ❌      ❌
============= ====== ===========

Example
-------

.. code-block:: python

    from sb3_contrib import CrossQ

    # Walker2d-v4 is a MuJoCo environment (e.g. pip install "gymnasium[mujoco]")
    model = CrossQ("MlpPolicy", "Walker2d-v4")
    model.learn(total_timesteps=1_000_000)
    # Writes the trained model to crossq_walker.zip
    model.save("crossq_walker")


Results
-------

CrossQ was evaluated on six MuJoCo environments, see `PR #243 <https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/243>`_.
The results are compared to those reported in the original paper and to the `SBX <https://github.com/araffin/sbx>`_ implementation.

.. image:: ../images/crossQ_performance.png


Open RL benchmark report: https://wandb.ai/openrlbenchmark/sb3-contrib/reports/SB3-Contrib-CrossQ--Vmlldzo4NTE2MTEx


How to replicate the results?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Clone RL-Zoo:

.. code-block:: bash

    git clone https://github.com/DLR-RM/rl-baselines3-zoo
    cd rl-baselines3-zoo/

Run the benchmark (replace ``$ENV_ID`` with each of the benchmarked MuJoCo environment IDs, for example ``Walker2d-v4``):

.. code-block:: bash

    python train.py --algo crossq --env $ENV_ID --n-eval-envs 5 --eval-episodes 20 --eval-freq 25000


Plot the results:

.. code-block:: bash

    python scripts/all_plots.py -a crossq -e HalfCheetah Ant Hopper Walker2D -f logs/ -o logs/crossq_results
    python scripts/plot_from_file.py -i logs/crossq_results.pkl -latex -l CrossQ


Comments
--------

This implementation is based on the SB3 SAC implementation.


Parameters
----------

.. autoclass:: CrossQ
    :members:
    :inherited-members:

.. _crossq_policies:

CrossQ Policies
---------------

.. autoclass:: MlpPolicy
    :members:
    :inherited-members:

.. autoclass:: sb3_contrib.crossq.policies.CrossQPolicy
    :members:
    :noindex: