# Baselines

Environment: [hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0](https://hub.docker.com/layers/hiyouga/verl/ngc-th2.7.1-cu12.6-vllm0.10.0/images/sha256-cfc8c1ce3ea52dee0444f3e58e900d0b1d3b6b315deaf5f58c44b5fbb52fa989)

EasyR1 version: [v0.3.2](https://github.com/hiyouga/EasyR1/tree/v0.3.2)

Welcome to contribute new data points!

## Algorithm Baselines

### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on [Math12k](https://huggingface.co/datasets/hiyouga/math12k)

| Size | Algorithm | Bits | LR   | KL   | Test Accuracy        |
| ---- | --------- | ---- | ---- | ---- | -------------------- |
| 7B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.75 -> 0.77 (+0.02) |

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | Algorithm | Bits | LR   | KL   | Test Accuracy        |
| ---- | --------- | ---- | ---- | ---- | -------------------- |
| 7B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B   | GRPO      | BF16 | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B   | DAPO      | AMP  | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 3B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.24 -> 0.38 (+0.14) |
| 32B  | GRPO      | BF16 | 1e-6 | 1e-2 | 0.50 -> 0.56 (+0.06) |

> [!NOTE]
> The hyper-parameters not listed are all the same as the default values.
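
For reference, the GRPO rows above (LR 1e-6, KL 1e-2) amount to overriding the learning rate and KL coefficient on top of the default config. Below is a minimal, hypothetical launcher sketch: the entry point and the dotted override keys (`worker.actor.optim.lr`, `algorithm.kl_coef`) are assumptions modeled on EasyR1's example scripts, so check them against `examples/config.yaml` in your checkout before use.

```python
import subprocess

# Hypothetical overrides matching the 7B GRPO row (LR 1e-6, KL 1e-2).
# The key names below are assumptions -- verify against examples/config.yaml.
overrides = {
    "config": "examples/config.yaml",
    "worker.actor.optim.lr": "1e-6",  # LR column
    "algorithm.kl_coef": "1e-2",      # KL column
}

# Build and run the training command with dotted-key overrides.
cmd = ["python3", "-m", "verl.trainer.main"]
cmd += [f"{key}={value}" for key, value in overrides.items()]
subprocess.run(cmd, check=True)
```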

## Performance Baselines

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | GPU Type      | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput  | Sec per step | Actor MFU |
| ---- | ------------- | ---- | ---------- | ------- | -------- | --------- | ----------- | ------------ | --------- |
| 3B   | 8 * H100 80GB | AMP  | 1 / 2      | 2       | 120GB    | 54GB      | 1800 (+600) | 120s         | 8.1%      |
| 7B   | 8 * H100 80GB | AMP  | 1 / 2      | 2       | 120GB    | 68GB      | 1600 (+400) | 145s         | 16.0%     |
| 7B   | 8 * H100 80GB | AMP  | 4 / 8      | 2       | 200GB    | 72GB      | 2000 (+600) | 120s         | 23.2%     |
| 7B   | 8 * L20 48GB  | AMP  | 1 / 2      | 2       | 120GB    | 42GB      | 410 (+0)    | 580s         | 26.5%     |
| 7B   | 8 * H100 80GB | BF16 | 1 / 2      | 2       | 120GB    | 58GB      | 1600 (+320) | 145s         | 16.0%     |
| 32B  | 8 * H100 80GB | BF16 | 1 / 2      | 8       | 260GB    | 72GB      | 620 (+260)  | 530s         | 25.8%     |

- Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
- vLLM TP: rollout.tensor_parallel_size
- Peak Mem: Peak CPU memory usage
- Peak VRAM: Peak GPU memory usage
- Throughput: Number of tokens per second per GPU in one training step; the value in parentheses is the improvement over the previous version (see the sketch below)
- Sec per step: Average time per step in seconds
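
The Throughput column follows directly from the step's token count and wall-clock time, and Actor MFU can be sanity-checked with the common 6N FLOPs-per-token rule of thumb. This is a minimal sketch, assuming an H100 dense BF16 peak of 989 TFLOPs and a hypothetical token count chosen to match the 7B AMP (4 / 8) row; the table's Actor MFU is likely measured over the actor update phase only, so this whole-step estimate comes out lower.

```python
def tokens_per_sec_per_gpu(step_tokens: int, num_gpus: int, sec_per_step: float) -> float:
    """Throughput column: tokens processed per second per GPU in one training step."""
    return step_tokens / (num_gpus * sec_per_step)

def rough_mfu(tps_per_gpu: float, num_params: float, peak_flops: float = 989e12) -> float:
    """Rule-of-thumb MFU: forward + backward cost ~6 FLOPs per parameter per token."""
    return 6 * num_params * tps_per_gpu / peak_flops

# Hypothetical numbers matching the 7B AMP (4 / 8) row: 8 GPUs, 120 s per step.
tps = tokens_per_sec_per_gpu(step_tokens=1_920_000, num_gpus=8, sec_per_step=120)
print(f"{tps:.0f} tok/s/GPU")                         # 2000 tok/s/GPU
print(f"whole-step MFU ~ {rough_mfu(tps, 7e9):.1%}")  # ~8.5% by this estimate
```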

> [!NOTE]