# Baselines

Environment: [hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0](https://hub.docker.com/layers/hiyouga/verl/ngc-th2.7.1-cu12.6-vllm0.10.0/images/sha256-cfc8c1ce3ea52dee0444f3e58e900d0b1d3b6b315deaf5f58c44b5fbb52fa989)

EasyR1 version: [v0.3.2](https://github.com/hiyouga/EasyR1/tree/v0.3.2)

Welcome to contribute new data points!

## Algorithm Baselines

### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on [Math12k](https://huggingface.co/datasets/hiyouga/math12k)

| Size | Algorithm | Bits | LR   | KL   | Test Accuracy        |
| ---- | --------- | ---- | ---- | ---- | -------------------- |
| 7B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.75 -> 0.77 (+0.02) |

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | Algorithm | Bits | LR   | KL   | Test Accuracy        |
| ---- | --------- | ---- | ---- | ---- | -------------------- |
| 7B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B   | GRPO      | BF16 | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B   | DAPO      | AMP  | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 3B   | GRPO      | AMP  | 1e-6 | 1e-2 | 0.24 -> 0.38 (+0.14) |
| 32B  | GRPO      | BF16 | 1e-6 | 1e-2 | 0.50 -> 0.56 (+0.06) |

> [!NOTE]
> The hyper-parameters not listed are all the same as the default values.
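
For reference, the GRPO rows above (LR 1e-6, KL 1e-2) amount to overriding the learning rate and KL coefficient on top of the default config. Below is a minimal, hypothetical launcher sketch: the entry point and the dotted override keys (`worker.actor.optim.lr`, `algorithm.kl_coef`) are assumptions modeled on EasyR1's example scripts, so check them against `examples/config.yaml` in your checkout before use.

```python
import subprocess

# Hypothetical overrides matching the 7B GRPO row (LR 1e-6, KL 1e-2).
# The key names below are assumptions -- verify against examples/config.yaml.
overrides = {
    "config": "examples/config.yaml",
    "worker.actor.optim.lr": "1e-6",  # LR column
    "algorithm.kl_coef": "1e-2",      # KL column
}

# Build and run the training command with dotted-key overrides.
cmd = ["python3", "-m", "verl.trainer.main"]
cmd += [f"{key}={value}" for key, value in overrides.items()]
subprocess.run(cmd, check=True)
```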

## Performance Baselines

### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

| Size | GPU Type      | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput  | Sec per step | Actor MFU |
| ---- | ------------- | ---- | ---------- | ------- | -------- | --------- | ----------- | ------------ | --------- |
| 3B   | 8 * H100 80GB | AMP  | 1 / 2      | 2       | 120GB    | 54GB      | 1800 (+600) | 120s         | 8.1%      |
| 7B   | 8 * H100 80GB | AMP  | 1 / 2      | 2       | 120GB    | 68GB      | 1600 (+400) | 145s         | 16.0%     |
| 7B   | 8 * H100 80GB | AMP  | 4 / 8      | 2       | 200GB    | 72GB      | 2000 (+600) | 120s         | 23.2%     |
| 7B   | 8 * L20 48GB  | AMP  | 1 / 2      | 2       | 120GB    | 42GB      | 410 (+0)    | 580s         | 26.5%     |
| 7B   | 8 * H100 80GB | BF16 | 1 / 2      | 2       | 120GB    | 58GB      | 1600 (+320) | 145s         | 16.0%     |
| 32B  | 8 * H100 80GB | BF16 | 1 / 2      | 8       | 260GB    | 72GB      | 620 (+260)  | 530s         | 25.8%     |

- Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
- vLLM TP: rollout.tensor_parallel_size
- Peak Mem: Peak CPU memory usage
- Peak VRAM: Peak GPU memory usage
- Throughput: Number of tokens per second per GPU in one training step; the value in parentheses is the improvement over the previous version (see the sketch below)
- Sec per step: Average time per step in seconds
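
The Throughput column follows directly from the step's token count and wall-clock time, and Actor MFU can be sanity-checked with the common 6N FLOPs-per-token rule of thumb. This is a minimal sketch, assuming an H100 dense BF16 peak of 989 TFLOPs and a hypothetical token count chosen to match the 7B AMP (4 / 8) row; the table's Actor MFU is likely measured over the actor update phase only, so this whole-step estimate comes out lower.

```python
def tokens_per_sec_per_gpu(step_tokens: int, num_gpus: int, sec_per_step: float) -> float:
    """Throughput column: tokens processed per second per GPU in one training step."""
    return step_tokens / (num_gpus * sec_per_step)

def rough_mfu(tps_per_gpu: float, num_params: float, peak_flops: float = 989e12) -> float:
    """Rule-of-thumb MFU: forward + backward cost ~6 FLOPs per parameter per token."""
    return 6 * num_params * tps_per_gpu / peak_flops

# Hypothetical numbers matching the 7B AMP (4 / 8) row: 8 GPUs, 120 s per step.
tps = tokens_per_sec_per_gpu(step_tokens=1_920_000, num_gpus=8, sec_per_step=120)
print(f"{tps:.0f} tok/s/GPU")                         # 2000 tok/s/GPU
print(f"whole-step MFU ~ {rough_mfu(tps, 7e9):.1%}")  # ~8.5% by this estimate
```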

> [!NOTE]