Labels: bug
Description
System Info
Problem:
- Shards 160-163 of the DeepSeek V3.1 HF weights are missing from the checkpoint saved during RL with mbridge.
- The optimizer state cannot be saved.
4.0K -rw-r--r-- 1 root root 3.8K Nov 21 02:04 chat_template.jinja
4.0K -rw-r--r-- 1 root root 1.7K Nov 21 02:04 config.json
4.0K -rw-r--r-- 1 root root 171 Nov 21 02:04 generation_config.json
8.9G -rw-r--r-- 1 root root 8.9G Nov 21 01:48 model-00001-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:47 model-00002-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00003-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:49 model-00004-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00005-of-000163.safetensors
8.4G -rw-r--r-- 1 root root 8.4G Nov 21 01:48 model-00006-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00007-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00008-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:49 model-00009-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00010-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00011-of-000163.safetensors
2.5G -rw-r--r-- 1 root root 2.5G Nov 21 01:48 model-00012-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:49 model-00013-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:48 model-00014-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:50 model-00015-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00016-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00017-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:50 model-00018-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00019-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00020-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:50 model-00021-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00022-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:50 model-00023-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:49 model-00024-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:50 model-00025-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:51 model-00026-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:50 model-00027-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:50 model-00028-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:51 model-00029-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:50 model-00030-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:51 model-00031-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:50 model-00032-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00033-of-000163.safetensors
3.3G -rw-r--r-- 1 root root 3.3G Nov 21 01:51 model-00034-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:51 model-00035-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00036-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:52 model-00037-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00038-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00039-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:52 model-00040-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00041-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:51 model-00042-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:52 model-00043-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:52 model-00044-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:52 model-00045-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:52 model-00046-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:52 model-00047-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:53 model-00048-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:52 model-00049-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:52 model-00050-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:53 model-00051-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00052-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:53 model-00053-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00054-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00055-of-000163.safetensors
3.3G -rw-r--r-- 1 root root 3.3G Nov 21 01:53 model-00056-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:53 model-00057-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00058-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:54 model-00059-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00060-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:53 model-00061-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:54 model-00062-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00063-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00064-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:54 model-00065-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00066-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:54 model-00067-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00068-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00069-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:55 model-00070-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:54 model-00071-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:55 model-00072-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:55 model-00073-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:55 model-00074-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:55 model-00075-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:55 model-00076-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:55 model-00077-of-000163.safetensors
3.3G -rw-r--r-- 1 root root 3.3G Nov 21 01:55 model-00078-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:55 model-00079-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:55 model-00080-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:56 model-00081-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00082-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00083-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:56 model-00084-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00085-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00086-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:57 model-00087-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00088-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:57 model-00089-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00090-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:56 model-00091-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:58 model-00092-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00093-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00094-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:58 model-00095-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00096-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:58 model-00097-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00098-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00099-of-000163.safetensors
3.3G -rw-r--r-- 1 root root 3.3G Nov 21 01:57 model-00100-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:58 model-00101-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:57 model-00102-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:59 model-00103-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00104-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00105-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:59 model-00106-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00107-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00108-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:59 model-00109-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00110-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 01:59 model-00111-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:58 model-00112-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00113-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:00 model-00114-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00115-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00116-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:00 model-00117-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00118-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:00 model-00119-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00120-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 01:59 model-00121-of-000163.safetensors
3.3G -rw-r--r-- 1 root root 3.3G Nov 21 01:59 model-00122-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:00 model-00123-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:00 model-00124-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:01 model-00125-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:00 model-00126-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:00 model-00127-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:01 model-00128-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:00 model-00129-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:00 model-00130-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:01 model-00131-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00132-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:01 model-00133-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00134-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00135-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:02 model-00136-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00137-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00138-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:02 model-00139-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:01 model-00140-of-000163.safetensors
5.9G -rw-r--r-- 1 root root 5.9G Nov 21 02:02 model-00141-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:02 model-00142-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:02 model-00143-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:02 model-00144-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:02 model-00145-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:02 model-00146-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:03 model-00147-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:02 model-00148-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00149-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:03 model-00150-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00151-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:03 model-00152-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00153-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00154-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:04 model-00155-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00156-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:03 model-00157-of-000163.safetensors
8.3G -rw-r--r-- 1 root root 8.3G Nov 21 02:04 model-00158-of-000163.safetensors
8.1G -rw-r--r-- 1 root root 8.1G Nov 21 02:04 model-00159-of-000163.safetensors
3.9M -rw-r--r-- 1 root root 3.9M Nov 21 02:04 model.safetensors.index.json
4.0K -rw-r--r-- 1 root root 485 Nov 21 02:04 special_tokens_map.json
9.6M -rw-r--r-- 1 root root 9.6M Nov 21 02:04 tokenizer.json
160K -rw-r--r-- 1 root root 160K Nov 21 02:04 tokenizer_config.json
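The gap in the listing above can be confirmed mechanically: `model.safetensors.index.json` maps every tensor to a shard file, so comparing its `weight_map` against the files actually on disk lists exactly which shards were never written. A minimal sketch (the paths and the `missing_shards` helper are illustrative, not part of verl):

```python
import json
import os


def missing_shards(index_path: str, ckpt_dir: str) -> list[str]:
    """Return shard filenames referenced by the safetensors index
    but absent from the checkpoint directory."""
    with open(index_path) as f:
        index = json.load(f)
    # weight_map: {tensor_name: shard_filename}; collect the unique shards.
    expected = sorted(set(index["weight_map"].values()))
    return [name for name in expected
            if not os.path.exists(os.path.join(ckpt_dir, name))]
```

Running this against the saved checkpoint directory should report `model-00160-of-000163.safetensors` through `model-00163-of-000163.safetensors` if the index was written for all 163 shards.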
Docker info: verlai/verl:vllm011.dev7
Code version: verl v0.6.0
Hardware: 256 H20 GPUs
Start command:
#!/usr/bin/env bash
set -xeuo pipefail
## !!! IMPORTANT !!!
# 1. Set the following environment variables on all of your nodes:
# env_vars:
#   CUDA_DEVICE_MAX_CONNECTIONS: "1"
#   NCCL_NVLS_ENABLE: "0"
#   VLLM_USE_V1: 1
# 2. Install mbridge 0.1.13 on all of your nodes with the following command:
#    pip3 install git+https://github.com/ISEEKYAN/mbridge
# 3. Remove the `quantization_config` entry in DeepSeek-V3's `config.json` and
#    set `num_nextn_predict_layers=0` to disable MTP, which is not currently supported.
export CUDA_DEVICE_MAX_CONNECTIONS=1
export NCCL_NVLS_ENABLE=0
export VLLM_USE_V1=1
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "${SCRIPT_DIR}/env.sh" ] && source "${SCRIPT_DIR}/env.sh"
adv_estimator=grpo
use_kl_in_reward=False
kl_coef=0.0
use_kl_loss=True
kl_loss_coef=0.001
clip_ratio_low=0.2
clip_ratio_high=0.28
max_prompt_length=$((1024 * 2))
### 16K
# max_response_length=$((2048 * 8))
### 32k
max_response_length=30000
enable_overlong_buffer=False
overlong_buffer_len=$((1024 * 4))
overlong_penalty_factor=1.0
loss_mode=gspo
loss_agg_mode="token-mean"
train_prompt_bsz=128
n_resp_per_prompt=8
train_prompt_mini_bsz=32
n_resp_per_prompt_val=1
# minimum nodes for DeepSeek-V3: 12 nodes
NNODES=32
RAY_DATA_HOME=/aaa/VERL-official/verl_v0.6.0/
MODEL_PATH=/aaa/DeepSeek-V3.1-Terminus_deepseek-ai-nomtp
dapo_math_17k=/aaa/BytedTsinghua-SIA/DAPO-Math-17k
aime_2024=/aaa/Maxwell-Jia/AIME_2024
aime_2025=/aaa/yentinglin/aime_2025
TRAIN_FILE="['$dapo_math_17k']"
TEST_FILE="['$aime_2024','$aime_2025']"
# Algorithm
temperature=1.0
top_p=1.0
top_k=-1 # 0 for HF rollout, -1 for vLLM rollout
val_top_p=0.7
# Performance Related Parameter
use_dynamic_bsz=True
actor_ppo_max_token_len=$(((max_prompt_length + max_response_length) * 10 / 10))
infer_ppo_max_token_len=$(((max_prompt_length + max_response_length) * 1))
offload=True
optim_offload=${OFFLOAD_OPTIM:-True}
gen_tp=32
train_tp=${TP:-8}
train_pp=${PP:-16}
EP=${EP:-8}
ETP=1
CP=1
optimizer_offload_fraction=${OFFLOAD_FRACTION:-0}
LAST_LAYER=${LAST_LAYER:-1}
USE_MBRIDGE=True
USE_DIST_CKPT=False
# CHECKPOINT_CONTENTS=['model','hf_model','optimizer','extra']
CHECKPOINT_CONTENTS="['model','hf_model','extra']"
SKIP_SAVE_HF_MODEL=${SKIP_SAVE_HF_MODEL:-0}
# if [ $SKIP_SAVE_HF_MODEL -eq 1 ]; then
# CHECKPOINT_CONTENTS=['model','optimizer','extra']
# fi
project_name='verl-deepseek-v3'
# exp_name="671B-${NNODES}-pp${train_pp}-tp${train_tp}-ep${EP}-actor-length${actor_ppo_max_token_len}"
exp_name="671B-${NNODES}-pp${train_pp}-tp${train_tp}-ep${EP}-actor-length32048"
CKPTS_DIR=$RAY_DATA_HOME/ckpt/${project_name}/${exp_name}
python3 -m verl.trainer.main_ppo \
--config-path=config \
--config-name='ppo_megatron_trainer.yaml' \
data.train_files="${TRAIN_FILE}" \
data.val_files="${TEST_FILE}" \
data.prompt_key=prompt \
data.truncation='left' \
data.custom_cls.path=examples/grpo_trainer/retool.py \
data.custom_cls.name=CustomRLHFDataset \
custom_reward_function.path=examples/grpo_trainer/retool.py \
custom_reward_function.name=compute_score_dapo \
data.max_prompt_length=${max_prompt_length} \
data.max_response_length=${max_response_length} \
data.train_batch_size=${train_prompt_bsz} \
actor_rollout_ref.rollout.n=${n_resp_per_prompt} \
actor_rollout_ref.rollout.name=vllm \
algorithm.adv_estimator=${adv_estimator} \
algorithm.use_kl_in_reward=${use_kl_in_reward} \
algorithm.kl_ctrl.kl_coef=${kl_coef} \
actor_rollout_ref.model.use_fused_kernels=True \
actor_rollout_ref.actor.use_kl_loss=${use_kl_loss} \
actor_rollout_ref.actor.kl_loss_coef=${kl_loss_coef} \
actor_rollout_ref.actor.clip_ratio_low=${clip_ratio_low} \
actor_rollout_ref.actor.clip_ratio_high=${clip_ratio_high} \
actor_rollout_ref.actor.clip_ratio_c=10.0 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
actor_rollout_ref.actor.use_dynamic_bsz=${use_dynamic_bsz} \
actor_rollout_ref.ref.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \
actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=${use_dynamic_bsz} \
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${actor_ppo_max_token_len} \
actor_rollout_ref.actor.checkpoint.async_save=False \
actor_rollout_ref.actor.checkpoint.save_contents=$CHECKPOINT_CONTENTS \
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
actor_rollout_ref.model.path="${MODEL_PATH}" \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.optim.lr_warmup_steps=10 \
actor_rollout_ref.actor.optim.weight_decay=0.1 \
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_offload_fraction=${optimizer_offload_fraction} \
+actor_rollout_ref.actor.optim.override_optimizer_config.overlap_cpu_optimizer_d2h_h2d=True \
+actor_rollout_ref.actor.optim.override_optimizer_config.use_precision_aware_optimizer=True \
+actor_rollout_ref.actor.optim.override_optimizer_config.optimizer_cpu_offload=True \
actor_rollout_ref.actor.ppo_mini_batch_size=${train_prompt_mini_bsz} \
actor_rollout_ref.actor.megatron.use_dist_checkpointing=${USE_DIST_CKPT} \
actor_rollout_ref.actor.megatron.use_mbridge=${USE_MBRIDGE} \
actor_rollout_ref.ref.megatron.use_mbridge=${USE_MBRIDGE} \
reward_model.megatron.use_mbridge=${USE_MBRIDGE} \
critic.megatron.use_mbridge=${USE_MBRIDGE} \
actor_rollout_ref.actor.megatron.param_offload=${offload} \
actor_rollout_ref.actor.megatron.optimizer_offload=${optim_offload} \
actor_rollout_ref.actor.megatron.grad_offload=${offload} \
actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=${train_pp} \
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=${train_tp} \
actor_rollout_ref.actor.megatron.expert_model_parallel_size=$EP \
actor_rollout_ref.actor.megatron.expert_tensor_parallel_size=$ETP \
actor_rollout_ref.actor.megatron.context_parallel_size=${CP} \
actor_rollout_ref.actor.megatron.override_transformer_config.attention_backend='fused' \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.actor.optim.clip_grad=1.0 \
actor_rollout_ref.actor.policy_loss.loss_mode=${loss_mode} \
actor_rollout_ref.actor.loss_agg_mode=${loss_agg_mode} \
actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
actor_rollout_ref.rollout.enable_chunked_prefill=True \
actor_rollout_ref.rollout.max_num_batched_tokens=$((max_prompt_length + max_response_length)) \
actor_rollout_ref.rollout.temperature=${temperature} \
actor_rollout_ref.rollout.top_p=${top_p} \
actor_rollout_ref.rollout.top_k=${top_k} \
actor_rollout_ref.nccl_timeout=7200 \
actor_rollout_ref.rollout.val_kwargs.temperature=${temperature} \
actor_rollout_ref.rollout.val_kwargs.top_p=${val_top_p} \
actor_rollout_ref.rollout.val_kwargs.top_k=${top_k} \
actor_rollout_ref.rollout.val_kwargs.do_sample=True \
actor_rollout_ref.rollout.val_kwargs.n=$n_resp_per_prompt_val \
actor_rollout_ref.rollout.enforce_eager=True \
actor_rollout_ref.ref.megatron.use_dist_checkpointing=${USE_DIST_CKPT} \
actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=${train_pp} \
actor_rollout_ref.ref.megatron.tensor_model_parallel_size=${train_tp} \
actor_rollout_ref.ref.megatron.expert_model_parallel_size=$EP \
actor_rollout_ref.ref.megatron.expert_tensor_parallel_size=$ETP \
actor_rollout_ref.ref.megatron.context_parallel_size=${CP} \
actor_rollout_ref.ref.megatron.param_offload=${offload} \
+actor_rollout_ref.actor.megatron.override_transformer_config.apply_rope_fusion=False \
+actor_rollout_ref.actor.megatron.override_transformer_config.moe_router_dtype=fp32 \
+actor_rollout_ref.actor.megatron.override_transformer_config.moe_shared_expert_overlap=False \
+actor_rollout_ref.actor.megatron.override_transformer_config.moe_enable_deepep=True \
+actor_rollout_ref.actor.megatron.override_transformer_config.moe_token_dispatcher_type=flex \
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform \
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full \
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1 \
+actor_rollout_ref.actor.megatron.override_transformer_config.gradient_accumulation_fusion=True \
+actor_rollout_ref.actor.megatron.override_transformer_config.moe_permute_fusion=True \
+actor_rollout_ref.actor.megatron.override_transformer_config.account_for_embedding_in_pipeline_split=False \
+actor_rollout_ref.actor.megatron.override_transformer_config.account_for_loss_in_pipeline_split=False \
+actor_rollout_ref.actor.megatron.override_transformer_config.num_layers_in_last_pipeline_stage=${LAST_LAYER} \
reward_model.reward_manager=dapo \
+reward_model.reward_kwargs.overlong_buffer_cfg.enable=${enable_overlong_buffer} \
+reward_model.reward_kwargs.overlong_buffer_cfg.len=${overlong_buffer_len} \
+reward_model.reward_kwargs.overlong_buffer_cfg.penalty_factor=${overlong_penalty_factor} \
+reward_model.reward_kwargs.overlong_buffer_cfg.log=False \
+reward_model.reward_kwargs.max_resp_len=${max_response_length} \
trainer.logger=['console','wandb'] \
trainer.project_name="${project_name}" \
trainer.experiment_name="${exp_name}" \
trainer.n_gpus_per_node=8 \
trainer.nnodes="${NNODES}" \
trainer.val_before_train=False \
trainer.test_freq=10 \
trainer.save_freq=10 \
trainer.total_epochs=20 \
trainer.default_local_dir="${CKPTS_DIR}" \
+trainer.rollout_data_dir=${CKPTS_DIR}/rollout \
+trainer.validation_data_dir=${CKPTS_DIR}/validation \
trainer.resume_mode=auto \
trainer.log_val_generations=10
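The config edit required by step 3 in the script's preamble can be done programmatically instead of by hand. A hedged sketch, assuming a standard HF-style `config.json`; `patch_deepseek_config` is a hypothetical helper, not part of verl:

```python
import json


def patch_deepseek_config(path: str) -> None:
    """Prepare DeepSeek-V3's config.json for Megatron RL training:
    drop the FP8 quantization block and disable MTP."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.pop("quantization_config", None)  # remove FP8 quantization settings
    cfg["num_nextn_predict_layers"] = 0   # MTP is not currently supported
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
```

Run it once on the local copy of the model (e.g. `patch_deepseek_config(f"{MODEL_PATH}/config.json")`) before launching training.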
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Use the official docker image and VERL code.
- Run the default single-turn Math DAPO training with 256 H20.
Expected behavior
The checkpoint should contain the full model weights and the optimizer state. Instead:
- The optimizer cannot be saved.
- The saved HF weights are incomplete: shards model-00160 through model-00163 (of 000163) are missing.