OOM during full fine-tuning of Qwen3-32B on 8x H200 with DeepSpeed ZeRO-2 #8643
fangjin001024 asked this question in Q&A
System Info
8x H200, DeepSpeed ZeRO-2, full fine-tuning of Qwen3-32B, bf16 mixed precision.
Memory progression: get_dataset() 3.4 GB → load_model (from_pretrained, 67 GB) → init_adapter (67 GB) → gradient accumulation (**100 GB**) → trainer.train() (OOM).
After the model finishes loading, VRAM usage sits at a normal 67 GB. With gradient accumulation enabled, computing and storing the gradients for each batch takes over 32 GB more, which looks abnormal: by a rough estimate, the gradients on each GPU should only be 32 × 4 / 8 = 16 GB (see the sketch below). The subsequent parameter update then runs out of VRAM outright.
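For context, here is a back-of-the-envelope per-GPU budget under ZeRO-2 (a minimal sketch, not a measurement: it assumes bf16 parameters replicated on every GPU plus fp32 gradients and two fp32 Adam states sharded across 8 GPUs; the real DeepSpeed layout also holds fp32 master weights, activations, and communication buffers, so actual usage is higher):

```python
# Back-of-the-envelope per-GPU memory for ZeRO-2 full fine-tuning of a 32B model.
# Assumptions (not measured): bf16 params replicated (ZeRO-2 does not shard params),
# fp32 gradients and two fp32 Adam states sharded across all GPUs.
PARAMS = 32e9
GPUS = 8
GB = 1024**3

params_bf16 = PARAMS * 2 / GB        # ~60 GB per GPU, replicated
grads_fp32 = PARAMS * 4 / GPUS / GB  # ~15 GB per GPU, the 32*4/8 = 16 GB estimate above
adam_states = PARAMS * 8 / GPUS / GB # ~30 GB per GPU (momentum + variance in fp32)

total = params_bf16 + grads_fp32 + adam_states
print(f"params {params_bf16:.1f} GB + grads {grads_fp32:.1f} GB + "
      f"optimizer {adam_states:.1f} GB = {total:.1f} GB (+ activations/buffers)")
```

Even this optimistic estimate lands around 105 GB per GPU before activations and DeepSpeed buffers, which is consistent with the ~100 GB reading above.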
Command used:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 llamafactory-cli train examples/train_full/llama3_full_sft.yaml
```
Training config:
```yaml
### model
model_name_or_path: /opt/workspace/model/Qwen/Qwen3-32B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z2_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: alpaca_zh_demo
template: qwen3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/qwen3-32b/full/sft
logging_steps: 2
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
pure_bf16: false
ddp_timeout: 180000000
resume_from_checkpoint: null
```
DeepSpeed config:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "round_robin_gradients": true,
    "load_from_fp32_weights": false
  },
  "flops_profiler": {
    "enabled": true,
    "profile_step": 6,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": "saves/qwen3-8b/full/sft/flops_report.txt"
  }
}
```
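To pinpoint exactly where the extra ~32 GB appears, memory can be logged around each stage with the standard torch.cuda counters (a minimal diagnostic sketch; the stage labels are just illustrative):

```python
import torch

def log_mem(tag: str) -> None:
    # allocated = live tensors; reserved = what the caching allocator holds
    # from the driver; peak = high-water mark since the last reset.
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] allocated={alloc:.1f} GB  reserved={reserved:.1f} GB  peak={peak:.1f} GB")

# Illustrative call sites matching the stages reported above:
# log_mem("after from_pretrained")  # expect ~67 GB
# log_mem("after first backward")   # where the ~100 GB reading comes from
# log_mem("after optimizer.step")   # where the OOM occurs
```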