
Need Help!! qwen2.5vl7b lora sft with deepspeed zero3 #7588

@Luffy966

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

[2025-04-03 06:02:20,328] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 04-03 06:02:21 [__init__.py:256] Automatically detected platform cuda.

  • llamafactory version: 0.9.3.dev0
  • Platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
  • Python version: 3.12.9
  • PyTorch version: 2.6.0+cu118 (GPU)
  • Transformers version: 4.50.0
  • Datasets version: 3.4.1
  • Accelerate version: 1.5.2
  • PEFT version: 0.15.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090
  • GPU number: 8
  • GPU memory: 23.65GB
  • DeepSpeed version: 0.16.5
  • vLLM version: 0.8.1

Reproduction

While running LoRA fine-tuning of qwen2.5vl7b on 4 RTX 4090s with deepspeed=examples/deepspeed/ds_z3_config.json, the following error occurred:
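For context, the failing run corresponds to a LLaMA-Factory YAML roughly along these lines (a sketch only; every value below is an assumption reconstructed from the description, not the actual config used):

```yaml
### model -- hypothetical path, assumed from "qwen2.5vl7b"
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json

### train -- placeholder values, not the reporter's settings
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
bf16: true
```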

Traceback (most recent call last):
  File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl                                                            
[rank1]:     return self._call_impl(*args, **kwargs)                                                                                                                                                                                        
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                        
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl                                                                                               
[rank1]:     return inner()                                                                                                                                                                                                                 
[rank1]:            ^^^^^^^                                                                                                                                                                                                                 
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1793, in inner                                                                                                    
[rank1]:     result = forward_call(*args, **kwargs)                                                                                                                                                                                         
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                         
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/peft/tuners/tuners_utils.py", line 193, in forward                        
[rank1]:     return self.model.forward(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1875, in forward
[rank1]:     logits = self.lm_head(hidden_states)
[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                           
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)                                                                                                                                                                                        
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                  
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
[rank1]:     return inner()                                                                                                                                                                                                                 
[rank1]:            ^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1782, in inner
[rank1]:     args_result = hook(self, args)                                                                           
[rank1]:                   ^^^^^^^^^^^^^^^^                                                                           
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 372, in _post_backward_module_hook
[rank1]:     return apply_to_tensors_only(module.post_bwd_fn.apply,
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/zero/utils.py", line 133, in apply_to_tensors_only
[rank1]:     touched_output = apply_to_tensors_only(function, elem)
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/zero/utils.py", line 149, in apply_to_tensors_only
[rank1]:     touched_output = function(value)
[rank1]:                      ^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/autograd/function.py", line 575, in apply
[rank1]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                    
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 440, in forward
[rank1]:     module.ds_grads_remaining += 1                                                                           
[rank1]:     ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
[rank1]:     raise AttributeError(                 
[rank1]: AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'
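The last frames can be read as a hook-state mismatch: ZeRO-3's pre-forward hook initializes a per-module counter `ds_grads_remaining`, and a backward-side hook later increments it, so any `Linear` that misses the initialization (e.g. a layer swapped in by PEFT/LoRA after the hooks were registered) fails exactly like this. A minimal sketch of that mechanism in plain PyTorch (not DeepSpeed's actual code, and the cause stated here is an assumption from the trace):

```python
import torch.nn as nn

def pre_forward(module):
    # DeepSpeed-style setup: initialize the per-module counter.
    module.ds_grads_remaining = 0

def post_backward(module):
    # DeepSpeed-style use: assumes pre_forward already ran on this module.
    module.ds_grads_remaining += 1

good = nn.Linear(4, 4)
pre_forward(good)
post_backward(good)      # fine: the counter exists
assert good.ds_grads_remaining == 1

fresh = nn.Linear(4, 4)  # stands in for a layer replaced after hook setup
try:
    post_backward(fresh)  # counter was never initialized
except AttributeError as e:
    print(e)  # 'Linear' object has no attribute 'ds_grads_remaining'
```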

Switching to ds_z2_config instead results in OOM.

In addition, when using ds_z3_config and adding the following options:

enable_liger_kernel: True
use_unsloth_gc: True

a new error appeared:

[rank1]: Traceback (most recent call last):                                                                                                                                                                                     
[rank1]:   File "/home/mas-wang.zhenyu/download/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>                                                                                                                           
[rank1]:     launch()                                                                                                                                                                                                                       
[rank1]:   File "/home/mas-wang.zhenyu/download/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch                                                                                                                             
[rank1]:     run_exp()                                                                                                                                                                                                                      
[rank1]:   File "/home/mas-wang.zhenyu/download/LLaMA-Factory/src/llamafactory/train/tuner.py", line 107, in run_exp                                                                                                                        
[rank1]:     _training_function(config={"args": args, "callbacks": callbacks})                                                                                                                                                              
[rank1]:   File "/home/mas-wang.zhenyu/download/LLaMA-Factory/src/llamafactory/train/tuner.py", line 69, in _training_function                                                                                                              
[rank1]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)                                                                                                                                     
[rank1]:   File "/home/mas-wang.zhenyu/download/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 102, in run_sft                                                                                                                 
[rank1]:     train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
[rank1]:     return inner_training_loop(
[rank1]:            ^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/trainer.py", line 2556, in _inner_training_loop
[rank1]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank1]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/transformers/trainer.py", line 3764, in training_step
[rank1]:     self.accelerator.backward(loss, **kwargs)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/accelerate/accelerator.py", line 2351, in backward
[rank1]:     self.deepspeed_engine_wrapped.backward(loss, **kwargs)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/accelerate/utils/deepspeed.py", line 266, in backward
[rank1]:     self.engine.backward(loss, **kwargs)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 2187, in backward
[rank1]:     self._do_optimizer_backward(loss, retain_graph)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 2133, in _do_optimizer_backward
[rank1]:     self.optimizer.backward(loss, retain_graph=retain_graph)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
[rank1]:     ret_val = func(*args, **kwargs)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/zero/stage3.py", line 2284, in backward
[rank1]:     self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
[rank1]:     scaled_loss.backward(retain_graph=retain_graph)
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_tensor.py", line 626, in backward
[rank1]:     torch.autograd.backward(
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/autograd/__init__.py", line 347, in backward
[rank1]:     _engine_run_backward(
[rank1]:   File "/home/mas-wang.zhenyu/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
[rank1]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: RuntimeError: The size of tensor a (0) must match the size of tensor b (3584) at non-singleton dimension 1
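One plausible reading (an assumption, not a confirmed diagnosis) is that the fused Liger kernel touches a weight while it is still ZeRO-3-partitioned, i.e. while its local view is a 0-sized placeholder, so an elementwise op against the 3584-dim hidden states fails with exactly this message:

```python
import torch

# Sketch of the shape mismatch: under ZeRO-3 a parameter's local .data is an
# empty placeholder until DeepSpeed gathers the full weight. A kernel reading
# it outside the gather context sees a 0-sized tensor.

hidden = torch.randn(2, 3584)  # hidden states; 3584 is Qwen2.5-VL-7B's hidden size
stub = torch.empty(2, 0)       # stand-in for an un-gathered ZeRO-3 weight view

try:
    stub * hidden              # dim 1: 0 vs 3584, neither is 1, so no broadcast
except RuntimeError as e:
    print(e)  # The size of tensor a (0) must match the size of tensor b (3584) ...
```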

Is it possible to fix this by changing other settings, without modifying cutoff_len and max_pixels? Any help would be appreciated!!

Others

No response

Metadata

Labels: solved (this problem has been already solved)