
Training on CPU does not distribute the job across multiple CPUs #4013

@furanger

I am trying to fine-tune Qwen2.5-vl-3b on CPU. The training process starts successfully:

[INFO:swift] Successfully registered post_encode hook: ['Qwen2_5_VLForConditionalGeneration'].
Train: 0%| | 0/12945 [00:00<?, ?it/s]/home/physo/venv/swift/lib/python3.10/site-packages/torch/utils/checkpoint.py:92: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(

However, when I check CPU utilization, only a single CPU out of the many available is doing any work.

Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 3.28 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 96.66

The command I used is:


NPROC_PER_NODE=8;CUDA_VISIBLE_DEVICES=-1 ; swift sft --model_type qwen2_5_vl --model /home/physo/project/model/Qwen2.5-VL-3B-Instruct --dataset /home/physo/project/coco/output1 --train_type all-linear --torch_dtype float16 --device_map cpu --use_cpu True
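
One detail of the command above that may be worth ruling out, offered as a possibility rather than a confirmed cause: in a POSIX shell, `VAR=value; command` only sets a shell variable, and `command` does not see it in its environment unless `VAR` was exported earlier. `VAR=value command` (no semicolon) does pass it. The sketch below shows the difference with a hypothetical variable `DEMO_VAR`; it does not call `swift` and makes no claim about how `swift` behaves once the variables do reach it.

```shell
# DEMO_VAR is a hypothetical name used only for illustration.
unset DEMO_VAR

# Form 1: the assignment is terminated by ';'.
# This sets a shell variable only; the child process started afterwards
# does not inherit it unless it was exported earlier.
DEMO_VAR=8; seen_with_semicolon=$(sh -c 'echo "${DEMO_VAR:-unset}"')

unset DEMO_VAR

# Form 2: the assignment is a prefix of the command itself.
# The variable is placed in that command's environment.
seen_with_prefix=$(DEMO_VAR=8 sh -c 'echo "${DEMO_VAR:-unset}"')

echo "with ';':  $seen_with_semicolon"
echo "as prefix: $seen_with_prefix"
```

If this applies here, writing the two assignments as prefixes without semicolons, or running `export` on them first, would be one way to make sure `NPROC_PER_NODE` and `CUDA_VISIBLE_DEVICES` are visible to the `swift sft` process.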

Is it possible to do multi-CPU fine-tuning and deployment on Intel or ARM CPUs?
