Unable to distribute the job across multiple CPUs when training on CPU #4013

Open
furanger opened this issue Apr 27, 2025 · 0 comments

furanger commented Apr 27, 2025

I am trying to fine-tune Qwen2.5-VL-3B on CPU only. The run enters the training loop:

[INFO:swift] Successfully registered post_encode hook: ['Qwen2_5_VLForConditionalGeneration'].
Train: 0%| | 0/12945 [00:00<?, ?it/s]
/home/physo/venv/swift/lib/python3.10/site-packages/torch/utils/checkpoint.py:92: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(

However, when I check CPU usage on this multi-CPU machine, only a single CPU is doing any work:

Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 3.28 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 96.66

The command I used:


NPROC_PER_NODE=8;CUDA_VISIBLE_DEVICES=-1 ; swift sft --model_type qwen2_5_vl --model /home/physo/project/model/Qwen2.5-VL-3B-Instruct --dataset /home/physo/project/coco/output1 --train_type all-linear --torch_dtype float16 --device_map cpu --use_cpu True

Is it possible to run multi-CPU fine-tuning and deployment on Intel or ARM CPUs?
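
One detail of the command above may be worth checking. In a POSIX shell, `VAR=value; command` sets an unexported shell variable, so the child process does not inherit it; `VAR=value command` (no semicolon) places it in the command's environment. The sketch below demonstrates only that shell behavior, using a hypothetical variable `DEMO_VAR` and assuming it is not already exported. Whether this explains the single-CPU behavior with `swift sft` is not confirmed here.

```shell
# DEMO_VAR is a hypothetical name used only for this illustration.
unset DEMO_VAR

# Form 1: assignment terminated by ';' creates a plain shell variable.
# It is not exported, so the child process does not see it.
DEMO_VAR=8; sh -c 'echo "with semicolon: ${DEMO_VAR:-unset}"'
# prints: with semicolon: unset

unset DEMO_VAR

# Form 2: assignment written as a prefix of the command.
# The variable is placed in that command's environment.
DEMO_VAR=8 sh -c 'echo "as prefix: ${DEMO_VAR:-unset}"'
# prints: as prefix: 8
```

If the same applies here, writing `NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=-1 swift sft ...` (no semicolons), or using `export` first, would be the way to make both variables visible to the `swift` process.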
