Add workflow for building NPU image #8546

Merged · 12 commits into hiyouga:main · Jul 4, 2025
Conversation

@wjunLu (Contributor) commented Jul 4, 2025

What does this PR do?

Add a workflow for building a multi-arch (x86_64 and aarch64) NPU image.

Issue

Partially fixes #8540
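
Such a workflow typically combines QEMU emulation with Buildx so that both architectures are built in one job. The fragment below is a hedged sketch using the standard docker/setup-qemu-action, docker/setup-buildx-action, and docker/build-push-action actions; the job name, step versions, and image tag are illustrative, not copied from the actual docker_npu.yml:

```yaml
# Hedged sketch of a multi-arch build job; names and tags are
# illustrative, not taken from the real docker_npu.yml.
jobs:
  build-npu:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3    # emulate the non-native architecture
      - uses: docker/setup-buildx-action@v3  # enable multi-platform builds
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: docker/docker-npu/Dockerfile
          platforms: linux/amd64,linux/arm64
          push: false
          tags: llamafactory:npu
```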

@wjunLu (Contributor, Author) commented Jul 4, 2025

I have tested the docker_npu workflow on a forked repo, and it successfully runs every step before the build.

Because the runner does not have enough disk space, the build itself did not finish, but the logs show that the multi-arch build works: both the arm64 and amd64 images were building:

```
#15 20.32    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 19.5 MB/s eta 0:00:00
#15 ...
#14 [linux/amd64 5/8] RUN pip install --no-cache-dir -r requirements.txt
#14 21.87    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 571.0/571.0 MB 121.2 MB/s eta 0:00:00
#14 21.88 Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB)
#14 23.85 ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
#14 23.85 
#14 23.86    ━━━━━━━━━━━━━━━━━━━━━━━━                 120.3/200.2 MB 60.9 MB/s eta 0:00:02
#14 ERROR: process "/bin/bash -c pip install --no-cache-dir -r requirements.txt" did not complete successfully: exit code: 1
#15 [linux/arm64 3/8] RUN pip config set global.index-url "https://pypi.org/simple" &&     pip config set global.extra-index-url "https://pypi.org/simple" &&     pip install --no-cache-dir --upgrade pip packaging wheel setuptools
#15 23.09 Installing collected packages: wheel, setuptools, pip, packaging
#15 23.72   Attempting uninstall: setuptools
#15 23.80     Found existing installation: setuptools 65.5.0
#15 24.16     Uninstalling setuptools-65.5.0:
#15 24.91       Successfully uninstalled setuptools-65.5.0
#15 CANCELED
------
 > [linux/amd64 5/8] RUN pip install --no-cache-dir -r requirements.txt:
15.51 Downloading nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (393.1 MB)
18.21 Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (897 kB)
18.22    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 897.7/897.7 kB 459.4 MB/s eta 0:00:00
18.23 Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl (571.0 MB)
21.88 Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB)
23.85 ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
23.85
```
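
The `[Errno 28] No space left on device` failure above is the familiar disk-space limit of GitHub-hosted runners, which leave only roughly 14 GB free. A common workaround is to delete preinstalled toolchains before the build; the step below is a hedged sketch assuming an ubuntu-latest runner (the paths are where that runner image keeps its .NET, Android, and GHC toolchains), not a step from the actual workflow:

```yaml
# Illustrative cleanup step for a GitHub-hosted ubuntu runner; the
# paths below hold large preinstalled toolchains on that image.
- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
    sudo docker image prune --all --force
    df -h /   # show remaining space
```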

@wjunLu wjunLu force-pushed the workflow branch 2 times, most recently from e20da02 to 448cecd Compare July 4, 2025 06:56
@wjunLu (Contributor, Author) commented Jul 4, 2025

Since docker/docker-npu/Dockerfile changed, I re-tested the NPU image built from it; the results are still OK.

[Screenshot: self-test result of the docker_npu.yml workflow]

  • Start a container with the new image quay.io/wjunlu27/llamafactory:0.9.4-npu-a2 (pulling from quay.io is faster than from docker.io):

```shell
docker run -it \
  -v $PWD/hf_cache:/root/.cache/huggingface \
  -v $PWD/ms_cache:/root/.cache/modelscope \
  -v $PWD/data:/app/data \
  -v $PWD/output:/app/output \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v $PWD/Test:/home/Test/ \
  -p 7861:7860 \
  -p 8010:8000 \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  --shm-size 16G \
  --name llamafactory \
  quay.io/wjunlu/llamafactory:0.9.4-npu-a2 bash
```
  • Install DeepSpeed and ModelScope, and set the environment variables:

```shell
pip install -e ".[deepspeed,modelscope]" -i https://pypi.tuna.tsinghua.edu.cn/simple
export ASCEND_RT_VISIBLE_DEVICES=0
export USE_MODELSCOPE_HUB=1
```
  • Check the llamafactory environment:

```shell
$ llamafactory-cli env

- `llamafactory` version: 0.9.4.dev0
- Platform: Linux-4.19.90-vhulk2211.3.0.h1804.eulerosv2r10.aarch64-aarch64-with-glibc2.35
- Python version: 3.11.12
- PyTorch version: 2.5.1 (NPU)
- Transformers version: 4.52.4
- Datasets version: 3.6.0
- Accelerate version: 1.7.0
- PEFT version: 0.15.2
- TRL version: 0.9.6
- NPU type: Ascend910B3
- CANN version: 8.1.RC1
- Default data directory: detected
```
  • Finetune:

```shell
torchrun \
    --nproc_per_node 1 \
    --nnodes 1 \
    --node_rank 0 \
    --master_addr 127.0.0.1 \
    --master_port 7007 \
    src/train.py /home/Test/qwen1_5_lora_sft_ds.yaml
```

The result is:

```
...
{'loss': 0.9117, 'grad_norm': 36778.3188305284, 'learning_rate': 3.727866032169127e-05, 'epoch': 1.87}
 63%|████████████████████████████████████████████████████████▋                                 | 928/1473 [09:43<05:37,  1.62it/s]
...
***** Running Evaluation *****
[INFO|trainer.py:4329] 2025-07-04 09:44:43,140 >>   Num examples = 110
[INFO|trainer.py:4332] 2025-07-04 09:44:43,140 >>   Batch size = 1
{'eval_loss': 0.9412215352058411, 'eval_runtime': 8.0786, 'eval_samples_per_second': 13.616, 'eval_steps_per_second': 13.616, 'epoch': 2.04}
 68%|████████████████████████████████████████████████████████████▍                            | 1000/1473 [10:35<04:45,  1.66it/s[INFO|trainer.py:3993] 2025-07-04 09:44:58,198 >> Saving model checkpoint to saves/Qwen1.5-7B/lora/sft/checkpoint-1000
...
[INFO|trainer.py:4332] 2025-07-04 09:50:05,443 >>   Batch size = 1
100%|███████████████████████████████████████████████████████████████████████████████████████████| 110/110 [00:07<00:00, 14.00it/s]
***** eval metrics *****
  epoch                   =        3.0
  eval_loss               =     0.9487
  eval_runtime            = 0:00:07.95
  eval_samples_per_second =     13.834
  eval_steps_per_second   =     13.834
[INFO|modelcard.py:450] 2025-07-04 09:50:13,393 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
```
  • Inference:

```shell
llamafactory-cli chat \
    --model_name_or_path qwen/Qwen1.5-7B \
    --adapter_name_or_path saves/Qwen1.5-7B/lora/sft \
    --template qwen \
    --finetuning_type lora
```

The result is:

```
[INFO|configuration_utils.py:1135] 2025-07-04 09:59:22,188 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

[INFO|2025-07-04 09:59:22] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-07-04 09:59:22] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
[INFO|2025-07-04 09:59:22] llamafactory.model.adapter:143 >> Loaded adapter(s): saves/Qwen1.5-7B/lora/sft
[INFO|2025-07-04 09:59:22] llamafactory.model.loader:143 >> all params: 7,721,324,544
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.
```

The chat transcript (translated from Chinese):

User: How can I relax when I feel mentally tense?
Assistant: A tense mental state can cause physical and psychological stress and affect daily life and work efficiency. Here are some ways to help you relax:

1. Deep breathing: relieve tension through deep breathing and let the body relax. Inhale slowly, then exhale slowly, repeating several times.

2. Progressive muscle relaxation: reduce physical tension by consciously relaxing the muscles. First tense a muscle, then release it, gradually relaxing the whole body.

3. Immersive experiences: for example, listening to music, watching movies, or reading; immersing yourself in a pleasant experience eases tension.

4. Exercise: exercise releases tension from the body, for example running, yoga, or walking.

5. Changing your way of thinking: for example, thinking positively or accepting reality can reduce tension.

6. Socializing: talking with friends or family and sharing your feelings can relieve tension.

You can choose among these methods according to your needs and preferences, but note that if the tension persists or interferes with daily life, it is advisable to consult a doctor or mental-health professional.

User:

@wjunLu wjunLu requested a review from hiyouga July 4, 2025 11:16
@hiyouga (Owner) left a comment:

LGTM

@hiyouga hiyouga merged commit d30cbcd into hiyouga:main Jul 4, 2025
17 checks passed
@hiyouga hiyouga added the solved This problem has been already solved label Jul 4, 2025
@wjunLu wjunLu deleted the workflow branch July 10, 2025 14:35