[Bug]: vllm-ascend offline 1P1D disaggregated prefill fails, incompatible with vllm #1074

Open
1027866388 opened this issue Jun 5, 2025 · 0 comments
Labels
bug Something isn't working

@1027866388
Your current environment

INFO 06-05 11:21:09 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-05 11:21:09 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via pip install triton to enable kernel compilation.
INFO 06-05 11:21:11 [init.py:38] Available plugins for group vllm.platform_plugins:
INFO 06-05 11:21:11 [init.py:40] - ascend -> vllm_ascend:register
INFO 06-05 11:21:11 [init.py:43] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 06-05 11:21:11 [init.py:234] Platform plugin ascend is activated
WARNING 06-05 11:21:14 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False

OS: Ubuntu Jammy Jellyfish (development branch) (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.2
Libc version: glibc-2.35

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 21:44:20) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-4.19.90-2107.6.0.0192.8.oe1.bclinux.aarch64-aarch64-with-glibc2.35

CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 48
Socket(s): -
Cluster(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] gpytorch==1.14
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.1
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchair==0.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[pip3] transformers-stream-generator==0.0.5
[conda] gpytorch 1.14 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pyzmq 26.2.1 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi
[conda] torch-npu 2.5.1 pypi_0 pypi
[conda] torchair 0.1 pypi_0 pypi
[conda] torchvision 0.20.1 pypi_0 pypi
[conda] transformers 4.52.4 pypi_0 pypi
[conda] transformers-stream-generator 0.0.5 pypi_0 pypi
vLLM Version: 0.9.1.dev129+g6d18ed2a2 (git sha: 6d18ed2a2)
vLLM Ascend Version: 0.8.5rc2.dev69+g068c3a0.d20250603 (git sha: 068c3a0, date: 20250603)

ENV Variables:
ASCEND_VISIBLE_DEVICES=4,5,7,1
ASCEND_RUNTIME_OPTIONS=
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0.3 Version: 24.1.0.3 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 1 910B2 | OK | 94.8 42 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 3389 / 65536 |
+===========================+===============+====================================================+
| 4 910B2 | OK | 97.8 40 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3390 / 65536 |
+===========================+===============+====================================================+
| 5 910B2 | OK | 98.2 44 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 3386 / 65536 |
+===========================+===============+====================================================+
| 7 910B2 | OK | 97.9 45 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 3389 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 4 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.0.0
innerversion=V100R001C20SPC001B251
compatible_version=[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19],[V100R001C20]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux

🐛 Describe the bug

python offline_disaggregated_prefill_npu.py

INFO 06-05 11:24:52 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-05 11:24:52 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via pip install triton to enable kernel compilation.
INFO 06-05 11:24:52 [importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 06-05 11:24:52 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via pip install triton to enable kernel compilation.
INFO 06-05 11:24:55 [init.py:38] Available plugins for group vllm.platform_plugins:
INFO 06-05 11:24:55 [init.py:40] - ascend -> vllm_ascend:register
INFO 06-05 11:24:55 [init.py:43] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 06-05 11:24:55 [init.py:234] Platform plugin ascend is activated
INFO 06-05 11:24:55 [init.py:38] Available plugins for group vllm.platform_plugins:
INFO 06-05 11:24:55 [init.py:40] - ascend -> vllm_ascend:register
INFO 06-05 11:24:55 [init.py:43] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 06-05 11:24:55 [init.py:234] Platform plugin ascend is activated
WARNING 06-05 11:24:57 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 06-05 11:24:57 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Process Process-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/mnt/wangyanhui/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 89, in run_decode
ktc = KVTransferConfig.from_cli(
AttributeError: type object 'KVTransferConfig' has no attribute 'from_cli'
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/workspace/mnt/wangyanhui/vllm-ascend/examples/offline_disaggregated_prefill_npu.py", line 49, in run_prefill
ktc = KVTransferConfig.from_cli(
AttributeError: type object 'KVTransferConfig' has no attribute 'from_cli'
All process done!
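The traceback shows that the installed vLLM build (0.9.1.dev129) no longer provides the `KVTransferConfig.from_cli` classmethod that the vllm-ascend example script still calls. A likely workaround until the example is updated is to parse the JSON string yourself and pass the fields to the config constructor as keyword arguments. A minimal sketch of that pattern is below; the `KVTransferConfig` here is a dataclass stand-in for illustration only, and the connector name and field values are illustrative, not taken from the real vllm-ascend example:

```python
import json
from dataclasses import dataclass


# Stand-in for vllm.config.KVTransferConfig, for illustration only.
# The real class lives in vLLM; the assumption here is that it accepts
# these fields as keyword arguments.
@dataclass
class KVTransferConfig:
    kv_connector: str
    kv_role: str
    kv_rank: int
    kv_parallel_size: int


def kv_transfer_config_from_json(json_str: str) -> KVTransferConfig:
    # Replacement for the removed KVTransferConfig.from_cli():
    # parse the JSON string and construct the config directly.
    return KVTransferConfig(**json.loads(json_str))


# Illustrative values only; use the ones from your own 1P1D setup.
ktc = kv_transfer_config_from_json(
    '{"kv_connector": "SomeConnector", "kv_role": "kv_producer", '
    '"kv_rank": 0, "kv_parallel_size": 2}'
)
print(ktc.kv_role)
```

The same substitution would apply at both call sites in the traceback (`run_prefill` at line 49 and `run_decode` at line 89 of `offline_disaggregated_prefill_npu.py`).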

@1027866388 1027866388 added the bug Something isn't working label Jun 5, 2025