Description
Problem Description
Running DeepSeek-R1-0528 (DS-R1-0528) in SGLang aborts when aiter was built with PREBUILD_KERNELS=1.
The issue does not occur when aiter is built with PREBUILD_KERNELS=0.
aiter was built with PREBUILD_KERNELS=1 using the following command:
PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python setup.py develop;
Launching the server in the following image with the command below it then aborts with the call stack shown.
Image: rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918
SGLANG_USE_AITER=1 python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-0528/ --tp 8 --trust-remote-code
Call stack:
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_utils.py", line 273, in aiter_w8a8_block_fp8_linear
output = gemm_a8w8_blockscale(
File "/sgl-workspace/aiter/aiter/ops/gemm_op_a8w8.py", line 408, in gemm_a8w8_blockscale
get_CKGEMM_config(m, n, k, "a8w8_blockscale_tuned_gemm.csv")
File "/sgl-workspace/aiter/aiter/ops/gemm_op_a8w8.py", line 225, in get_CKGEMM_config
padded_M = M if gl is None else get_padded_m(M, N, K, gl)
File "/sgl-workspace/aiter/aiter/jit/core.py", line 975, in wrapper_custom
result = getattr(torch.ops.aiter, f"wrapper_{loadName}")(
File "/opt/venv/lib/python3.10/site-packages/torch/_ops.py", line 1254, in __call__
return self._op(*args, **kwargs)
File "/sgl-workspace/aiter/aiter/jit/core.py", line 955, in outer_wrapper
else (torch.empty(1, device="cuda"), wrapper(*args, **kwargs))
File "/sgl-workspace/aiter/aiter/jit/core.py", line 731, in wrapper
module = get_module(md)
File "/sgl-workspace/aiter/aiter/jit/core.py", line 249, in get_module
get_module_custom_op(md_name)
File "/sgl-workspace/aiter/aiter/jit/utils/torch_guard.py", line 64, in outer_wrapper
return getattr(torch.ops.aiter, op_name)(dummy, *args, **kwargs)
File "/opt/venv/lib/python3.10/site-packages/torch/_ops.py", line 1254, in __call__
return self._op(*args, **kwargs)
File "/sgl-workspace/aiter/aiter/jit/utils/torch_guard.py", line 95, in custom_impl
return func(*args, **kwargs)
File "/sgl-workspace/aiter/aiter/jit/core.py", line 242, in get_module_custom_op
__mds[md_name] = importlib.import_module(f"{package}.{md_name}")
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
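The trace above ends inside importlib.import_module and does not show the final exception. As a minimal sketch for narrowing this down, the helper below imports a module by its dotted name and returns the full traceback text on failure. The function name try_import and the module name in the usage line are hypothetical placeholders, not taken from aiter. It only helps if the failure surfaces as a Python exception; a native abort inside the extension would still terminate the process.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by dotted name and return (ok, details)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception:
        # Keep the full traceback text so the root cause (for example an
        # undefined symbol in a prebuilt extension) stays visible.
        return False, traceback.format_exc()


if __name__ == "__main__":
    # Placeholder name: replace with the module being loaded when the abort occurs.
    ok, details = try_import("hypothetical_prebuilt_module")
    print("import ok" if ok else details)
```

Running this once in each of the two images would show whether the import alone reproduces the failure, independent of the SGLang server launch.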
Operating System
Ubuntu 22.04.5 LTS (Jammy Jellyfish)
CPU
Intel(R) Xeon(R) Platinum 8468
GPU
MI300
ROCm Version
ROCm 7.0.0
ROCm Component
No response
Steps to Reproduce
Image with the issue: rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918
Image without the issue: rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918-wo-aiter-prebuild
SGLANG_USE_AITER=1 python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-0528/ --tp 8 --trust-remote-code
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response