[Issue]: Running DS-R1-0528 in SGLang aborts when aiter is built with PREBUILD_KERNELS=1 #1042

@sogalin

Problem Description

Running DeepSeek-R1-0528 (DS-R1-0528) in SGLang aborts when aiter is built with PREBUILD_KERNELS=1.
The issue does not occur when aiter is built with PREBUILD_KERNELS=0.

aiter was built with PREBUILD_KERNELS=1 using the following command:
PREBUILD_KERNELS=1 GPU_ARCHS=gfx942 python setup.py develop;

Running the following command in this image makes the server abort with the call stack below:
Image: rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918
SGLANG_USE_AITER=1 python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-0528/ --tp 8 --trust-remote-code

Call stack:

  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8_utils.py", line 273, in aiter_w8a8_block_fp8_linear
    output = gemm_a8w8_blockscale(
  File "/sgl-workspace/aiter/aiter/ops/gemm_op_a8w8.py", line 408, in gemm_a8w8_blockscale
    get_CKGEMM_config(m, n, k, "a8w8_blockscale_tuned_gemm.csv")
  File "/sgl-workspace/aiter/aiter/ops/gemm_op_a8w8.py", line 225, in get_CKGEMM_config
    padded_M = M if gl is None else get_padded_m(M, N, K, gl)
  File "/sgl-workspace/aiter/aiter/jit/core.py", line 975, in wrapper_custom
    result = getattr(torch.ops.aiter, f"wrapper_{loadName}")(
  File "/opt/venv/lib/python3.10/site-packages/torch/_ops.py", line 1254, in __call__
    return self._op(*args, **kwargs)
  File "/sgl-workspace/aiter/aiter/jit/core.py", line 955, in outer_wrapper
    else (torch.empty(1, device="cuda"), wrapper(*args, **kwargs))
  File "/sgl-workspace/aiter/aiter/jit/core.py", line 731, in wrapper
    module = get_module(md)
  File "/sgl-workspace/aiter/aiter/jit/core.py", line 249, in get_module
    get_module_custom_op(md_name)
  File "/sgl-workspace/aiter/aiter/jit/utils/torch_guard.py", line 64, in outer_wrapper
    return getattr(torch.ops.aiter, op_name)(dummy, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/torch/_ops.py", line 1254, in __call__
    return self._op(*args, **kwargs)
  File "/sgl-workspace/aiter/aiter/jit/utils/torch_guard.py", line 95, in custom_impl
    return func(*args, **kwargs)
  File "/sgl-workspace/aiter/aiter/jit/core.py", line 242, in get_module_custom_op
    __mds[md_name] = importlib.import_module(f"{package}.{md_name}")
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
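
The posted trace stops inside importlib.import_module, so the actual failure while loading the prebuilt module is not visible. A minimal sketch for capturing it, assuming the abort happens while importing the prebuilt extension module that get_module_custom_op requests: import that module in a child interpreter, so that a native abort (SIGABRT) kills only the child, then inspect its return code and stderr. The helper name probe_import is hypothetical, and the real module path is whatever f"{package}.{md_name}" resolves to at that point in aiter/jit/core.py; it is not shown in the trace.

```python
import subprocess
import sys


def probe_import(module_path: str) -> int:
    """Import `module_path` in a fresh Python interpreter and return its exit code.

    Hypothetical diagnostic helper (not part of aiter or SGLang).
    A native abort (SIGABRT) during the import terminates only the child process.
    On POSIX, a negative return code -N means the child was killed by signal N.
    """
    code = f"import importlib; importlib.import_module({module_path!r})"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface whatever the child printed: a Python traceback or a native abort message.
        sys.stderr.write(result.stderr)
    return result.returncode
```

For example, probe_import("<package>.<md_name>") with the placeholders replaced by the real names; on Linux a return code of -6 would indicate that the child was killed by SIGABRT rather than failing with a Python exception.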

Operating System

Ubuntu 22.04.5 LTS (Jammy Jellyfish)

CPU

Intel(R) Xeon(R) Platinum 8468

GPU

MI300

ROCm Version

ROCm 7.0.0

ROCm Component

No response

Steps to Reproduce

Image with the issue: rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918
Image without the issue (aiter not prebuilt): rocm/sgl-dev:v0.5.3rc0-rocm700-mi30x-20250918-wo-aiter-prebuild

Run the following command in each image:

SGLANG_USE_AITER=1 python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-0528/ --tp 8 --trust-remote-code

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
