Open
Description
Problem Description
If AITER_ENABLE_VSKIP is unset, it defaults to true, which causes an illegal memory access when running DeepSeek-R1 on vLLM on MI300X.
Error details:
:0:rocdevice.cpp :3675: 2139490044663 us: Callback: Queue 0x7ee7b4200000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Kernel Name: _ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256E
VGPU=0x1305a780 SWq=0x7f17d4008000, HWq=0x7ee7b4200000, id=1
Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=0
grid=[77824, 1, 1], workgroup=[256, 1, 1]
private_seg_size=0, group_seg_size=65536
kernel_obj=0x7ee7403f1d00, kernarg_address=0x0x7ed0950bf400
completion_signal=0x0, correlation_id=0
rptr=325385, wptr=325387
Kernel Name: _ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256E
VGPU=0xcc00ea0 SWq=0x7f17d4008000, HWq=0x7ee7b4200000, id=1
Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=0
grid=[77824, 1, 1], workgroup=[256, 1, 1]
private_seg_size=0, group_seg_size=65536
kernel_obj=0x7ee7403f1d00, kernarg_address=0x0x7ed0950bf400
completion_signal=0x0, correlation_id=0
rptr=325385, wptr=325387
Kernel Name: _ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256E
VGPU=0x24861940 SWq=0x7f17d4008000, HWq=0x7ee7b4200000, id=1
Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=0
grid=[77824, 1, 1], workgroup=[256, 1, 1]
private_seg_size=0, group_seg_size=65536
kernel_obj=0x7ee7403f1d00, kernarg_address=0x0x7ed0950bf400
completion_signal=0x0, correlation_id=0
rptr=325385, wptr=325387
[AITER] /app/upstreambugfix/aiter20251007/aiter/jit/build/module_moe_asm/build/srcs/asm_fmoe.hip:250 fail to call hipModuleLaunchKernel( kernel_func, gdx, gdy, gdz, bdx, 1, 1, 0, stream, nullptr, (void**)&config) ---> [HIP error](an illegal memory access was encountered)
Error code 700
Error code 700
In an older aiter commit (6b586ae), the kernel that works on MI300X has the mangled name _ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x256E, i.e. the novs variant rather than the vs variant that fails in the log above.
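To make the difference between the two kernels easier to see, the mangled names can be reduced to their bare kernel names. This is a minimal sketch in plain shell; the helper `strip_mangling` is hypothetical (not part of aiter) and assumes the names follow the `_ZN5aiter<length><name>E` pattern seen in the log above:

```shell
# Kernel names copied from this issue.
crash='_ZN5aiter50fmoe_bf16_blockscaleFp8_g1u1_vs_silu_1tg_ps_32x256E'
works='_ZN5aiter52fmoe_bf16_blockscaleFp8_g1u1_novs_silu_1tg_ps_32x256E'

# Hypothetical helper: strip the mangling wrapper around the kernel name.
strip_mangling() {
  s="${1#_ZN5aiter}"      # drop the "_ZN" marker and the "5aiter" namespace
  s="${s%E}"              # drop the closing "E"
  digits="${s%%[!0-9]*}"  # leading decimal length prefix, e.g. "50"
  printf '%s\n' "${s#"$digits"}"
}

echo "crashing: $(strip_mangling "$crash")"
echo "working:  $(strip_mangling "$works")"
```

With the wrapper removed, the two names differ only in the `_vs_` versus `_novs_` segment, which presumably reflects the vskip setting.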
Additional information:
On MI308, it calls:
_ZN5aiter59fmoe_stage1_bf16_pertokenFp8_blockscale_g1u1_64x128_2tg_pf3E (
80,128,7168,256,256,8,ActivationType.Silu,torch.bfloat16,torch.float8_e4m3fnuz,torch.float8_e4m3fnuz,QuantType.per_1x128,1,0,64,0,400.1271,_ZN5aiter59fmoe_stage1_bf16_pertokenFp8_blockscale_g1u1_64x128_2tg_pf3E,5.0%,873.4093,moe_ck2stages_gemm2_256x64x128x128_1x4_MulABScaleExpertWeightA8W8blkscale_v3_Nswizzle0_Quant4_MulRoutedWeight1_F8_F8_B16,15.5%,1273.5364,0,8.85,1108.75
A proposed solution can be found in #1136.
Operating System
NAME="Ubuntu" VERSION="22.04.5 LTS (Jammy Jellyfish)"
CPU
AMD EPYC 9654 96-Core Processor
GPU
amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
ROCm Version
7.0
ROCm Component
No response
Steps to Reproduce
1- Start a container
docker run -it \
--network=host \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--shm-size=16g \
--security-opt seccomp=unconfined \
--device /dev/kfd \
--device /dev/dri \
--name "name" \
rocm/vllm-dev:nightly_main_20250924 \
bash
2- Install the latest versions of vLLM and aiter
3- Serve deepseek-ai/DeepSeek-R1
VLLM_ROCM_USE_AITER=1 \
vllm serve deepseek-ai/DeepSeek-R1 \
--tensor-parallel-size 8 \
--block-size 1 \
--trust-remote-code \
--no-enable-prefix-caching \
--max-model-len 32768 \
--port 8010 \
> logs/server.log 2>&1
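Until a fix lands, one thing to try is setting the variable explicitly before step 3 instead of leaving it unset. This is only a sketch: the value `0` is an assumption about how aiter parses AITER_ENABLE_VSKIP, and whether it avoids the crash has not been verified here.

```shell
# Assumption: aiter reads "0" as "vskip disabled"; check the parsing in your aiter version.
export AITER_ENABLE_VSKIP=0

# Confirm the value that the server process will inherit.
echo "AITER_ENABLE_VSKIP=${AITER_ENABLE_VSKIP}"
```

Then run the `vllm serve` command from step 3 in the same shell so that it inherits the setting.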
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response