
RMSNorm failed with error code no kernel image is available for execution on the device #920

Open
nanmi opened this issue Mar 7, 2025 · 2 comments


nanmi commented Mar 7, 2025

[TensorRT-LLM] TensorRT-LLM version: 0.18.0.dev2025022500
[TensorRT-LLM][INFO] Refreshed the MPI local session
Model init total -- 387.22s
2025-03-07 11:03:44,669 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-07 11:03:44,689 - INFO - flashinfer.jit: Finished loading JIT ops: norm
CUDA Error: no kernel image is available for execution on the device (209) /usr/local/lib/python3.12/dist-packages/flashinfer/data/include/flashinfer/norm.cuh: line 113 at function cudaLaunchKernel((void*)kernel, nblks, nthrs, args, smem_size, stream)
[03/07/2025-11:03:44] [TRT-LLM] [E] Failed to initialize executor on rank 6: RMSNorm failed with error code no kernel image is available for execution on the device
[ERROR    | TRT-LLM            ]: [TRT-LLM] [E] Failed to initialize executor on rank 6: RMSNorm failed with error code no kernel image is available for execution on the device
[03/07/2025-11:03:44] [TRT-LLM] [E] Traceback (most recent call last):

  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 581, in worker_main
    worker: ExecutorBindingsWorker = worker_cls(
                                     ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 126, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 122, in _create_engine
    return create_executor(executor_config=executor_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 106, in create_py_executor
    kv_cache_max_tokens = estimate_max_kv_cache_tokens(model_engine,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/_util.py", line 118, in estimate_max_kv_cache_tokens
    model_engine.forward(req, resource_manager)
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 990, in forward
    return self._forward_step(inputs, gather_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1030, in _forward_step
    logits = self.model.forward(**inputs,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_utils.py", line 234, in forward
    hidden_states = self.model(
                    ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 379, in forward
    hidden_states, residual = decoder_layer(position_ids=position_ids,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 319, in forward
    hidden_states = self.input_layernorm(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/modules/rms_norm.py", line 32, in forward
    return flashinfer_rmsnorm(hidden_states, self.weight,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 637, in __call__
    return self._opoverload(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 723, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 305, in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 738, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 337, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/custom_op.py", line 237, in flashinfer_rmsnorm
    return rmsnorm(input, weight, eps)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/norm.py", line 73, in rmsnorm
    _rmsnorm(out, input, weight, eps)
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/norm.py", line 82, in _rmsnorm
    get_norm_module().rmsnorm(out, input, weight, eps, get_cuda_stream(device))
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: RMSNorm failed with error code no kernel image is available for execution on the device

CUDA: 12.8
Pytorch: 2.6.0+cu128
GPU: H20
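
For reference, a quick sanity check of what the device reports versus what the local PyTorch build targets (a minimal diagnostic sketch, assuming a standard CUDA-enabled PyTorch install; the flashinfer norm kernel is JIT-compiled separately, so this only confirms the device side):

```python
import torch

# H20 is a Hopper part and should report compute capability (9, 0), i.e. sm_90.
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))

# Architectures the installed PyTorch wheel itself was compiled for.
print("torch arch list:", torch.cuda.get_arch_list())
```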

yzh119 commented Mar 8, 2025

It seems the H20's CUDA architecture is not recognized. Can you specify the environment variable:

export TORCH_CUDA_ARCH_LIST=9.0
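
If it helps, here is a minimal sketch of applying the flag programmatically before the norm op is first loaded (an assumption-laden example: it relies on flashinfer's JIT build honoring TORCH_CUDA_ARCH_LIST at compile time, and the variable has to be visible to every MPI rank, not just rank 0):

```python
import os

# Must be set before the flashinfer "norm" JIT op is compiled/loaded.
# H20 is sm_90, hence "9.0".
os.environ["TORCH_CUDA_ARCH_LIST"] = "9.0"

import flashinfer  # noqa: E402 -- imported only after the env var is in place
```

If a previously built kernel is cached (by default somewhere under ~/.cache/flashinfer, if I remember correctly), clearing that directory forces a rebuild with the new architecture flag.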


nanmi commented Mar 10, 2025

> It seems the H20's CUDA architecture is not recognized. Can you specify the environment variable:
>
> export TORCH_CUDA_ARCH_LIST=9.0

That does not work. I have tried setting the environment variable, and additionally tried both the JIT and AOT builds.
