[TensorRT-LLM] TensorRT-LLM version: 0.18.0.dev2025022500
[TensorRT-LLM][INFO] Refreshed the MPI local session
Model init total -- 387.22s
2025-03-07 11:03:44,669 - INFO - flashinfer.jit: Loading JIT ops: norm
2025-03-07 11:03:44,689 - INFO - flashinfer.jit: Finished loading JIT ops: norm
CUDA Error: no kernel image is available for execution on the device (209) /usr/local/lib/python3.12/dist-packages/flashinfer/data/include/flashinfer/norm.cuh: line 113 at functioncudaLaunchKernel((void*)kernel, nblks, nthrs, args, smem_size, stream)
[03/07/2025-11:03:44] [TRT-LLM] [E] Failed to initialize executor on rank 6: RMSNorm failed with error code no kernel image is available for execution on the device
[ERROR | TRT-LLM ]: [TRT-LLM] [E] Failed to initialize executor on rank 6: RMSNorm failed with error code no kernel image is available for execution on the device
[03/07/2025-11:03:44] [TRT-LLM] [E] Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 581, in worker_main
worker: ExecutorBindingsWorker = worker_cls(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 126, in __init__
self.engine = _create_engine()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 122, in _create_engine
return create_executor(executor_config=executor_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 106, in create_py_executor
kv_cache_max_tokens = estimate_max_kv_cache_tokens(model_engine,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/_util.py", line 118, in estimate_max_kv_cache_tokens
model_engine.forward(req, resource_manager)
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 990, in forward
return self._forward_step(inputs, gather_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1030, in _forward_step
logits = self.model.forward(**inputs,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_utils.py", line 234, in forward
hidden_states = self.model(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 379, in forward
hidden_states, residual = decoder_layer(position_ids=position_ids,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 319, in forward
hidden_states = self.input_layernorm(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1740, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/modules/rms_norm.py", line 32, in forward
return flashinfer_rmsnorm(hidden_states, self.weight,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 637, in __call__
return self._opoverload(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 723, in __call__
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 305, in backend_impl
result = self._backend_fns[device_type](*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 738, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py", line 337, in wrapped_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/custom_op.py", line 237, in flashinfer_rmsnorm
return rmsnorm(input, weight, eps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/flashinfer/norm.py", line 73, in rmsnorm
_rmsnorm(out, input, weight, eps)
File "/usr/local/lib/python3.12/dist-packages/flashinfer/norm.py", line 82, in _rmsnorm
get_norm_module().rmsnorm(out, input, weight, eps, get_cuda_stream(device))
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_device.py", line 104, in __torch_function__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1123, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: RMSNorm failed with error code no kernel image is available for execution on the device
CUDA: 12.8
PyTorch: 2.6.0+cu128
GPU: H20
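For anyone triaging this: CUDA error 209 (`cudaErrorNoKernelImageForDevice`) generally means the loaded binary contains no code compiled for the GPU's compute capability. Below is a minimal diagnostic sketch, not a confirmed root cause. The helper `arch_covered` is hypothetical (it is not part of TensorRT-LLM or FlashInfer), and it assumes the usual CUDA compatibility rules: an `sm_XY` binary runs on a device with the same major version and an equal or higher minor version, a `compute_XY` PTX entry can be JIT-compiled for an equal or newer device, and suffixed targets such as `sm_90a` only match that exact architecture. Whether the FlashInfer JIT build honors `TORCH_CUDA_ARCH_LIST` is also an assumption here, and `torch.cuda.get_arch_list()` describes PyTorch's own kernels, so it is only an indirect hint about the `norm` JIT op.

```python
import os


def arch_covered(capability, arch_list):
    """Return True if a (major, minor) device capability is served by any
    entry in arch_list (strings such as "sm_90" or "compute_80").

    Hypothetical helper for illustration only.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, tag = arch.partition("_")
        digits = "".join(ch for ch in tag if ch.isdigit())
        if not digits:
            continue
        a_major, a_minor = divmod(int(digits), 10)
        # Targets with a suffix (e.g. "90a") are architecture-specific.
        arch_specific = tag != digits
        if kind == "sm":
            if arch_specific:
                if (a_major, a_minor) == (major, minor):
                    return True
            elif a_major == major and a_minor <= minor:
                return True
        elif kind == "compute" and not arch_specific:
            # PTX can be JIT-compiled for the same or a newer architecture.
            if (a_major, a_minor) <= (major, minor):
                return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("PyTorch arch list:", archs)
        print("covered by PyTorch build:", arch_covered(cap, archs))
    # Assumption: the JIT build may read this variable; unset means default.
    print("TORCH_CUDA_ARCH_LIST:", os.environ.get("TORCH_CUDA_ARCH_LIST"))
```

If the device capability is not covered, or `TORCH_CUDA_ARCH_LIST` is set to a list that excludes it, a stale or mismatched JIT cache for the `norm` op would be one plausible explanation worth ruling out.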