[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 40794 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 5.49 GB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 6.54 GB GPU memory for decoder.
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): Cannot determine size of FP4 data type (/code/tensorrt_llm/cpp/include/tensorrt_llm/common/dataType.h:40)
1 0x7f973fe01d33 /root/.local/lib/python3.12/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0xa1dd33) [0x7f973fe01d33]
Cannot serve modelopt quantized nvfp4 model on TensorRT LLM
Describe the bug
After quantizing the Llama-3.1-70B-Instruct model with the modelopt hf_ptq script, serving the resulting engine with TensorRT-LLM fails with the error shown in the logs above (Cannot determine size of FP4 data type).
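For context, the quantization step would look roughly like the sketch below; the flag names and paths are illustrative assumptions based on the modelopt llm_ptq example, not the literal command used in this report:

# Quantize the HF checkpoint to NVFP4 with modelopt's hf_ptq script
# (flag names and paths are assumptions; adjust to the actual setup)
python hf_ptq.py \
  --pyt_ckpt_path ./Llama-3.1-70B-Instruct \
  --qformat nvfp4 \
  --export_path ./llama-3.1-70b-nvfp4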
Steps/Code to reproduce bug
trt-llm build command:
Step 3 fails with the error logs shown at the top of this report.
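A hedged sketch of the build-and-serve steps, assuming the engine is built with trtllm-build from the quantized checkpoint and then served; the directory names and the exact serve invocation are assumptions, not the reporter's commands:

# Build the TensorRT-LLM engine from the quantized NVFP4 checkpoint
# (directories are illustrative)
trtllm-build \
  --checkpoint_dir ./llama-3.1-70b-nvfp4 \
  --output_dir ./llama-3.1-70b-nvfp4-engine

# Serve the built engine; this is the step that aborts with the FP4 error
# (assumed invocation; adjust to the actual serving command used)
trtllm-serve ./llama-3.1-70b-nvfp4-engine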
Expected behavior
The engine starts up successfully and is able to process inference requests.
System information
2025-04-27 18:48:08,275 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.20.0rc1