TensorRT optimization shows unexpected results #4405
Comments
Obviously, Case 4 is slightly better than Case 2. Your workspace size can use the default or a larger value.
@lix19937 Regarding case 3 - how do I make it INT8 only? Or are there any optimizations that make it faster than case 2? "Your workspace size can use the default or larger" - how is larger better? I see the generated engine is only around 700 MB and my workspace size is 14 GB. Here is my model info:
In case 3, you can use
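For context, a hedged sketch of what an INT8-enabled trtexec build could look like (this is not necessarily the exact command meant above; the model name, engine name, and workspace size are placeholders):

```
# Hedged sketch: build with INT8 enabled and FP16 as a fallback for layers
# without an INT8 implementation; the workspace pool size is given in MiB.
# Note: without a calibration cache or Q/DQ nodes in the ONNX model, trtexec
# uses dummy dynamic ranges, so this only measures latency, not accuracy.
trtexec --onnx=model.onnx \
        --int8 --fp16 \
        --memPoolSize=workspace:8192 \
        --saveEngine=model_int8.engine
```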
@lix19937 I ran what you suggested - it looks like it gets the same latency as case 2. So it cannot be further improved with quantization? Here are the logs. It also shows: Loaded engine size: 760 MiB...
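Independent of the quantization question, per-layer profiling can show where the remaining time goes; a hedged sketch using standard trtexec profiling flags (the engine name, input name, and shape are placeholders):

```
# Hedged sketch: profile per-layer timing of the built engine.
# --shapes is only needed if the engine was built with dynamic shapes.
trtexec --loadEngine=model_int8.engine \
        --shapes=input:8x3x224x224 \
        --dumpProfile --separateProfileRun
```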
@lix19937 Any idea on the above? Also, what could be the reason that inference with the ONNX model is faster than TensorRT with all default settings (except minShapes and maxShapes)?
If minShapes and maxShapes are not set, do they default to 1x1? But the ONNX model doesn't define a shape:
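A hedged sketch of making the optimization profile explicit, so that both the build and the latency measurement use the shapes you actually care about (the input name "input" and the dimensions are placeholders for this model):

```
# Hedged sketch: explicit min/opt/max profile plus the shape used for timing.
trtexec --onnx=model.onnx --fp16 \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:16x3x224x224 \
        --shapes=input:8x3x224x224 \
        --saveEngine=model_fp16.engine
```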
Hi,
I am trying to create a TensorRT engine from an ONNX model.
I tried a few things and here are the inference latencies. Why are 3 and 4 performing worse than 2?
trtexec runs:
Logs:
trt_fp16.txt
trt_fp32.txt
trt_fp16_optimization_5.txt.zip
trt_int8.txt.zip
Environment
Triton Inference Server Version: 25.02
TensorRT Version: 10.8.0.43 (I think that's the version that comes with Triton Inference Server 25.02 - see: https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-25-02.html)
trtexec: v100800
NVIDIA GPU: NVIDIA A10
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
cc @lix19937
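As a hedged example of that comparison, polygraphy can run the same ONNX model under both ONNX Runtime and TensorRT in one invocation and compare the outputs (the tolerance values below are only illustrative):

```
# Hedged sketch: compare ONNX Runtime and TensorRT outputs on the same model.
polygraphy run model.onnx --onnxrt --trt --fp16 \
    --atol 1e-3 --rtol 1e-3
```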