Description
Hello, I am having issues converting a model from the Hugging Face transformers library (https://huggingface.co/facebook/detr-resnet-50); the exported ONNX model is attached in the drive link below. I understand this may be an issue with the model itself, but the problem only appears during TensorRT conversion.
PyTorch inference and ONNX inference with the CUDAExecutionProvider both work correctly. However, as soon as I convert to TensorRT (either with trtexec, generating a .plan file, or by using the TensorrtExecutionProvider in onnxruntime), the outputs of the model are way off from what they should be. The conversion succeeds with both TensorRT 10.1 and 10.8 with no error messages, but with TensorRT 10.3 the build fails with an error that could be the key:
Finished parsing network model. Parse time: 0.129517
[05/28/2025-19:32:53] [I] Set shape of input tensor pixel_values for optimization profile 0 to: MIN=1x3x800x800 OPT=1x3x800x800 MAX=1x3x800x800
[05/28/2025-19:32:53] [I] Set shape of input tensor pixel_mask for optimization profile 0 to: MIN=1x800x800 OPT=1x800x800 MAX=1x800x800
[05/28/2025-19:32:53] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
bb.cpp:138: CHECK(op->parent() == this || op->parent() == nullptr) failed.
[05/28/2025-19:33:05] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception No Myelin Error exists
[05/28/2025-19:33:05] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::MatMul_4888 + ONNXTRT_Broadcast_1034.../Sigmoid]}.)
[05/28/2025-19:33:05] [E] Engine could not be created from network
[05/28/2025-19:33:05] [E] Building engine failed
[05/28/2025-19:33:05] [E] Failed to create engine from model or file.
[05/28/2025-19:33:05] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # trtexec --onnx=testDETRmodelUntrained.onnx --saveEngine=detr.plan --optShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800 --minShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800 --maxShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800
Any idea what could be causing the outputs of the TensorRT model to be so different?
Also, here is an example of the output difference:
TensorRT predicted bounding boxes: tensor([[0.4578, 0.9594, 0.0902, 0.0699],
[0.5317, 0.9207, 0.0231, 0.0412],
[0.4644, 0.9500, 0.1214, 0.0991],
[0.5375, 0.9171, 0.0209, 0.0445],
[0.4828, 0.9044, 0.0866, 0.1148]])
ONNXruntime CUDAExecutionProvider predicted bounding boxes: tensor([[0.8857, 0.6460, 0.0568, 0.0995],
[0.5260, 0.3688, 0.0165, 0.0299],
[0.4127, 0.4972, 0.0478, 0.1282],
[0.5393, 0.3159, 0.0198, 0.0282],
[0.4496, 0.3149, 0.0291, 0.0299]])
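For reference, the ONNX model was exported roughly as follows. This is a minimal sketch, assuming the "Untrained" in the filename means randomly initialized weights and that opset 17 was used; the exact arguments in my export script may differ:

import torch
from transformers import DetrConfig, DetrForObjectDetection

# Randomly initialized DETR head, matching the "Untrained" filename (assumption).
model = DetrForObjectDetection(DetrConfig())
model.eval()

# Fixed input shapes matching the trtexec shape profiles below.
pixel_values = torch.randn(1, 3, 800, 800)
pixel_mask = torch.ones(1, 800, 800, dtype=torch.int64)

torch.onnx.export(
    model,
    (pixel_values, pixel_mask),
    "testDETRmodelUntrained.onnx",
    input_names=["pixel_values", "pixel_mask"],
    output_names=["logits", "pred_boxes"],
    opset_version=17,  # assumption; the exact opset may differ
)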
Environment
TensorRT Version: 10.1/10.3/10.8
GPU Type: RTX 4090
Nvidia Driver Version: 570
CUDA Version: 12.4
CUDNN Version: NA
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 2.4 with CUDA 12.4
Baremetal or Container (if container which image + tag): Testing with TRT 10.3 was done in the deepstream-7.1-multiarch container.
Relevant Files
https://drive.google.com/file/d/1ptHXDVfl5Elhaud4CbqcCXyV6V0rOvtp/view?usp=drive_link
Steps To Reproduce
Use the trtexec command below for conversion, then compare the engine's outputs against the original ONNX model's outputs. If necessary, I can provide my full ONNX inference code for comparing outputs; a minimal sketch of the comparison is included after the command.
Trtexec command:
trtexec --onnx=testDETRmodelUntrained.onnx --saveEngine=detr.plan --optShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800 --minShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800 --maxShapes=pixel_values:1x3x800x800,pixel_mask:1x800x800
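A minimal sketch of the output comparison, assuming onnxruntime-gpu is built with TensorRT support (my real pipeline preprocesses an actual image, but a random input shows the divergence just as well):

import numpy as np
import onnxruntime as ort

# Dummy input with the fixed shapes from the profile; dtypes must match the
# exported graph (assumed float32 pixel_values, int64 pixel_mask).
pixel_values = np.random.randn(1, 3, 800, 800).astype(np.float32)
pixel_mask = np.ones((1, 800, 800), dtype=np.int64)
inputs = {"pixel_values": pixel_values, "pixel_mask": pixel_mask}

cuda_sess = ort.InferenceSession("testDETRmodelUntrained.onnx",
                                 providers=["CUDAExecutionProvider"])
trt_sess = ort.InferenceSession("testDETRmodelUntrained.onnx",
                                providers=["TensorrtExecutionProvider"])

cuda_out = cuda_sess.run(None, inputs)
trt_out = trt_sess.run(None, inputs)

# Elementwise difference per output (logits, pred_boxes).
for meta, a, b in zip(cuda_sess.get_outputs(), cuda_out, trt_out):
    print(meta.name, "max abs diff:", np.abs(a - b).max())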
Note: I posted this on the NVIDIA forums but have not received a helpful response in about a week, so I wanted to post it here as well.