Commit 7dd5def

Create cvt_grounding_dino-en.md
1 parent 97788a5 commit 7dd5def

1 file changed

+30
-0
lines changed

docs/cvt_grounding_dino-en.md

+30
@@ -0,0 +1,30 @@
# Grounding DINO to TensorRT Conversion

Since many people have asked how we converted the Grounding DINO model mentioned in our paper to TensorRT, here is a brief introduction to the approach we used. While organizing the TensorRT conversion, we also discovered a minor issue with our earlier Grounding-DINO-T conversion. With the conversion done properly, the FP16 speed should be approximately 27 FPS.

## Converting PyTorch Model to ONNX Model
The original Grounding DINO code needs only slight modifications to export to an ONNX model. However, various errors can occur when that ONNX model is then converted to a TensorRT model. To avoid them, some additional changes must be made before exporting to ONNX:

- Comment out the statements in the backbone that use checkpoints.
- Remove `NestedTensor` from the code and use plain `Tensor` objects directly. `NestedTensor` is used mainly in the visual part.
- Rewrite the `Joiner` class in `backbone.py` along the lines of the example below. The rewritten class should inherit from `nn.Module` instead of `nn.Sequential`. This might be the key to avoiding issues when converting the ONNX model to a TensorRT model. Some of the content in the `build_backbone` function can be moved into the rewritten `Joiner` class.
- Treat the tokenizer as data preprocessing and place it outside the model. Its output should be passed directly as input to the model's `forward` function.
- Keep the special handling for ONNX conversion in the `nested_tensor_from_tensor_list` function.
- Make any other changes that become necessary because of the modifications above.

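```python
import torch.nn as nn


class Joiner(nn.Module):  # inherits from nn.Module, not nn.Sequential
    def __init__(self):
        super().__init__()  # must run before submodules are assigned
        self.backbone = xxxx  # placeholder for the backbone module
        self.position_embedding = xxx  # placeholder for the position embedding

    def forward(self):
        pass
```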
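
The `NestedTensor` and tokenizer points above can be illustrated with a small toy sketch. It is not the actual Grounding DINO code: `pad_to_fixed`, `ExportWrapper`, and the stand-in layers are hypothetical names chosen for illustration. The idea is only that images are padded to a fixed size outside the model, the padding mask travels as an ordinary boolean tensor (`True` marks padded pixels here, which is an assumption about the mask convention), and token IDs are produced before `forward` is called.

```python
import torch
from torch import nn


def pad_to_fixed(images, height, width):
    """Pad a list of (C, H, W) tensors into one batch plus a padding mask.

    Returns plain tensors: batch is (N, C, height, width) and mask is
    (N, height, width), with True marking padded pixels.
    """
    batch = torch.zeros(len(images), images[0].shape[0], height, width)
    mask = torch.ones(len(images), height, width, dtype=torch.bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img
        mask[i, :h, :w] = False
    return batch, mask


class ExportWrapper(nn.Module):
    """Hypothetical wrapper whose forward takes only tensors."""

    def __init__(self, vocab_size=32, dim=8):
        super().__init__()
        self.visual = nn.Conv2d(3, dim, kernel_size=1)  # stand-in for the backbone
        self.text = nn.Embedding(vocab_size, dim)  # stand-in for the text encoder

    def forward(self, images, mask, input_ids, attention_mask):
        feats = self.visual(images)  # (N, dim, H, W)
        valid = (~mask).unsqueeze(1).to(feats.dtype)  # 1.0 where the pixel is real
        pooled = (feats * valid).sum(dim=(2, 3)) / valid.sum(dim=(2, 3))  # (N, dim)
        tokens = self.text(input_ids) * attention_mask.unsqueeze(-1).to(feats.dtype)
        return torch.einsum("nd,ntd->nt", pooled, tokens)  # image-to-token scores


# Preprocessing happens outside the model; the IDs below stand in for tokenizer output.
images = [torch.rand(3, 4, 6), torch.rand(3, 5, 5)]
batch, mask = pad_to_fixed(images, height=8, width=8)
input_ids = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])
attention_mask = (input_ids != 0).long()
scores = ExportWrapper()(batch, mask, input_ids, attention_mask)
```

A module shaped like this, with a tensor-only `forward` signature and fixed input sizes, is the kind of object that would be handed to `torch.onnx.export`.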

## Converting ONNX Model to TensorRT Model
An ONNX model prepared according to the suggestions above can be converted to a TensorRT model without issues.

- We recommend using the latest version of TensorRT, since it is very fast.
- Fixing the input dimensions provides some advantages. The speed tests for Grounding DINO in Omdet are based on fixed input dimensions.
- FP32 is almost lossless. Converting to FP16 causes a significant loss of precision, and the layers with substantial losses need extra handling. The speed tests for Grounding DINO in Omdet are based on FP16 models. FP32 is about 25-30% slower than FP16.
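
As general background on the FP16 point above, the following standard-library sketch shows where half precision loses information. It is not our analysis of which Grounding DINO layers are affected. The helper names `to_fp16` and `rel_error` are hypothetical, and the sketch uses Python's `struct` half-precision format (`"e"`) to round a value to FP16 and back.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def rel_error(x):
    """Relative error introduced by storing x as FP16."""
    return abs(to_fp16(x) - x) / abs(x)


# FP16 has an 11-bit significand, so the relative rounding error of a
# normal value is bounded by 2**-11.
print(rel_error(0.1))

# Above 2048 the spacing between FP16 values is 2, so odd integers are lost.
print(to_fp16(2049.0))

# Values beyond the FP16 maximum (65504) cannot be stored at all;
# struct reports this as an OverflowError.
try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("overflow:", exc)
```

Intermediate activations that are large in magnitude, or that depend on small differences between large values, are the usual candidates for being kept in FP32 when the rest of the network runs in FP16.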
