[Bug] Unsupported reg_class_agnostic=False in Earlier Cascade-InternImage Stages (Causes Assertion & Shape Mismatch Errors) #2923

### Checklist

- [x] I have searched related issues but cannot get the expected help.
- [x] I have read the [FAQ documentation](https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/faq.md) but cannot get the expected help.
- [x] The bug has not been fixed in the latest version.

### Describe the bug

+ I run into a problem when exporting Cascade-InternImage (with DCNv3) to ONNX for the TensorRT backend.
+ In my config file, `reg_class_agnostic=False` for all three stages of the `bbox_head`, but the mmdeploy code seems to support class-specific bbox regression only in the last stage.
+ I would greatly appreciate any help or guidance. Thank you in advance!

My config file:
```python
roi_head=dict(
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
])
```

+ Assertion error location (`cascade_roi_head.py`, line 66):

```python
    # Eliminate the batch dimension
    rois = rois.view(-1, rois_dims)
    ms_scores = []
    max_shape = batch_img_metas[0]['img_shape']
    for i in range(self.num_stages):
        bbox_results = self._bbox_forward(i, x, rois)

        cls_score = bbox_results['cls_score']
        bbox_pred = bbox_results['bbox_pred']
        # Recover the batch dimension
        rois = rois.reshape(batch_size, num_proposals_per_img, rois.size(-1))
        cls_score = cls_score.reshape(batch_size, num_proposals_per_img,
                                      cls_score.size(-1))
        bbox_pred = bbox_pred.reshape(batch_size, num_proposals_per_img, -1)
        ms_scores.append(cls_score)
        if i < self.num_stages - 1:
            assert self.bbox_head[i].reg_class_agnostic   # <----- ASSERTION ERROR
            new_rois = self.bbox_head[i].bbox_coder.decode(
                rois[..., 1:], bbox_pred, max_shape=max_shape)
            new_rois = get_box_tensor(new_rois)
            rois = new_rois.reshape(-1, new_rois.shape[-1])
            # Add dummy batch index
            rois = torch.cat([batch_index.flatten(0, 1), rois], dim=-1)
```
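For context, here is a minimal sketch of how class-specific deltas could be reduced to one set of 4 deltas per proposal before decoding, so that the next stage still receives 4-column RoIs. This is an assumption about a possible fix, not mmdeploy's actual code. The helper name `select_class_deltas` is hypothetical, the background class is assumed to be the last column of `cls_score`, and the deltas are assumed to be laid out class by class. As far as I can tell, mmdet's `regress_by_class` follows a similar idea in the PyTorch path.

```python
import torch


def select_class_deltas(cls_score: torch.Tensor,
                        bbox_pred: torch.Tensor) -> torch.Tensor:
    """Reduce class-specific deltas to one 4-vector per proposal.

    cls_score: (B, N, num_classes + 1), background assumed in the last column.
    bbox_pred: (B, N, num_classes * 4), deltas assumed laid out class by class.
    Returns:   (B, N, 4), the deltas of the highest-scoring foreground class.
    """
    # Highest-scoring foreground class per proposal (background column dropped).
    labels = cls_score[..., :-1].argmax(dim=-1)            # (B, N)
    # Columns 4 * label .. 4 * label + 3 hold the deltas of that class.
    offsets = torch.arange(4, device=bbox_pred.device)     # (4,)
    inds = labels.unsqueeze(-1) * 4 + offsets              # (B, N, 4)
    return torch.gather(bbox_pred, dim=-1, index=inds)


# Toy example with 3 classes and 2 proposals.
toy_scores = torch.tensor([[[0.1, 0.9, 0.0, 5.0],
                            [2.0, 0.0, 0.1, 9.0]]])
toy_deltas = torch.arange(24, dtype=torch.float32).reshape(1, 2, 12)
toy_selected = select_class_deltas(toy_scores, toy_deltas)   # shape (1, 2, 4)
```

With something like this, `bbox_pred` passed to `bbox_coder.decode` would have 4 columns in every stage. Whether `argmax` and `gather` export cleanly to ONNX and TensorRT for this model has not been verified here.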

+ If I ignore the assertion above, I get a shape mismatch error in `delta_xywh_bbox_coder.py` (line 108):
```python
    dxy = denorm_deltas[..., :2]
    dwh = denorm_deltas[..., 2:]

    # fix openvino on torch1.13
    xy1 = rois[..., :2].unsqueeze(2)
    xy2 = rois[..., 2:].unsqueeze(2)
    # rois.unsqueeze(2) has size num_classes * 4 rather than 4 in dimension 3,
    # which triggers the shape mismatch error

    pxy = (xy1 + xy2) * 0.5
    pwh = xy2 - xy1
    dxy_wh = pwh * dxy
```
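The mismatch can be reproduced in isolation with toy tensors. The sketch below replays only the `xy1`/`xy2` split from the snippet above. The function name `box_centers` and the toy shapes are made up for illustration, and `num_classes = 3` stands in for the 80 classes in the config.

```python
import torch


def box_centers(rois: torch.Tensor) -> torch.Tensor:
    """Replay the xy1/xy2 split from the coder snippet on a toy tensor."""
    xy1 = rois[..., :2].unsqueeze(2)   # always 2 columns
    xy2 = rois[..., 2:].unsqueeze(2)   # 2 columns only when rois has 4 columns
    return (xy1 + xy2) * 0.5


num_classes = 3  # toy value for illustration
agnostic_rois = torch.zeros(1, 5, 4)                 # what the coder expects
specific_rois = torch.zeros(1, 5, num_classes * 4)   # what a class-specific stage produces

ok_shape = tuple(box_centers(agnostic_rois).shape)

try:
    box_centers(specific_rois)
    failed = False
except RuntimeError:
    # xy1 has 2 columns, xy2 has num_classes * 4 - 2, so they cannot broadcast.
    failed = True
```

With 4-column RoIs the split yields two tensors of width 2. With `num_classes * 4` columns, `xy2` is wider than `xy1` and the addition fails, which is consistent with the error reported above.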



### Reproduction

`python deploy.py export_config.py model_config.py model.pth`

### Environment

```Shell
python 3.8.10
CUDA V11.8.89
pytorch 1.14.0a0+410ce96
torchvision 0.15.0a0
torchscript 1.14.0a0+410ce96
mmcv 2.1.0
mmdeploy 1.3.0+621159e
mmdet 3.3.0
```

### Error traceback

```Shell

```
