[Bug] Unsupported reg_class_agnostic=False in Earlier Cascade-InternImage Stages (Causes Assertion & Shape Mismatch Errors) #2923

@thu-bear

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

  • I hit a problem when exporting Cascade-InternImage (with DCNv3) to ONNX using the TensorRT backend.
  • In my config file, reg_class_agnostic=False for all three stages of the bbox_head, but the mmdeploy code seems to support class-specific bbox regression only in the last stage; the earlier stages are asserted to be class-agnostic.
  • I am looking forward to a solution and would greatly appreciate any help or guidance. Thank you in advance!

My config file:

roi_head=dict(
        bbox_head=[
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
            dict(
                type='ConvFCBBoxHead',
                num_shared_convs=4,
                num_shared_fcs=1,
                in_channels=256,
                conv_out_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=False,
                reg_decoded_bbox=True,
                norm_cfg=dict(type='SyncBN', requires_grad=True),
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
])
  • Assertion error location (cascade_roi_head.py, line 66):
    # Eliminate the batch dimension
    rois = rois.view(-1, rois_dims)
    ms_scores = []
    max_shape = batch_img_metas[0]['img_shape']
    for i in range(self.num_stages):
        bbox_results = self._bbox_forward(i, x, rois)

        cls_score = bbox_results['cls_score']
        bbox_pred = bbox_results['bbox_pred']
        # Recover the batch dimension
        rois = rois.reshape(batch_size, num_proposals_per_img, rois.size(-1))
        cls_score = cls_score.reshape(batch_size, num_proposals_per_img,
                                      cls_score.size(-1))
        bbox_pred = bbox_pred.reshape(batch_size, num_proposals_per_img, -1)
        ms_scores.append(cls_score)
        if i < self.num_stages - 1:
            assert self.bbox_head[i].reg_class_agnostic   # <----- ASSERTION ERROR
            new_rois = self.bbox_head[i].bbox_coder.decode(
                rois[..., 1:], bbox_pred, max_shape=max_shape)
            new_rois = get_box_tensor(new_rois)
            rois = new_rois.reshape(-1, new_rois.shape[-1])
            # Add dummy batch index
            rois = torch.cat([batch_index.flatten(0, 1), rois], dim=-1)
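For context, the assertion exists because the decode step that follows expects exactly 4 deltas per proposal. A minimal sketch of one possible workaround is shown here: pick, for each proposal, the 4 deltas of its top-scoring foreground class before decoding. The helper name `select_class_deltas` is hypothetical and not part of mmdeploy, and the sketch assumes `bbox_pred` is laid out as `num_classes * 4` values per proposal with the background score stored last in `cls_score`. It has not been verified against ONNX or TensorRT export.

```python
import torch


def select_class_deltas(cls_score, bbox_pred, num_classes):
    """Hypothetical helper: reduce class-specific deltas to 4 per proposal.

    Assumed shapes (not confirmed against mmdeploy):
      cls_score: (B, N, num_classes + 1), background score last
      bbox_pred: (B, N, num_classes * 4)
    Returns a tensor of shape (B, N, 4).
    """
    # Top-scoring foreground class for every proposal: (B, N)
    labels = cls_score[..., :num_classes].argmax(dim=-1)
    # Indices of that class's 4 deltas inside the flat last dimension: (B, N, 4)
    offsets = torch.arange(4, device=bbox_pred.device)
    inds = labels.unsqueeze(-1) * 4 + offsets
    return torch.gather(bbox_pred, -1, inds)
```

With such a helper, the intermediate stages could pass `select_class_deltas(cls_score, bbox_pred, num_classes)` to `bbox_coder.decode` instead of the raw `bbox_pred`, so that `rois` keeps 4 coordinates per proposal. Whether this matches the refinement used at training time would need to be checked against the mmdet implementation.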
  • If I skip the assertion above, I get a shape mismatch error in delta_xywh_bbox_coder.py (line 108):
    dxy = denorm_deltas[..., :2]
    dwh = denorm_deltas[..., 2:]

    # fix openvino on torch1.13
    xy1 = rois[..., :2].unsqueeze(2)
    xy2 = rois[..., 2:].unsqueeze(2)
    # rois.unsqueeze(2) has size num_classes * 4 rather than 4 in dimension 3,
    # which causes the shape mismatch error


    pxy = (xy1 + xy2) * 0.5
    pwh = xy2 - xy1
    dxy_wh = pwh * dxy
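
The mismatch can be reproduced in isolation. The sketch below copies only the two slicing lines and the center computation from the snippet above into a hypothetical function `center_from_rois`. It assumes that, after a class-specific earlier stage, `rois` carries `num_classes * 4` coordinates per proposal instead of 4; the batch size and proposal count are arbitrary.

```python
import torch


def center_from_rois(rois):
    """Hypothetical stand-in for the first lines of the decode snippet."""
    xy1 = rois[..., :2].unsqueeze(2)  # always 2 values in the last dim
    xy2 = rois[..., 2:].unsqueeze(2)  # 2 values only if rois has 4 coords
    return (xy1 + xy2) * 0.5


num_classes = 80

# Class-agnostic rois: 4 coordinates per proposal, the addition works.
ok = center_from_rois(torch.rand(1, 5, 4))

# Class-specific rois: num_classes * 4 coordinates per proposal.
# xy1 keeps 2 values but xy2 gets num_classes * 4 - 2, so they cannot be added.
try:
    center_from_rois(torch.rand(1, 5, num_classes * 4))
    mismatch = False
except RuntimeError:
    mismatch = True
```

This suggests the coder assumes 4-coordinate `rois` at every stage, which is consistent with the assertion on `reg_class_agnostic` in the earlier stages.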

Reproduction

python deploy.py export_config.py model_config.py model.pth

Environment

Python 3.8.10
CUDA V11.8.89
pytorch 1.14.0a0+410ce96
torchvision 0.15.0a0
torchscript 1.14.0a0+410ce96
mmcv 2.1.0
mmdeploy 1.3.0+621159e
mmdet 3.3.0

Error traceback
