RTMPose TensorRT Performance Drop

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. I have read the [FAQ documentation](https://github.com/open-mmlab/mmdeploy/tree/main/docs/en/faq.md) but cannot get the expected help.
- [x] 3. The bug has not been fixed in the latest version.

### Describe the bug

I trained an RTMPose-X (384×288, Halpe26) model on a custom dataset with 38 keypoints. After converting the trained model to TensorRT (FP16), the PCK score drops from 0.97 to 0.006. The ONNX model does not show this drop and performs correctly.

I also tested the default RTMPose-X configuration (26 keypoints, default checkpoints). There, TensorRT works fine for both batch size = 1 (static) and batch size = 2.
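To narrow down where the two backends diverge, the raw `simcc_x` / `simcc_y` outputs of the ONNX and TensorRT models could be compared on the same preprocessed input. The sketch below is only an illustration: `argmax_agreement` and `max_abs_diff` are hypothetical helper names, and the small nested lists are toy data standing in for real outputs of shape (num_keypoints, num_bins), for example obtained with `.tolist()` on the backend output arrays.

```python
import math


def argmax_agreement(ref, test):
    """Fraction of keypoints whose SimCC peak index matches between two backends.

    ref, test: nested lists of shape (num_keypoints, num_bins).
    """
    if len(ref) != len(test):
        raise ValueError("outputs must have the same number of keypoints")
    matches = 0
    for ref_row, test_row in zip(ref, test):
        ref_peak = max(range(len(ref_row)), key=ref_row.__getitem__)
        test_peak = max(range(len(test_row)), key=test_row.__getitem__)
        matches += ref_peak == test_peak
    return matches / len(ref)


def max_abs_diff(ref, test):
    """Largest absolute element-wise difference; NaN/inf in `test` counts as infinite."""
    worst = 0.0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            if not math.isfinite(t):
                return math.inf
            worst = max(worst, abs(r - t))
    return worst


# Toy data standing in for real simcc_x outputs (2 keypoints, 4 bins each).
onnx_simcc_x = [[0.1, 0.9, 0.2, 0.0], [0.0, 0.1, 0.3, 0.8]]
trt_simcc_x = [[0.1, 0.8, 0.2, 0.0], [0.7, 0.1, 0.3, 0.2]]

print("peak agreement:", argmax_agreement(onnx_simcc_x, trt_simcc_x))
print("max abs diff:  ", max_abs_diff(onnx_simcc_x, trt_simcc_x))
```

A peak agreement close to 1.0 with a small difference would suggest looking at pre- or post-processing rather than the engine, while NaN/inf values or a low agreement would point towards the converted engine itself.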



Deploy config:

```python
_base_ = ['./pose-detection_static.py', '../_base_/backends/tensorrt-fp16.py']

onnx_config = dict(
    input_names=['input'],
    output_names=['simcc_x', 'simcc_y'],
    input_shape=[288, 384],
    optimize=True,
    dynamic_axes={
        'input': {
            0: 'batch',
        },
        'simcc_x': {
            0: 'batch'
        },
        'simcc_y': {
            0: 'batch'
        }
    }
)

backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=True,
        max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 384, 288],
                    opt_shape=[1, 3, 384, 288],
                    max_shape=[1, 3, 384, 288])))
    ])
```
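One way to check whether FP16 precision is responsible for the drop would be to build the same engine in FP32 and compare the results. The fragment below is a sketch of such a variant, not a configuration whose results are reported here: it assumes that only `fp16_mode` is changed and that everything else stays as in the deploy config above.

```python
# Hypothetical FP32 variant of the backend_config above; only fp16_mode differs.
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False,  # build the engine in FP32 to rule out FP16 range issues
        max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 384, 288],
                    opt_shape=[1, 3, 384, 288],
                    max_shape=[1, 3, 384, 288])))
    ])
```

If the FP32 engine restored the PCK score, that would narrow the problem down to reduced-precision layers rather than the ONNX export or the pipeline around it.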

### Reproduction

Both config files below give the same result.

auto_scale_lr = dict(base_batch_size=1024)
backend_args = dict(backend='local')
base_lr = 0.004
custom_halpe26 = [(i, i) for i in range(38)]  # identity mapping over all 38 keypoints
codec = dict(
    input_size=(
        288,
        384,
    ),
    normalize=False,
    sigma=(
        6.0,
        6.93,
    ),
    simcc_split_ratio=2.0,
    type='SimCCLabel',
    use_dark=False)

custom_hooks = [
    dict(
        ema_type='ExpMomentumEMA',
        momentum=0.0002,
        priority=49,
        type='EMAHook',
        update_buffers=True),
    dict(
        switch_epoch=680,
        switch_pipeline=[
            dict(backend_args=dict(backend='local'), type='LoadImage'),
            dict(type='GetBBoxCenterScale'),
            dict(direction='horizontal', type='RandomFlip'),
            dict(type='RandomHalfBody'),
            dict(
                rotate_factor=90,
                scale_factor=[
                    0.5,
                    1.5,
                ],
                shift_factor=0.0,
                type='RandomBBoxTransform'),
            dict(input_size=(
                288,
                384,
            ), type='TopdownAffine'),
            dict(
                transforms=[
                    dict(p=0.1, type='Blur'),
                    dict(p=0.1, type='MedianBlur'),
                    dict(
                        max_height=0.4,
                        max_holes=1,
                        max_width=0.4,
                        min_height=0.2,
                        min_holes=1,
                        min_width=0.2,
                        p=0.5,
                        type='CoarseDropout'),
                ],
                type='Albumentation'),
            dict(
                encoder=dict(
                    input_size=(
                        288,
                        384,
                    ),
                    normalize=False,
                    sigma=(
                        6.0,
                        6.93,
                    ),
                    simcc_split_ratio=2.0,
                    type='SimCCLabel',
                    use_dark=False),
                type='GenerateTarget',
                use_dataset_keypoint_weights=True),
            dict(type='PackPoseInputs'),
        ],
        type='mmdet.PipelineSwitchHook'),
]

data_mode = 'topdown'
data_root = '../mmpose/data/person_racket/'
dataset_coco = dict(
    data_root=data_root,
    metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
    data_mode=data_mode,
    ann_file='annotations/train.json',
    data_prefix=dict(img='images/'),
    pipeline=[
        dict(
            mapping=custom_halpe26,
            num_keypoints=38,
            type='KeypointConverter'),
    ],
    type='CocoDataset')

default_hooks = dict(
    badcase=dict(
        badcase_thr=5,
        enable=False,
        metric_type='loss',
        out_dir='badcase',
        type='BadCaseAnalysisHook'),
    checkpoint=dict(
        interval=10,
        max_keep_ckpts=1,
        rule='greater',
        save_best='PCK',
        type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(enable=False, type='PoseVisualizationHook'))
default_scope = 'mmpose'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))

input_size = (
    288,
    384,
)


load_from = None
log_level = 'INFO'
log_processor = dict(
    by_epoch=True, num_digits=6, type='LogProcessor', window_size=50)
max_epochs = 700
model = dict(
    backbone=dict(
        _scope_='mmdet',
        act_cfg=dict(type='SiLU'),
        arch='P5',
        channel_attention=True,
        deepen_factor=1.33,
        expand_ratio=0.5,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/cspnext-x_udp-body7_210e-384x288-d28b58e6_20230529.pth',
            prefix='backbone.',
            type='Pretrained'),
        norm_cfg=dict(type='SyncBN'),
        out_indices=(4, ),
        type='CSPNeXt',
        widen_factor=1.25),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='PoseDataPreprocessor'),
    head=dict(
        decoder=dict(
            input_size=(
                288,
                384,
            ),
            normalize=False,
            sigma=(
                6.0,
                6.93,
            ),
            simcc_split_ratio=2.0,
            type='SimCCLabel',
            use_dark=False),
        final_layer_kernel_size=7,
        gau_cfg=dict(
            act_fn='SiLU',
            drop_path=0.0,
            dropout_rate=0.0,
            expansion_factor=2,
            hidden_dims=256,
            pos_enc=False,
            s=128,
            use_rel_bias=False),
        in_channels=1280,
        in_featuremap_size=(
            9,
            12,
        ),
        input_size=(
            288,
            384,
        ),
        loss=dict(
            beta=10.0,
            label_softmax=True,
            type='KLDiscretLoss',
            use_target_weight=True),
        out_channels=38,
        simcc_split_ratio=2.0,
        type='RTMCCHead'),
    test_cfg=dict(flip_test=True),
    type='TopdownPoseEstimator')

num_keypoints = 38

optim_wrapper = dict(
    clip_grad=dict(max_norm=35, norm_type=2),
    optimizer=dict(lr=0.004, type='AdamW', weight_decay=0.05),
    paramwise_cfg=dict(
        bias_decay_mult=0, bypass_duplicate=True, norm_decay_mult=0),
    type='OptimWrapper')
param_scheduler = [
    dict(
        begin=0, by_epoch=False, end=1000, start_factor=1e-05,
        type='LinearLR'),
    dict(
        T_max=350,
        begin=350,
        by_epoch=True,
        convert_to_iter_based=True,
        end=700,
        eta_min=0.0002,
        type='CosineAnnealingLR'),
]



randomness = dict(seed=21)
resume = False
stage2_num_epochs = 20
test_cfg = dict()
test_dataloader = dict(
    batch_size=64,
    dataset=dict(
        datasets=[
            dict(
                data_root='../mmpose/data/person_racket/',
                ann_file='annotations/val.json',
                data_mode='topdown',
                data_prefix=dict(img='images/'),
                pipeline=[
                    dict(
                        mapping=custom_halpe26,
                        num_keypoints=38,
                        type='KeypointConverter'),
                ],
                type='CocoDataset'),
            
            
        ],
        metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
        pipeline=[
            dict(backend_args=dict(backend='local'), type='LoadImage'),
            dict(type='GetBBoxCenterScale'),
            dict(input_size=(
                288,
                384,
            ), type='TopdownAffine'),
            dict(type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CombinedDataset'),
    drop_last=False,
    num_workers=10,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))


train_batch_size = 16
train_cfg = dict(by_epoch=True, max_epochs=700, val_interval=1)
train_dataloader = dict(
    batch_size=16,
    dataset=dict(
        datasets=[
            dict(
                data_root='../mmpose/data/person_racket/',
                ann_file='annotations/train.json',
                data_mode='topdown',
                data_prefix=dict(img='images/'),
                pipeline=[
                    dict(
                        mapping=custom_halpe26,
                        num_keypoints=38,
                        type='KeypointConverter'),
                ],
                type='CocoDataset'),
            
        ],
        metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
        pipeline=[
            dict(backend_args=dict(backend='local'), type='LoadImage'),
            dict(type='GetBBoxCenterScale'),
            dict(direction='horizontal', type='RandomFlip'),
            dict(type='RandomHalfBody'),
            dict(
                rotate_factor=90,
                scale_factor=[
                    0.5,
                    1.5,
                ],
                type='RandomBBoxTransform'),
            dict(input_size=(
                288,
                384,
            ), type='TopdownAffine'),
            dict(type='PhotometricDistortion'),
            dict(
                transforms=[
                    dict(p=0.1, type='Blur'),
                    dict(p=0.1, type='MedianBlur'),
                    dict(
                        max_height=0.4,
                        max_holes=1,
                        max_width=0.4,
                        min_height=0.2,
                        min_holes=1,
                        min_width=0.2,
                        p=1.0,
                        type='CoarseDropout'),
                ],
                type='Albumentation'),
            dict(
                encoder=dict(
                    input_size=(
                        288,
                        384,
                    ),
                    normalize=False,
                    sigma=(
                        6.0,
                        6.93,
                    ),
                    simcc_split_ratio=2.0,
                    type='SimCCLabel',
                    use_dark=False),
                type='GenerateTarget',
                use_dataset_keypoint_weights=True),
            dict(type='PackPoseInputs'),
        ],
        test_mode=False,
        type='CombinedDataset'),
    num_workers=10,
    persistent_workers=True,
    pin_memory=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))

train_pipeline = [
    dict(backend_args=dict(backend='local'), type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(direction='horizontal', type='RandomFlip'),
    dict(type='RandomHalfBody'),
    dict(
        rotate_factor=90,
        scale_factor=[
            0.5,
            1.5,
        ],
        type='RandomBBoxTransform'),
    dict(input_size=(
        288,
        384,
    ), type='TopdownAffine'),
    dict(type='PhotometricDistortion'),
    dict(
        transforms=[
            dict(p=0.1, type='Blur'),
            dict(p=0.1, type='MedianBlur'),
            dict(
                max_height=0.4,
                max_holes=1,
                max_width=0.4,
                min_height=0.2,
                min_holes=1,
                min_width=0.2,
                p=1.0,
                type='CoarseDropout'),
        ],
        type='Albumentation'),
    dict(
        encoder=dict(
            input_size=(
                288,
                384,
            ),
            normalize=False,
            sigma=(
                6.0,
                6.93,
            ),
            simcc_split_ratio=2.0,
            type='SimCCLabel',
            use_dark=False),
        type='GenerateTarget',
        use_dataset_keypoint_weights=True),
    dict(type='PackPoseInputs'),
]
train_pipeline_stage2 = [
    dict(backend_args=dict(backend='local'), type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(direction='horizontal', type='RandomFlip'),
    dict(type='RandomHalfBody'),
    dict(
        rotate_factor=90,
        scale_factor=[
            0.5,
            1.5,
        ],
        shift_factor=0.0,
        type='RandomBBoxTransform'),
    dict(input_size=(
        288,
        384,
    ), type='TopdownAffine'),
    dict(
        transforms=[
            dict(p=0.1, type='Blur'),
            dict(p=0.1, type='MedianBlur'),
            dict(
                max_height=0.4,
                max_holes=1,
                max_width=0.4,
                min_height=0.2,
                min_holes=1,
                min_width=0.2,
                p=0.5,
                type='CoarseDropout'),
        ],
        type='Albumentation'),
    dict(
        encoder=dict(
            input_size=(
                288,
                384,
            ),
            normalize=False,
            sigma=(
                6.0,
                6.93,
            ),
            simcc_split_ratio=2.0,
            type='SimCCLabel',
            use_dark=False),
        type='GenerateTarget',
        use_dataset_keypoint_weights=True),
    dict(type='PackPoseInputs'),
]


val_batch_size = 16
val_cfg = dict()
val_coco = dict(
    data_root='../mmpose/data/person_racket/',
    data_mode='topdown',
    ann_file='annotations/val.json',
    data_prefix=dict(img='images/'),
    pipeline=[
        dict(
            mapping=custom_halpe26,
            num_keypoints=38,
            type='KeypointConverter'),
    ],
    type='CocoDataset')


val_dataloader = dict(
    batch_size=16,
    dataset=dict(
        datasets=[
            dict(
                data_root='../mmpose/data/person_racket/',
                data_mode='topdown',
                ann_file='annotations/val.json',
                data_prefix=dict(img='images/'),
                pipeline=[
                    dict(
                        mapping=custom_halpe26,
                        num_keypoints=38,
                        type='KeypointConverter'),
                ],
                type='CocoDataset'),
            
        ],
        metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
        pipeline=[
            dict(backend_args=dict(backend='local'), type='LoadImage'),
            dict(type='GetBBoxCenterScale'),
            dict(input_size=(
                288,
                384,
            ), type='TopdownAffine'),
            dict(type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CombinedDataset'),
    drop_last=False,
    num_workers=10,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))


val_evaluator = [
    dict(thr=0.1, type='PCKAccuracy'),
    dict(type='AUC'),
]

val_pipeline = [
    dict(backend_args=dict(backend='local'), type='LoadImage'),
    dict(type='GetBBoxCenterScale'),
    dict(input_size=(
        288,
        384,
    ), type='TopdownAffine'),
    dict(type='PackPoseInputs'),
]

vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='PoseLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
test_evaluator=val_evaluator


Second config file:

_base_ = ['mmpose::_base_/default_runtime.py']

# common setting
num_keypoints = 38
input_size = (288, 384)

# runtime
max_epochs = 300
stage2_num_epochs = 20
base_lr = 4e-3
train_batch_size = 16
val_batch_size = 16

train_cfg = dict(max_epochs=max_epochs, val_interval=1)
randomness = dict(seed=21)

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    clip_grad=dict(max_norm=35, norm_type=2),
    paramwise_cfg=dict(
        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=1000),
    dict(
        type='CosineAnnealingLR',
        eta_min=base_lr * 0.05,
        begin=max_epochs // 2,
        end=max_epochs,
        T_max=max_epochs // 2,
        by_epoch=True,
        convert_to_iter_based=True),
]

# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=1024)

# codec settings
codec = dict(
    type='SimCCLabel',
    input_size=input_size,
    sigma=(6., 6.93),
    simcc_split_ratio=2.0,
    normalize=False,
    use_dark=False)

# model settings
model = dict(
    type='TopdownPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        _scope_='mmdet',
        type='CSPNeXt',
        arch='P5',
        expand_ratio=0.5,
        deepen_factor=1.33,
        widen_factor=1.25,
        out_indices=(4, ),
        channel_attention=True,
        norm_cfg=dict(type='SyncBN'),
        act_cfg=dict(type='SiLU'),
        init_cfg=None),
    head=dict(
        type='RTMCCHead',
        in_channels=1280,
        out_channels=num_keypoints,
        input_size=input_size,
        in_featuremap_size=tuple([s // 32 for s in input_size]),
        simcc_split_ratio=codec['simcc_split_ratio'],
        final_layer_kernel_size=7,
        gau_cfg=dict(
            hidden_dims=256,
            s=128,
            expansion_factor=2,
            dropout_rate=0.,
            drop_path=0.,
            act_fn='SiLU',
            use_rel_bias=False,
            pos_enc=False),
        loss=dict(
            type='KLDiscretLoss',
            use_target_weight=True,
            beta=10.,
            label_softmax=True),
        decoder=codec),
    test_cfg=dict(flip_test=True))

# base dataset settings
dataset_type = 'CocoDataset'
data_mode = 'topdown'
data_root = '../mmpose/data/person_racket/' 

backend_args = dict(backend='local')

# pipelines
train_pipeline = [
    dict(type='LoadImage', backend_args=backend_args),
    dict(type='GetBBoxCenterScale'),
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(
        type='RandomBBoxTransform', scale_factor=[0.5, 1.5], rotate_factor=90),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='PhotometricDistortion'),
    dict(
        type='Albumentation',
        transforms=[
            dict(type='Blur', p=0.1),
            dict(type='MedianBlur', p=0.1),
            dict(
                type='CoarseDropout',
                max_holes=1,
                max_height=0.4,
                max_width=0.4,
                min_holes=1,
                min_height=0.2,
                min_width=0.2,
                p=1.0),
        ]),
    dict(
        type='GenerateTarget',
        encoder=codec,
        use_dataset_keypoint_weights=True),
    dict(type='PackPoseInputs')
]
val_pipeline = [
    dict(type='LoadImage', backend_args=backend_args),
    dict(type='GetBBoxCenterScale'),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(type='PackPoseInputs')
]

train_pipeline_stage2 = [
    dict(type='LoadImage', backend_args=backend_args),
    dict(type='GetBBoxCenterScale'),
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(
        type='RandomBBoxTransform',
        shift_factor=0.,
        scale_factor=[0.5, 1.5],
        rotate_factor=90),
    dict(type='TopdownAffine', input_size=codec['input_size']),
    dict(
        type='Albumentation',
        transforms=[
            dict(type='Blur', p=0.1),
            dict(type='MedianBlur', p=0.1),
            dict(
                type='CoarseDropout',
                max_holes=1,
                max_height=0.4,
                max_width=0.4,
                min_holes=1,
                min_height=0.2,
                min_width=0.2,
                p=0.5),
        ]),
    dict(
        type='GenerateTarget',
        encoder=codec,
        use_dataset_keypoint_weights=True),
    dict(type='PackPoseInputs')
]

#halpe26_to_custom38 = [(i, i) for i in range(26)]


# data loaders
train_dataloader = dict(
    batch_size=train_batch_size,
    num_workers=10,
    pin_memory=True,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
        dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
        data_mode=data_mode,
        ann_file='annotations/train.json',
        data_prefix=dict(img='images/'),
        pipeline=train_pipeline,
        test_mode=False,
    ))



val_dataloader = dict(
    batch_size=val_batch_size,
    num_workers=10,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
   dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=dict(from_file='../mmpose/configs/datasets/custom_player_racket.py'),
        data_mode=data_mode,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))

test_dataloader = val_dataloader

# hooks
default_hooks = dict(
    checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1))

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0002,
        update_buffers=True,
        priority=49),
    dict(
        type='mmdet.PipelineSwitchHook',
        switch_epoch=max_epochs - stage2_num_epochs,
        switch_pipeline=train_pipeline_stage2)
]

# evaluators
test_evaluator = [dict(type='PCKAccuracy', thr=0.1), dict(type='AUC')]
val_evaluator = test_evaluator


visualizer = dict(
    name='visualizer',
    type='PoseLocalVisualizer', 
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ]
)



### Environment

I tested both the Docker image `openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy1.3.1` and a venv with the following packages:

```Shell
absl-py==2.3.1
addict==2.4.0
aenum==3.1.16
albucore==0.0.17
albumentations==1.4.18
aliyun-python-sdk-core==2.16.0
aliyun-python-sdk-kms==2.16.5
annotated-types==0.7.0
attrs==25.3.0
Brotli==1.1.0
cachetools==5.5.2
certifi==2025.7.14
cffi==1.17.1
charset-normalizer==3.4.2
chumpy==0.70
click==8.1.8
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.1.1
coverage==7.6.1
crcmod==1.7
cryptography==45.0.4
cycler==0.12.1
Cython==3.1.2
dill==0.4.0
eval_type_backport==0.2.2
exceptiongroup==1.3.0
filelock==3.14.0
flake8==7.1.2
flatbuffers==25.2.10
fonttools==4.57.0
fsspec==2025.3.0
future==1.0.0
gmpy2==2.2.1
google-auth==2.40.3
google-auth-oauthlib==1.0.0
grpcio==1.70.0
h2==4.1.0
hpack==4.0.0
humanfriendly==10.0
hyperframe==6.0.1
idna==3.10
imageio==2.35.1
importlib_metadata==8.5.0
importlib_resources==6.4.5
iniconfig==2.1.0
interrogate==1.7.0
isort==4.3.21
Jinja2==3.1.3
jmespath==0.10.0
json-tricks==3.17.3
kiwisolver==1.4.7
lazy_loader==0.4
Mako==1.3.10
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.7.5
mccabe==0.7.0
mdurl==0.1.2
mmcv==2.1.0
mmdeploy==1.3.1
mmdeploy-runtime-gpu==1.3.1
mmdet==3.3.0
mmengine==0.10.4
mmpose==1.2.0
model-index==0.1.11
mpmath==1.3.0
multiprocess==0.70.18
munkres==1.1.4
myutils==0.0.21
networkx==3.0
numpy==1.24.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu12==12.1.105
oauthlib==3.3.1
onnx==1.17.0
onnxruntime-gpu==1.15.1
opencv-python==4.11.0.86
opencv-python-headless==4.12.0.88
opendatalab==0.0.10
openmim==0.3.9
openxlab==0.1.2
ordered-set==4.1.0
oss2==2.17.0
packaging==24.2
pandas==2.0.3
parameterized==0.9.0
pillow==10.2.0
platformdirs==4.3.6
pluggy==1.5.0
prettytable==3.11.0
protobuf==3.20.2
py==1.11.0
pyasn1==0.6.1
pyasn1_modules==0.4.2
pycocotools==2.0.7
pycodestyle==2.12.1
pycparser==2.22
pycryptodome==3.23.0
pycuda==2025.1.1
pydantic==2.10.6
pydantic_core==2.27.2
pyflakes==3.2.0
Pygments==2.19.1
pyparsing==3.1.4
PySocks==1.7.1
pytest==8.3.5
pytest-runner==6.0.1
python-dateutil==2.9.0.post0
pytools==2024.1.14
pytz==2023.4
PyWavelets==1.4.1
PyYAML==6.0.2
requests==2.28.2
requests-oauthlib==2.0.0
rich==13.4.2
rsa==4.9.1
scikit-image==0.21.0
scipy==1.10.1
shapely==2.0.7
six==1.17.0
sympy==1.13.3
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.2
tensorrt==10.12.0.36
tensorrt-cu12==10.12.0.36
tensorrt-dispatch-cu12==10.12.0.36
tensorrt-lean-cu12==10.12.0.36
tensorrt_cu12_bindings==10.12.0.36
tensorrt_cu12_libs==10.12.0.36
tensorrt_dispatch_cu12_bindings==10.12.0.36
tensorrt_dispatch_cu12_libs==10.12.0.36
tensorrt_lean_cu12_bindings==10.12.0.36
tensorrt_lean_cu12_libs==10.12.0.36
termcolor==2.4.0
terminaltables==3.1.10
tifffile==2023.7.10
tomli==2.2.1
torch==2.4.1
torchvision==0.19.1
tqdm==4.65.2
triton==3.0.0
typing_extensions==4.13.2
tzdata==2025.2
urllib3==1.26.20
wcwidth==0.2.13
Werkzeug==3.0.6
xdoctest==1.2.0
xtcocotools==1.14.3
yapf==0.43.0
zipp==3.20.2
zstandard==0.23.0
```
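Since the venv above contains both CUDA 11 and CUDA 12 wheels and two OpenCV builds, it may help to confirm which versions are actually installed in the active interpreter. A small sketch using only the standard library; the `installed_version` helper and the selection of packages are illustrative choices.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages most relevant to the ONNX -> TensorRT conversion path.
for name in ("torch", "onnx", "onnxruntime-gpu", "tensorrt",
             "mmdeploy", "mmpose", "mmcv"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this in both the Docker container and the venv would give a short, directly comparable summary of the two environments.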

### Error traceback

```Shell
python3 tools/deploy.py   configs/mmpose/pose-detection_tensorrt-fp16_static-384x288.py   ../test_config_model.py   work_dirs/rtmpose_x_288x384/best_PCK_epoch_200.pth   test_img2.jpg   --work-dir mmdeploy_models/mmpose/trt   --device cuda   --dump-info   --show
08/21 09:08:53 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:08:53 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:08:55 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
08/21 09:08:56 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:08:56 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: work_dirs/rtmpose_x_288x384/best_PCK_epoch_200.pth
08/21 09:08:57 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
08/21 09:08:57 - mmengine - INFO - Export PyTorch model to ONNX: mmdeploy_models/mmpose/trt/end2end.onnx.
08/21 09:08:57 - mmengine - WARNING - Can not find torch.nn.functional._scaled_dot_product_attention, function rewrite will not be applied
08/21 09:08:57 - mmengine - WARNING - Can not find mmdet.models.utils.transformer.PatchMerging.forward, function rewrite will not be applied
08/21 09:09:01 - mmengine - INFO - Execute onnx optimize passes.
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

08/21 09:09:02 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
08/21 09:09:05 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in subprocess
08/21 09:09:05 - mmengine - WARNING - Could not load the library of tensorrt plugins.             Because the file does not exist: 
[08/21/2025-09:09:05] [TRT] [I] [MemUsageChange] Init CUDA: CPU +18, GPU +0, now: CPU 116, GPU 148 (MiB)
[08/21/2025-09:09:11] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1460, GPU +266, now: CPU 1652, GPU 414 (MiB)
[08/21/2025-09:09:11] [TRT] [I] ----------------------------------------------------------------
[08/21/2025-09:09:11] [TRT] [I] Input filename:   mmdeploy_models/mmpose/trt/end2end.onnx
[08/21/2025-09:09:11] [TRT] [I] ONNX IR version:  0.0.6
[08/21/2025-09:09:11] [TRT] [I] Opset version:    11
[08/21/2025-09:09:11] [TRT] [I] Producer name:    pytorch
[08/21/2025-09:09:11] [TRT] [I] Producer version: 2.0.0
[08/21/2025-09:09:11] [TRT] [I] Domain:           
[08/21/2025-09:09:11] [TRT] [I] Model version:    0
[08/21/2025-09:09:11] [TRT] [I] Doc string:       
[08/21/2025-09:09:11] [TRT] [I] ----------------------------------------------------------------
[08/21/2025-09:09:12] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/21/2025-09:09:12] [TRT] [I] Graph optimization time: 0.0681861 seconds.
[08/21/2025-09:09:12] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/21/2025-09:16:34] [TRT] [I] Detected 1 inputs and 2 output network tensors.
[08/21/2025-09:16:35] [TRT] [I] Total Host Persistent Memory: 567712
[08/21/2025-09:16:35] [TRT] [I] Total Device Persistent Memory: 17408
[08/21/2025-09:16:35] [TRT] [I] Total Scratch Memory: 4608
[08/21/2025-09:16:35] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 111 MiB, GPU 100 MiB
[08/21/2025-09:16:35] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 289 steps to complete.
[08/21/2025-09:16:35] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 23.9186ms to assign 8 blocks to 289 nodes requiring 9134592 bytes.
[08/21/2025-09:16:35] [TRT] [I] Total Activation Memory: 9133568
[08/21/2025-09:16:35] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[08/21/2025-09:16:35] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[08/21/2025-09:16:35] [TRT] [W] Check verbose logs for the list of affected weights.
[08/21/2025-09:16:35] [TRT] [W] - 100 weights are affected by this issue: Detected subnormal FP16 values.
[08/21/2025-09:16:35] [TRT] [W] - 69 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[08/21/2025-09:16:35] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +97, GPU +100, now: CPU 97, GPU 100 (MiB)
08/21 09:16:36 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
08/21 09:16:37 - mmengine - INFO - visualize tensorrt model start.
08/21 09:16:40 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:16:40 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:16:40 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "backend_segmentors" registry tree. As a workaround, the current "backend_segmentors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:16:40 - mmengine - WARNING - Could not load the library of tensorrt plugins.             Because the file does not exist: 
08/21 09:16:41 - mmengine - WARNING - render and display result skipped for headless device, exception No module named 'tkinter'
08/21 09:16:42 - mmengine - INFO - visualize tensorrt model success.
08/21 09:16:42 - mmengine - INFO - visualize pytorch model start.
08/21 09:16:45 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:16:45 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: work_dirs/rtmpose_x_288x384/best_PCK_epoch_200.pth
08/21 09:16:47 - mmengine - WARNING - render and display result skipped for headless device, exception No module named 'tkinter'
08/21 09:16:48 - mmengine - INFO - visualize pytorch model success.
08/21 09:16:48 - mmengine - INFO - All process success.
root@f2b64220fba9:/RnD.Pose_CVAT/mmdeploy# python3 tools/test.py configs/mmpose/pose-detection_tensorrt-fp16_static-384x288.py ../test_config_model.py --model mmdeploy_models/mmpose/trt/end2end.engine --device cuda
08/21 09:18:01 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:18:01 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
/usr/local/lib/python3.8/dist-packages/mmpose/datasets/datasets/utils.py:102: UserWarning: The metainfo config file "configs/_base_/datasets/coco.py" does not exist. A matched config file "/usr/local/lib/python3.8/dist-packages/mmpose/.mim/configs/_base_/datasets/coco.py" will be used instead.
  warnings.warn(
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
08/21 09:18:01 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "backend_segmentors" registry tree. As a workaround, the current "backend_segmentors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:18:01 - mmengine - WARNING - Could not load the library of tensorrt plugins.             Because the file does not exist: 
08/21 09:18:02 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 1810311327
    GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.8, V11.8.89
    GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 2.0.0+cu118
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.15.0+cu118
    OpenCV: 4.5.4
    MMEngine: 0.10.3

Runtime environment:
    dist_cfg: {'backend': 'nccl'}
    seed: 1810311327
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

08/21 09:18:02 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) PoseVisualizationHook              
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) BadCaseAnalysisHook                
(NORMAL      ) PoseVisualizationHook              
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) BadCaseAnalysisHook                
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
08/21 09:18:02 - mmengine - WARNING - The prefix is not set in metric class PCKAccuracy.
08/21 09:18:02 - mmengine - WARNING - The prefix is not set in metric class AUC.
08/21 09:18:03 - mmengine - INFO - Epoch(test) [ 50/755]    eta: 0:00:13  time: 0.0112  data_time: 0.0025  memory: 16  
08/21 09:18:03 - mmengine - INFO - Epoch(test) [100/755]    eta: 0:00:10  time: 0.0114  data_time: 0.0025  memory: 16  
08/21 09:18:04 - mmengine - INFO - Epoch(test) [150/755]    eta: 0:00:08  time: 0.0120  data_time: 0.0026  memory: 16  
08/21 09:18:05 - mmengine - INFO - Epoch(test) [200/755]    eta: 0:00:07  time: 0.0124  data_time: 0.0028  memory: 16  
08/21 09:18:05 - mmengine - INFO - Epoch(test) [250/755]    eta: 0:00:06  time: 0.0125  data_time: 0.0030  memory: 16  
08/21 09:18:06 - mmengine - INFO - Epoch(test) [300/755]    eta: 0:00:06  time: 0.0125  data_time: 0.0029  memory: 16  
08/21 09:18:06 - mmengine - INFO - Epoch(test) [350/755]    eta: 0:00:05  time: 0.0128  data_time: 0.0030  memory: 16  
08/21 09:18:07 - mmengine - INFO - Epoch(test) [400/755]    eta: 0:00:04  time: 0.0123  data_time: 0.0028  memory: 16  
08/21 09:18:08 - mmengine - INFO - Epoch(test) [450/755]    eta: 0:00:04  time: 0.0124  data_time: 0.0029  memory: 16  
08/21 09:18:08 - mmengine - INFO - Epoch(test) [500/755]    eta: 0:00:03  time: 0.0097  data_time: 0.0023  memory: 16  
08/21 09:18:09 - mmengine - INFO - Epoch(test) [550/755]    eta: 0:00:02  time: 0.0127  data_time: 0.0033  memory: 16  
08/21 09:18:10 - mmengine - INFO - Epoch(test) [600/755]    eta: 0:00:01  time: 0.0128  data_time: 0.0027  memory: 16  
08/21 09:18:10 - mmengine - INFO - Epoch(test) [650/755]    eta: 0:00:01  time: 0.0124  data_time: 0.0028  memory: 16  
08/21 09:18:11 - mmengine - INFO - Epoch(test) [700/755]    eta: 0:00:00  time: 0.0117  data_time: 0.0027  memory: 16  
08/21 09:18:11 - mmengine - INFO - Epoch(test) [750/755]    eta: 0:00:00  time: 0.0120  data_time: 0.0024  memory: 16  
08/21 09:18:11 - mmengine - INFO - Evaluating PCKAccuracy (normalized by ``"bbox_size"``)...
08/21 09:18:11 - mmengine - INFO - Evaluating AUC...
08/21 09:18:12 - mmengine - INFO - Epoch(test) [755/755]    PCK: 0.0061  AUC: 0.0052  data_time: 0.0031  time: 0.0127
root@f2b64220fba9:/RnD.Pose_CVAT/mmdeploy# python3 tools/test.py configs/mmpose/pose-detection_onnxruntime_static.py ../test_config_model.py --model mmdeploy_models/mmpose/trt/end2end.onnx
08/21 09:19:07 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:19:07 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "mmpose_tasks" registry tree. As a workaround, the current "mmpose_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
/usr/local/lib/python3.8/dist-packages/mmpose/datasets/datasets/utils.py:102: UserWarning: The metainfo config file "configs/_base_/datasets/coco.py" does not exist. A matched config file "/usr/local/lib/python3.8/dist-packages/mmpose/.mim/configs/_base_/datasets/coco.py" will be used instead.
  warnings.warn(
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
08/21 09:19:07 - mmengine - WARNING - Failed to search registry with scope "mmpose" in the "backend_segmentors" registry tree. As a workaround, the current "backend_segmentors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmpose" is a correct scope, or whether the registry is initialized.
08/21 09:19:07 - mmengine - WARNING - The library of onnxruntime custom ops doesnot exist: 
08/21 09:19:09 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
    CUDA available: True
    MUSA available: False
    numpy_random_seed: 915285648
    GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.8, V11.8.89
    GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 2.0.0+cu118
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.15.0+cu118
    OpenCV: 4.5.4
    MMEngine: 0.10.3

Runtime environment:
    dist_cfg: {'backend': 'nccl'}
    seed: 915285648
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

08/21 09:19:09 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) PoseVisualizationHook              
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_val:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) BadCaseAnalysisHook                
(NORMAL      ) PoseVisualizationHook              
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) BadCaseAnalysisHook                
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test:
(VERY_HIGH   ) RuntimeInfoHook                    
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
08/21 09:19:09 - mmengine - WARNING - The prefix is not set in metric class PCKAccuracy.
08/21 09:19:09 - mmengine - WARNING - The prefix is not set in metric class AUC.
08/21 09:19:25 - mmengine - INFO - Epoch(test) [ 50/755]    eta: 0:03:45  time: 0.3133  data_time: 0.0028  memory: 0  
08/21 09:19:41 - mmengine - INFO - Epoch(test) [100/755]    eta: 0:03:33  time: 0.3407  data_time: 0.0026  memory: 0  
08/21 09:19:57 - mmengine - INFO - Epoch(test) [150/755]    eta: 0:03:14  time: 0.3075  data_time: 0.0024  memory: 0  
08/21 09:20:14 - mmengine - INFO - Epoch(test) [200/755]    eta: 0:03:00  time: 0.3551  data_time: 0.0025  memory: 0  
08/21 09:20:29 - mmengine - INFO - Epoch(test) [250/755]    eta: 0:02:42  time: 0.3188  data_time: 0.0026  memory: 0  
08/21 09:20:47 - mmengine - INFO - Epoch(test) [300/755]    eta: 0:02:28  time: 0.4041  data_time: 0.0026  memory: 0  
08/21 09:21:03 - mmengine - INFO - Epoch(test) [350/755]    eta: 0:02:11  time: 0.3127  data_time: 0.0025  memory: 0  
08/21 09:21:18 - mmengine - INFO - Epoch(test) [400/755]    eta: 0:01:55  time: 0.3258  data_time: 0.0026  memory: 0  
08/21 09:21:35 - mmengine - INFO - Epoch(test) [450/755]    eta: 0:01:38  time: 0.3343  data_time: 0.0027  memory: 0  
08/21 09:21:50 - mmengine - INFO - Epoch(test) [500/755]    eta: 0:01:22  time: 0.3151  data_time: 0.0026  memory: 0  
08/21 09:22:07 - mmengine - INFO - Epoch(test) [550/755]    eta: 0:01:06  time: 0.3217  data_time: 0.0026  memory: 0  
08/21 09:22:22 - mmengine - INFO - Epoch(test) [600/755]    eta: 0:00:50  time: 0.3142  data_time: 0.0027  memory: 0  
08/21 09:22:38 - mmengine - INFO - Epoch(test) [650/755]    eta: 0:00:33  time: 0.3270  data_time: 0.0026  memory: 0  
08/21 09:22:54 - mmengine - INFO - Epoch(test) [700/755]    eta: 0:00:17  time: 0.3120  data_time: 0.0027  memory: 0  
08/21 09:23:10 - mmengine - INFO - Epoch(test) [750/755]    eta: 0:00:01  time: 0.3142  data_time: 0.0025  memory: 0  
08/21 09:23:11 - mmengine - INFO - Evaluating PCKAccuracy (normalized by ``"bbox_size"``)...
08/21 09:23:11 - mmengine - INFO - Evaluating AUC...
08/21 09:23:12 - mmengine - INFO - Epoch(test) [755/755]    PCK: 0.9733  AUC: 0.7570  data_time: 0.0031  time: 0.3214
```
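
For reference, the TensorRT build log above warns that 100 weights are FP16 subnormals and 69 fall below the smallest positive FP16 subnormal, while the same `end2end.onnx` scores PCK 0.9733 under ONNX Runtime against 0.0061 for the FP16 engine. The sketch below is one way to count such values in a float32 array. It is only an illustration, not part of mmdeploy or TensorRT: the function name `fp16_range_report` and the sample values are hypothetical, and whether these weights (rather than, say, activation overflow in FP16) cause the drop is an assumption that still needs checking.

```python
import numpy as np

# IEEE 754 half-precision limits (exact powers of two, except the max).
FP16_MAX = 65504.0                 # largest finite FP16 value
FP16_MIN_NORMAL = 2.0 ** -14       # smallest positive normal FP16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24    # smallest positive subnormal FP16 value


def fp16_range_report(values):
    """Count float32 values whose magnitude lies outside the normal FP16 range.

    Hypothetical helper for illustration only.
    """
    mag = np.abs(np.asarray(values, dtype=np.float32)).ravel()
    nonzero = mag > 0
    return {
        "total": int(mag.size),
        # Magnitudes that would become inf when cast to FP16.
        "overflow": int(np.sum(mag > FP16_MAX)),
        # Magnitudes that land in the FP16 subnormal range (reduced precision).
        "subnormal": int(
            np.sum(nonzero & (mag < FP16_MIN_NORMAL) & (mag >= FP16_MIN_SUBNORMAL))
        ),
        # Non-zero magnitudes smaller than any positive FP16 value.
        "below_subnormal": int(np.sum(nonzero & (mag < FP16_MIN_SUBNORMAL))),
    }


if __name__ == "__main__":
    # Made-up sample values, one per category plus a normal value and a zero.
    sample = np.array([1.0, 1e-5, 1e-9, 1e5, 0.0], dtype=np.float32)
    print(fp16_range_report(sample))
```

To apply this to the exported model, each initializer in `end2end.onnx` could be converted to a NumPy array (for example with `onnx.numpy_helper.to_array`) and passed to the function. Note that it inspects weights only. Intermediate activations exceeding the FP16 range would not show up here and would need a separate layer-by-layer comparison of the ONNX Runtime and TensorRT outputs.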

RTMPose TensorRT Performance Drop #2913