[Feature] Triton server #2088
Conversation
Codecov Report: Patch and project coverage have no change.

@@ Coverage Diff @@
## main #2088 +/- ##
=======================================
Coverage 49.67% 49.67%
=======================================
Files 339 339
Lines 12998 12998
Branches 1906 1906
=======================================
Hits 6457 6457
Misses 6090 6090
Partials 451 451
Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
You can temporarily use this docker image for testing.
Hey, thanks for this. I wanted to know how to correctly send multiple bboxes for keypoint-detection inference. I created a dict for each bbox and added it to a list:

    bbox_list = [{'bbox': bbox} for bbox in bboxes.tolist()]
    bbox = {
        'type': 'PoseBbox',
        'value': bbox_list
    }
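A minimal sketch of how such a payload could be packed into a Triton request with the tritonclient Python API; the model name, the input/output names, and the JSON-in-a-BYTES-tensor packing are assumptions for illustration, not the demo client's confirmed interface:

```python
import json

import cv2
import numpy as np
import tritonclient.http as httpclient

# Sketch only: model name, input/output names, and the JSON packing
# below are assumptions, not the actual mmdeploy demo interface.
client = httpclient.InferenceServerClient(url="localhost:8000")

image = cv2.imread("person.jpg")  # HWC, BGR, uint8
bboxes = np.array([[10, 20, 200, 300], [220, 40, 400, 360]], dtype=np.float32)

bbox_list = [{'bbox': bbox} for bbox in bboxes.tolist()]
bbox = {'type': 'PoseBbox', 'value': bbox_list}

# Image tensor (assumed input name "ori_img").
img_input = httpclient.InferInput("ori_img", list(image.shape), "UINT8")
img_input.set_data_from_numpy(image)

# Structured bbox info serialized to JSON and sent as a BYTES tensor
# (assumed input name "bbox").
bbox_bytes = np.array([json.dumps(bbox).encode("utf-8")], dtype=np.object_)
bbox_input = httpclient.InferInput("bbox", list(bbox_bytes.shape), "BYTES")
bbox_input.set_data_from_numpy(bbox_bytes)

response = client.infer("keypoint_detection", inputs=[img_input, bbox_input])
keypoints = response.as_numpy("keypoints")  # assumed output name
print(keypoints)
```

Serializing structured metadata as JSON into a BYTES input is a common pattern when a Triton request carries more than raw tensors; whether this backend expects exactly that layout is what the question above is asking.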
Also, what does this mean?
Could you show the visualized result with the bboxes? Does the inference result with a single bbox look right?
For batch inference with mmdeploy, you can refer to #839 (comment). Triton server supports both the dynamic batcher and the sequence batcher, but the mmdeploy backend only supports the dynamic batcher. You can add these lines to config.pbtxt.
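A minimal sketch of what such an addition to config.pbtxt looks like; max_batch_size, the input name, and the queue delay are placeholder values:

```protobuf
# Sketch only: max_batch_size, the input name, and the queue delay
# are placeholder values.
max_batch_size: 8

input [
  {
    name: "ori_img"            # placeholder; use the model's actual input name
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
    allow_ragged_batch: true   # let requests with different shapes share a batch
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 100
}
```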
In summary, to use the mmdeploy Triton backend with batch inference, you have to configure allow_ragged_batch and the dynamic batcher in config.pbtxt as above.
I am not sure if this works. I don't see any improvements when I do this after checking with
It supports batching, but I can see better improvements by launching multiple model instances using:
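A sketch of the kind of instance_group block in config.pbtxt this refers to, with the count and GPU id as placeholder values:

```protobuf
# Sketch only: count and gpus are placeholder values.
instance_group [
  {
    count: 2          # run two execution instances of this model
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Multiple instances let the server process requests concurrently on the same device, which is typically where this kind of throughput improvement comes from.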
I think dynamic_batcher depends on sequence_batching, but each request is handled separately in the mmdeploy backend.
Motivation

- Support model serving

Modification

- Add Triton custom backend
- Add demo