
TorchServe with Kserve_wrapper v2 throws 'message': 'number of batch response mismatched' #2158

Open

gavrissh opened this issue Feb 24, 2023 · 10 comments · May be fixed by #3341

@gavrissh

🐛 Describe the bug

TorchServe supports batching of multiple requests, with the batch_size value provided while registering the model.

The request envelope receives the input as a list of multiple request bodies, but the KServe v2 request envelope picks only the first item in the list of inputs:
https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L104

As a result, a single output is sent back as the response, causing the mismatch.
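
For context, this mismatch comes from TorchServe's batching contract: the backend hands the handler a list with one entry per batched request and expects a response list of the same length. Conceptually (an illustrative sketch, not the actual TorchServe source):

# Illustrative only: TorchServe passes one entry per batched request and expects
# the handler/envelope to return exactly one response per entry.
responses = handler.handle(batched_requests, context)
if len(responses) != len(batched_requests):
    raise RuntimeError(
        "number of batch response mismatched, "
        f"expect: {len(batched_requests)}, got: {len(responses)}"
    )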

Error logs

TorchServe Error
stdout MODEL_LOG - model: resnet50-3, number of batch response mismatched, expect: 5, got: 1.

Installation instructions

Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md

Model Packaging

Created a resnet50.mar using default parameters and handler

config.properties

inference_address=http://0.0.0.0:8085/
management_address=http://0.0.0.0:8085/
metrics_address=http://0.0.0.0:8082/
grpc_inference_port=7075
grpc_management_port=7076
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_format=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model_store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 16,"maxBatchDelay": 200,"responseTimeout": 2000}}}}

Versions

Name: kserve
Version: 0.10.0

Name: torch
Version: 1.13.1+cu117

Name: torchserve
Version: 0.7.1

Repro instructions

Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md

Run the kserve_wrapper main.py and send multiple curl inference requests using the v2 protocol.

Command used -
seq 1 10 | xargs -n1 -P 5 curl -H "Content-Type: application/json" --data @input_bytes.json http://0.0.0.0:8080/v2/models/resnet50/infer
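
(The contents of input_bytes.json are not shown in this issue; the following is an assumed minimal KServe v2 bytes-style payload for illustration only, with hypothetical field values:)

{
  "inputs": [
    {
      "name": "input-0",
      "shape": [-1],
      "datatype": "BYTES",
      "data": ["<base64-encoded image>"]
    }
  ]
}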

Possible Solution

Changes are required to handle TorchServe batched inputs and generate an output for every request in the batch.

Changes are needed in the parse_input() and format_output() methods in kservev2.py.
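
A rough sketch of the kind of change meant here, assuming the current parse_input()/format_output() hooks in kservev2.py; the bookkeeping attribute and output layout below are illustrative, not the actual patch:

# Illustrative sketch only: iterate over every request in the TorchServe batch instead
# of taking just data[0], and record how many "inputs" each request contributed so the
# flat prediction list can be regrouped into one v2 response per request.
def parse_input(self, data):
    self._inputs_per_request = []          # hypothetical bookkeeping attribute
    flat_inputs = []
    for row in data:                       # one row per request in the TorchServe batch
        body = row.get("data") or row.get("body")
        request_inputs = body.get("inputs", [])
        self._inputs_per_request.append(len(request_inputs))
        flat_inputs.extend(request_inputs)
    return flat_inputs

def format_output(self, data):
    # Split the flat list of predictions back into one response per original request.
    responses, start = [], 0
    for count in self._inputs_per_request:
        outputs = [
            {"name": "predict", "shape": [], "datatype": "BYTES", "data": [item]}
            for item in data[start:start + count]
        ]
        # model_name/id/parameters would be filled the same way the existing envelope does.
        responses.append({"outputs": outputs})
        start += count
    return responses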

@jagadeeshi2i
Collaborator

@gavrishp Both envelopes need fixes for KServe 0.10, and the v2 protocol has 2 examples: one with bytes input and another with tensor input. Let me know which one you are working on; I can take up the remaining one.

@gavrissh
Author

@jagadeeshi2i I can take the v2 protocol changes.

There's one use case I need inputs on, since KServe supports batching within a single request.
Request example:

{
  "inputs": [
    {
      "name": "input-0",
      "shape": [37],
      "datatype": "INT64",
      "data": [66, 108, 111, 111, 109]
    },
    {
      "name": "input-0",
      "shape": [37],
      "datatype": "INT64",
      "data": [66, 108, 111, 111, 109]
    }
  ]
}

The response for the above example is:

{
  "model_name":"resnet50",
  "model_version":"3.0",
  "id":"c0229ab0-f157-4917-974a-93646a51a57d",
  "parameters":null,
  "outputs":[
    {
      "name":"predict",
      "shape":[],
      "datatype":"BYTES",
      "parameters":null,
      "data":[2]
    },
    {
      "name":"predict",
      "shape":[],
      "datatype":"BYTES",
      "parameters":null,
      "data":[2]
    }
  ]
}

But with TorchServe batching of multiple requests, the handler's postprocess would return a list of outputs. We might also need to hold some additional state to keep track of which input came from which request_id, right?

@jagadeeshi2i
Collaborator

In the above example a single HTTP request has multiple inputs in it, so the response will have outputs in the same order under the same request id. You are referring to TorchServe dynamic batching, which is not supported in the KServe integration.

@gavrissh
Author

This issue concerns TorchServe dynamic batching with the KServe integration. Is there any particular reason for it not being supported? Is it planned to be supported in the future?

If that is the case, the TS model config batch_size should not be allowed to be set to more than 1 right now; I suppose that is what is causing this particular issue.

My understanding is that this batching helps with better GPU utilization and higher throughput. My testing results support this.

@jagadeeshi2i
Collaborator

jagadeeshi2i commented Feb 27, 2023

TorchServe with KServe has batching support: the inputs are statically batched. TorchServe on its own does dynamic batching, where it waits up to batch_delay for batch_size requests to accumulate.

KServe v2 requires sending all inputs in a single request. Setting batch_size to more than 1 here will make TorchServe wait for the batch_delay.

Regarding GPU utilization, both static and dynamic batching start processing only after all of the input is received, so this does not affect GPU utilization.

@gavrissh
Author

Thanks for clarifying!

What would you suggest is the correct fix for this issue?

  • When the user sets batch_size > 1, the TS service throws the error 'message': 'number of batch response mismatched' because it dynamically batched multiple requests.

@jagadeeshi2i
Collaborator

set batch_size to 1
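
Applied to the config.properties from the issue, that workaround amounts to changing only the batchSize value in the model_snapshot (same snapshot as above, batch setting altered for illustration):

model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 1,"maxBatchDelay": 200,"responseTimeout": 2000}}}}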

@jagadeeshi2i
Collaborator

@gavrishp is the issue resolved now?

@gavrissh
Author

@jagadeeshi2i I had a query: is it by design that we are selecting only the first element in the batch in the KServe envelopes?

https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kserve.py#L27
https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L102,L111

This still breaks any use case with batch_size > 1.

@matej14086

The main feature of TorchServe is dynamic batching, especially if you have requests from multiple sources.
It's a bummer that KServe doesn't support that.

pkluska linked a pull request on Oct 4, 2024 that will close this issue.