TorchServe with Kserve_wrapper v2 throws 'message': 'number of batch response mismatched' #2158
Comments
@gavrishp Both envelopes need fixes for KServe 0.10, and the v2 protocol has two examples: one with bytes input and another with tensor input. Let me know what you are working on; I can take up the rest.
@jagadeeshi2i I can take the v2 protocol changes. There's one use case I need inputs for.
The response for the above example is:
But with TorchServe batching of multiple requests, the handler's postprocess returns a list of outputs. Wouldn't we also need to hold some additional state to keep track of which input came from which request_id?
In the above example, a single HTTP request has multiple inputs in it, so the response carries the outputs in the same order, under the same request id. You are referring to TorchServe dynamic batching, which is not supported in the KServe integration.
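For illustration, a v2 request with multiple inputs and its response might look like the following sketch; the field values are hypothetical, not taken from this thread:
```python
# Hypothetical KServe v2 payloads: one HTTP request carries two inputs
# in its "inputs" array.
request = {
    "id": "d3b15cad",
    "inputs": [
        {"name": "input-0", "shape": [1], "datatype": "BYTES", "data": ["<image 1>"]},
        {"name": "input-1", "shape": [1], "datatype": "BYTES", "data": ["<image 2>"]},
    ],
}

# The response echoes the request id and returns the outputs in the same
# order as the inputs.
response = {
    "id": "d3b15cad",
    "model_name": "resnet50",
    "outputs": [
        {"name": "output-0", "shape": [1], "datatype": "INT64", "data": [463]},
        {"name": "output-1", "shape": [1], "datatype": "INT64", "data": [288]},
    ],
}
```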
This issue concerns TorchServe dynamic batching with the KServe integration. Is there a particular reason it is not supported? Is support planned for the future? If not, the TorchServe model config batch_size should not be allowed to be set to more than 1 right now; I suppose that setting is what causes this particular issue. My understanding is that this batching helps with better GPU utilization and higher throughput, and my testing results support this.
TorchServe with KServe has batching support, but the inputs are statically batched: KServe v2 requires sending all inputs in a single request. TorchServe on its own does dynamic batching, where it waits (up to maxBatchDelay) to collect batch_size requests. Setting batch_size to more than 1 here will make TorchServe wait for the batch to fill before responding. Regarding GPU utilization: both static and dynamic batching start processing only after all the input is received, so this will not affect GPU utilization.
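To make the distinction concrete, here is a sketch of what the handler's batch looks like in each mode; the payloads are hypothetical:
```python
# Dynamic batching (TorchServe on its own): the frontend collects up to
# batchSize concurrent requests, waiting at most maxBatchDelay ms, so the
# handler receives one list entry per client request.
dynamic_batch = [
    {"body": b'{"inputs": [{"name": "input-0", "data": ["..."]}]}'},  # client 1
    {"body": b'{"inputs": [{"name": "input-0", "data": ["..."]}]}'},  # client 2
]

# Static batching (KServe v2): one client packs all items into a single
# request, so the handler receives a single entry whose body already
# contains the whole batch.
static_batch = [
    {"body": b'{"inputs": [{"name": "input-0"}, {"name": "input-1"}]}'},
]
```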
Thanks for clarifying! What would you suggest as the correct fix for this issue?
Set batch_size to 1.
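Assuming that suggestion refers to the model config, the reporter's model_snapshot in config.properties (quoted below under "config.properties") would change only in the batchSize field:
```
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 1,"maxBatchDelay": 200,"responseTimeout": 2000}}}}
```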
@gavrishp Is the issue resolved now?
@jagadeeshi2i I had a query: is it by design that only the first element in the batch is selected in the KServe envelopes? https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kserve.py#L27 This still voids any use case with batch_size > 1.
The main feature of TorchServe is dynamic batching, especially if you have requests from multiple sources.
🐛 Describe the bug
TorchServe supports batching of multiple requests; the batch_size value is provided while registering the model.
The request envelope receives the input as a list of multiple request bodies, but the KServe v2 request envelope picks only the first item in the list of inputs:
https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py#L104
The result is a single output sent back as the response, causing the mismatch.
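For reference, the reported behavior amounts to roughly the following (a paraphrase of the envelope's logic, not the actual source):
```python
# Paraphrased sketch, not the actual kservev2.py code: only the first
# element of the TorchServe batch is unwrapped, so a dynamic batch of 5
# requests produces 1 response instead of 5.
def parse_input(data):
    body = data[0].get("body")   # drops data[1:], the rest of the batch
    return body.get("inputs")
```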
Error logs
TorchServe Error
stdout MODEL_LOG - model: resnet50-3, number of batch response mismatched, expect: 5, got: 1.
Installation instructions
Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md
Model Packaging
Created a resnet50.mar using the default parameters and handler.
config.properties
inference_address=http://0.0.0.0:8085/
management_address=http://0.0.0.0:8085/
metrics_address=http://0.0.0.0:8082/
grpc_inference_port=7075
grpc_management_port=7076
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
metrics_format=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model_store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"resnet50": {"1.0": {"defaultVersion": true,"marName": "resnet50.mar","minWorkers": 6,"maxWorkers": 6,"batchSize": 16,"maxBatchDelay": 200,"responseTimeout": 2000}}}}
Versions
Name: kserve
Version: 0.10.0
Name: torch
Version: 1.13.1+cu117
Name: torchserve
Version: 0.7.1
Repro instructions
Followed instructions provided here - https://github.com/pytorch/serve/blob/master/kubernetes/kserve/kserve_wrapper/README.md
Run the kserve_wrapper main.py and issue multiple curl infer requests using the v2 protocol.
Command used -
seq 1 10 | xargs -n1 -P 5 curl -H "Content-Type: application/json" --data @input_bytes.json http://0.0.0.0:8080/v2/models/resnet50/infer
Possible Solution
Changes are required to handle TorchServe batched inputs and to generate an output for each of the requests batched by TorchServe.
Changes are needed in the parse_input() and format_output() methods in kservev2.py.
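A minimal sketch of the direction such a fix could take, assuming one row per batched request and JSON bodies (the function names mirror the envelope's, but the bodies are illustrative, not the actual patch):
```python
import json

def parse_input(data):
    """Flatten the inputs from every batched request, remembering the
    split points so outputs can be routed back to the right request."""
    all_inputs, lengths = [], []
    for row in data:                        # one row per queued request
        body = row.get("data") or row.get("body")
        if isinstance(body, (bytes, bytearray)):
            body = json.loads(body)
        inputs = body.get("inputs", [])
        all_inputs.extend(inputs)           # flatten for the model
        lengths.append(len(inputs))         # remember the split points
    return all_inputs, lengths

def format_output(outputs, lengths):
    """Re-split the model outputs so each batched request gets its own
    v2 response, preserving input order."""
    responses, start = [], 0
    for n in lengths:
        responses.append({"outputs": outputs[start:start + n]})
        start += n
    return responses                        # one response per request
```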