Describe the bug
The change in model capacity handling does not work properly on all pipelines. I had FLUX.1-dev set to `capacity: 2`, and after updating to a version that included PR #3558, batch pipelines failed to start with an insufficient capacity error.
The change in PR #3558 assumes a model can only serve one request at a time. For the most part this is likely correct, and the Orchestrator needs a way to test whether a model can actually handle multiple concurrent requests. In prior tests, the image-to-text, segment-anything-2, and LLM runners could handle multiple requests per runner; the LLM pipeline does this natively, since its vLLM backend is built to serve multiple requests simultaneously.
We should also consider updating the error to be more helpful, indicating that the capacity set in the config is too large for the available GPUs.
If we want to switch to this method, the docs should be updated to state that an external container must be used when a runner has capacity > 1.
https://docs.livepeer.org/ai/orchestrators/models-config#param-capacity
To Reproduce
Set `capacity: 2` on any batch pipeline runner and try to start an ai-worker; a config sketch is shown below.
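A minimal sketch of a model config that reproduces this, assuming the aiModels.json format described in the docs linked above; the `capacity` value is the one that matters here, and the other field values are illustrative:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "black-forest-labs/FLUX.1-dev",
    "warm": true,
    "capacity": 2
  }
]
```

With a config like this, startup fails with the insufficient capacity error instead of sending the configured capacity to a single runner.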
Expected behavior
Load the model once and pass the capacity set in the config through to that runner.