capacity management change in starter.go not loading capacity correctly #3627

@ad-astra-video

Description

Describe the bug
The change in model capacity handling does not work properly across all pipelines. I had FLUX.1-dev set to capacity: 2 and got an "insufficient capacity" error after updating to a version that included this PR for batch pipelines.

The change in PR #3558 assumes a model can only serve one request at a time. For the most part this is likely correct, and the Orchestrator should test to ensure a model can actually handle multiple concurrent requests. In prior tests, the image-to-text, segment-anything-2, and LLM runners could all handle multiple requests per runner; the LLM pipeline does this natively because the vLLM backend is built to serve multiple requests simultaneously.

We should also consider updating the error message to be more helpful, indicating that the capacity set in the config is too large for the GPUs available.

If we want to switch to this method, the docs should be updated to indicate that an external container must be used if a runner has capacity > 1:
https://docs.livepeer.org/ai/orchestrators/models-config#param-capacity
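A more helpful error would report both the requested and the available capacity, as suggested above. A minimal sketch of what that check could look like (the function name, arguments, and message wording are hypothetical, not the actual starter.go code):

```go
package main

import "fmt"

// checkCapacity is a hypothetical helper: it rejects a model config whose
// requested capacity exceeds the GPUs available, with an error that tells
// the operator exactly what to change instead of a bare "insufficient capacity".
func checkCapacity(modelID string, requested, gpusAvailable int) error {
	if requested > gpusAvailable {
		return fmt.Errorf(
			"insufficient capacity for model %s: config requests capacity %d but only %d GPU(s) are available; lower capacity in the model config or add GPUs",
			modelID, requested, gpusAvailable)
	}
	return nil
}

func main() {
	// Mirrors the scenario in this report: capacity 2 with a single GPU.
	if err := checkCapacity("black-forest-labs/FLUX.1-dev", 2, 1); err != nil {
		fmt.Println(err)
	}
}
```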

cc @pschroedl @leszko

To Reproduce
Set capacity: 2 on any batch pipeline runner and try to start up an ai-worker.
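For reference, a minimal model config entry of the kind that triggers this (a sketch; field names follow the models-config docs linked above, and the model_id is the one from this report):

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "black-forest-labs/FLUX.1-dev",
    "warm": true,
    "capacity": 2
  }
]
```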

Expected behavior
Load the model config once and allow the configured capacity to be sent to that runner.

Labels: status: triage (this issue has not been evaluated yet)