Proposal for LoRA autoscaler #313

Status: Open. Wants to merge 2 commits into main (base branch).

kerthcet
Member

What this PR does / why we need it

Support dense deployment for LoRA models

Which issue(s) this PR fixes

xref: #287

Special notes for your reviewer

Does this PR introduce a user-facing change?

None

InftyAI-Agent added the needs-triage, needs-priority, do-not-merge/needs-kind, and approved labels on Mar 13, 2025
Signed-off-by: kerthcet <[email protected]>
@kerthcet
Member Author

This is a draft. If you're interested, feel free to take a look. @googs1025 @nayihz

-->

- e2e tests to make sure the lora service will run successfully
- e2e tests to make sure the lora autoscaling works as expected, both scaling up and down
Contributor

Most e2e test cases may take a long time because they have to download models and images, so we should design the test plan carefully.

Member Author

Yes, but we need to make sure the feature works as expected, so the tests are still needed.

@kerthcet
Member Author

/hold

InftyAI-Agent added the do-not-merge/hold label on Mar 14, 2025
proposal will be implemented, this is the place to discuss them.
-->

### The LoRA Autoscaler
Member

I don't know much about LoRA. Do we have a simple diagram describing the different components? It would help us understand the design more quickly. 🤔

Member Author

Makes sense to me.

- Replica 7: lora-1

Make sure **at least one lora exists** in replicas, to avoid lora loading overhead in runtime.
- Once the lora model loaded successfully, the gateway will update the route table for the lora requests
Member

Does gateway refer to another component or something else?

Member Author

Yes, we'll introduce the envoy gateway for smart routing. I may need to implement the gateway first.
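
For readers of this thread, the route-table behavior quoted from the proposal (update the routes once a LoRA has loaded, and keep at least one copy resident) could be sketched roughly as below. This is an illustration only, not code from the PR: `LoraRouteTable` and its method names are hypothetical, and it assumes the rule means that each LoRA stays loaded on at least one replica.

```python
from collections import defaultdict


class LoraRouteTable:
    """Hypothetical route table: LoRA name -> replicas that have it loaded."""

    def __init__(self):
        self._routes = defaultdict(set)

    def on_lora_loaded(self, replica, lora):
        # Assumed to be called only after the replica reports a successful
        # load, so requests are never routed to a replica still loading.
        self._routes[lora].add(replica)

    def on_lora_unloaded(self, replica, lora):
        # Refuse to drop the last copy, so every known LoRA stays resident
        # on at least one replica (assumed reading of the proposal's rule).
        holders = self._routes[lora]
        if replica in holders and len(holders) == 1:
            return False
        holders.discard(replica)
        return True

    def replicas_for(self, lora):
        # Sorted for deterministic output; a real gateway would load-balance.
        return sorted(self._routes.get(lora, set()))


table = LoraRouteTable()
table.on_lora_loaded("replica-7", "lora-1")
print(table.replicas_for("lora-1"))  # ['replica-7']
```

Keeping the "last copy" check inside the table is only one possible design; the autoscaler could equally enforce it before asking a replica to unload.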

@kerthcet
Member Author

kerthcet commented May 13, 2025

Revisit this after #404, since that feature is a blocker for this one.

Labels
approved, do-not-merge/hold, do-not-merge/needs-kind, needs-priority, needs-triage
4 participants