Proposal for LoRA autoscaler #313

kerthcet · 2025-03-13T11:19:48Z

What this PR does / why we need it

Support dense deployment for LoRA models

Which issue(s) this PR fixes

xref: #287

Special notes for your reviewer

Does this PR introduce a user-facing change?

None

Signed-off-by: kerthcet <[email protected]>

kerthcet · 2025-03-13T11:37:43Z

This is a draft. If you're interested, feel free to take a look. @googs1025 @nayihz

nayihz · 2025-03-13T14:39:52Z

docs/proposals/lora-autoscaler/README.md

+-->
+
+- e2e tests to make sure the lora service will run successfully
+- e2e tests to make sure the lora autoscaling works as expected, both scaling up and down


Most e2e test cases may take a long time because they download models/images, so we should design the test plan carefully.

Yes, but we need to make sure the feature works as expected, so the tests are still needed.

kerthcet · 2025-03-14T03:17:09Z

/hold

googs1025 · 2025-03-14T06:13:00Z

docs/proposals/lora-autoscaler/README.md

+proposal will be implemented, this is the place to discuss them.
+-->
+
+### The LoRA Autoscaler


I don't know much about LoRA. Do we have a simple diagram that describes the different components? It may help us understand the design more quickly. 🤔

Makes sense to me.

googs1025 · 2025-03-14T06:29:31Z

docs/proposals/lora-autoscaler/README.md

+    - Replica 7: lora-1
+
+  Make sure **at least one lora exists** in replicas, to avoid lora loading overhead in runtime.
+- Once the lora model loaded successfully, the gateway will update the route table for the lora requests


Does "gateway" refer to another component or something else?

Yes, we'll introduce the Envoy gateway for smart routing. I may need to implement the gateway first.
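
The route-table update mentioned in the quoted proposal line can be sketched roughly as below. This is only a minimal illustration under assumptions: `LoraRouteTable`, its method names, and the replica identifiers are hypothetical and come neither from the proposal nor from Envoy Gateway. It shows one possible shape of the idea that a replica becomes routable for a LoRA only after the LoRA has loaded successfully.

```python
from collections import defaultdict


class LoraRouteTable:
    """Hypothetical in-memory route table: LoRA name -> replicas serving it."""

    def __init__(self):
        self._routes = defaultdict(set)

    def on_lora_loaded(self, replica, lora):
        # Assumption: called only after the replica reports a successful load,
        # so requests are never routed to a replica that is still loading.
        self._routes[lora].add(replica)

    def on_lora_unloaded(self, replica, lora):
        # Remove the replica; drop the entry once no replica serves the LoRA.
        replicas = self._routes.get(lora)
        if replicas is None:
            return
        replicas.discard(replica)
        if not replicas:
            del self._routes[lora]

    def replicas_for(self, lora):
        # Sorted list so callers get a deterministic order.
        return sorted(self._routes.get(lora, ()))


table = LoraRouteTable()
table.on_lora_loaded("replica-7", "lora-1")
table.on_lora_loaded("replica-2", "lora-1")
table.on_lora_unloaded("replica-7", "lora-1")
print(table.replicas_for("lora-1"))
```

A real gateway would keep this state in its own routing configuration; the sketch only makes the ordering (load first, then route) concrete.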

kerthcet · 2025-05-13T07:39:47Z

Revisit this after #404, since it's a blocking feature.

Proposal for LoRA autoscaler

e710101

Signed-off-by: kerthcet <[email protected]>

Change the Proposal ID

99bded8

Signed-off-by: kerthcet <[email protected]>

InftyAI-Agent added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Mar 14, 2025

kerthcet mentioned this pull request on Mar 30, 2025 in "What is the difference between llmaz and lws? #333" (closed)
