Proposal for LoRA autoscaler #313
Signed-off-by: kerthcet <[email protected]>
A draft; if you're interested, feel free to take a look. @googs1025 @nayihz
-->

- e2e tests to make sure the lora service will run successfully
- e2e tests to make sure the lora autoscaling works as expected, both scaling up and down
Most e2e test cases may take a long time because they download models and images, so we should design the test plan carefully.
Yes, but we need to make sure the feature works as expected, so the tests are still needed.
/hold
proposal will be implemented, this is the place to discuss them.
-->

### The LoRA Autoscaler
I don't know much about LoRA. Do we have a simple diagram to describe different components? This may help us understand it more quickly. 🤔
Makes sense to me.
- Replica 7: lora-1

Make sure **at least one lora exists** in replicas, to avoid lora loading overhead at runtime.
- Once the lora model is loaded successfully, the gateway will update the route table for the lora requests
Does gateway refer to another component or something else?
Yes, we'll introduce the envoy gateway for smart routing. I may need to implement the gateway first.
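For readers following this thread, the route-table update mentioned in the proposal could look roughly like the sketch below. It is only an illustration under stated assumptions: `LoraRouteTable`, `mark_loaded`, `mark_unloaded`, and `replicas_for` are hypothetical names, and the actual gateway design is not defined in this PR.

```python
from collections import defaultdict


class LoraRouteTable:
    """Hypothetical in-memory table: LoRA adapter name -> replicas that have it loaded."""

    def __init__(self):
        self._routes = defaultdict(set)

    def mark_loaded(self, replica, lora):
        # Called once a replica reports that the adapter loaded successfully.
        self._routes[lora].add(replica)

    def mark_unloaded(self, replica, lora):
        # Drop the replica; remove the adapter entry when no replica serves it.
        self._routes[lora].discard(replica)
        if not self._routes[lora]:
            del self._routes[lora]

    def replicas_for(self, lora):
        # Replicas that can serve the adapter without paying the loading cost.
        return sorted(self._routes.get(lora, ()))


table = LoraRouteTable()
table.mark_loaded("replica-7", "lora-1")
```

An empty result from `replicas_for` would mean no replica currently holds the adapter, which is the case the "at least one lora exists" rule in the proposal is meant to avoid.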
Revisit this after #404, since it's a blocking feature.
What this PR does / why we need it
Support dense deployment for LoRA models
Which issue(s) this PR fixes
xref: #287
Special notes for your reviewer
Does this PR introduce a user-facing change?