
[Feature] LRU Eviction Strategy for LoRA Adapters: Evicting Adapters with Priority #8053

Open
@whybeyoung


Motivation

When many LoRA adapters share a single base model, each node can host only a limited number of adapters at once. We may therefore need an LRU strategy (similar to vLLM's approach) to evict adapters when that limit is reached. However, repeatedly deactivating and reactivating adapters can degrade service quality and cause SLO violations, which may be unacceptable in some production environments. We therefore need a way to mark specific adapters so that the LRU mechanism never evicts them automatically.
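The requested behavior is LRU eviction that skips protected ("pinned") adapters. Below is a minimal sketch of that policy. It assumes adapters are identified by name and leaves out the actual weight loading and unloading. `PinnedLRUAdapterPool`, `pin`, `unpin`, and `acquire` are hypothetical names used only for illustration and are not part of the SGLang or vLLM API.

```python
from collections import OrderedDict
from typing import List, Optional


class PinnedLRUAdapterPool:
    """Hypothetical fixed-capacity pool of resident LoRA adapters.

    Unpinned adapters are evicted in least-recently-used order; pinned
    adapters are never evicted automatically.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Maps adapter name -> pinned flag. Iteration order tracks recency:
        # the first entry is the least recently used adapter.
        self._resident: "OrderedDict[str, bool]" = OrderedDict()

    def pin(self, name: str) -> None:
        # Exempt a resident adapter from LRU eviction.
        if name not in self._resident:
            raise KeyError(f"adapter {name!r} is not loaded")
        self._resident[name] = True

    def unpin(self, name: str) -> None:
        # Make a resident adapter eligible for LRU eviction again.
        if name not in self._resident:
            raise KeyError(f"adapter {name!r} is not loaded")
        self._resident[name] = False

    def acquire(self, name: str) -> Optional[str]:
        """Mark `name` as used, loading it if needed.

        Returns the name of the evicted adapter, or None if nothing was evicted.
        """
        if name in self._resident:
            # Already loaded: just refresh its recency.
            self._resident.move_to_end(name)
            return None
        evicted = None
        if len(self._resident) >= self.capacity:
            # Scan from least to most recently used for the first unpinned adapter.
            victim = next(
                (n for n, pinned in self._resident.items() if not pinned), None
            )
            if victim is None:
                # Leave the pool unchanged rather than evict a pinned adapter.
                raise RuntimeError(f"all resident adapters are pinned; cannot load {name!r}")
            del self._resident[victim]
            evicted = victim
        # Newly loaded adapters start unpinned and most recently used.
        self._resident[name] = False
        return evicted

    def resident(self) -> List[str]:
        # Resident adapters, least recently used first.
        return list(self._resident)
```

For example, with `capacity=2`, pinning `"a"` and then acquiring `"b"` and `"c"` evicts `"b"` even though `"a"` is older. In this sketch, raising an error when every resident adapter is pinned exposes a capacity misconfiguration instead of silently breaking the pinning guarantee. A real server might queue or reject the request instead.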

Related resources

CC @Fridge003 @lifuhuang @lw9527
