Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
When LoRA adapters share a single base model, each node can host only a limited number of adapters at once. An eviction policy such as LRU (similar to vLLM's approach) may therefore be needed to unload adapters that have not been used recently. However, unloading an adapter and loading it again when the next request for it arrives can hurt service quality and violate service-level objectives (SLOs), which may be unacceptable in some production environments. We therefore need a way to pin specific adapters so that the LRU mechanism never evicts them automatically.
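To make the requested behavior concrete, here is a minimal sketch of an LRU adapter pool in which pinned adapters are skipped during eviction. `PinnableLRUAdapterPool`, `load`, and `loaded` are hypothetical names chosen for illustration. They are not SGLang's or vLLM's actual API, and the sketch only tracks adapter names rather than real weights.

```python
from collections import OrderedDict
from typing import List, Optional


class PinnableLRUAdapterPool:
    """Hypothetical pool: LRU eviction that never evicts pinned adapters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Maps adapter name -> pinned flag; order runs from least to most recently used.
        self._loaded: "OrderedDict[str, bool]" = OrderedDict()

    def load(self, name: str, pinned: bool = False) -> Optional[str]:
        """Load or touch an adapter. Returns the name of the evicted adapter, if any."""
        if name in self._loaded:
            # Already resident: mark as most recently used and keep any existing pin.
            self._loaded.move_to_end(name)
            self._loaded[name] = self._loaded[name] or pinned
            return None
        evicted = None
        if len(self._loaded) >= self.capacity:
            evicted = self._evict_one()
        self._loaded[name] = pinned
        return evicted

    def _evict_one(self) -> str:
        # Choose the least recently used adapter that is not pinned.
        victim = next((n for n, is_pinned in self._loaded.items() if not is_pinned), None)
        if victim is None:
            raise RuntimeError("all loaded adapters are pinned; cannot evict")
        del self._loaded[victim]
        return victim

    def loaded(self) -> List[str]:
        """Resident adapters, ordered from least to most recently used."""
        return list(self._loaded)


pool = PinnableLRUAdapterPool(capacity=2)
pool.load("critical-adapter", pinned=True)
pool.load("adapter-b")
print(pool.load("adapter-c"))  # evicts "adapter-b", although "critical-adapter" is older
print(pool.loaded())
```

In this sketch a pinned adapter stays resident even when it is the least recently used entry, so eviction falls on the oldest unpinned adapter. If every resident adapter is pinned, loading a new one fails instead of silently evicting a pinned adapter. A real implementation would also need to decide how pins are set and cleared, for example at server startup or through an admin request.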