[Misc][API] Cache and Router refactoring for concurrent performance, concurrent safety and stateful routing. #884
Conversation
Some code changes might conflict with #878; that's not a problem for now. We can review the core logic first and rebase the changes later.
I will take a look today.
	return InitForGateway(config, stopCh, redisClient)
}

func InitForGateway(config *rest.Config, stopCh <-chan struct{}, redisClient *redis.Client) *Store {
We should expose a featureFlag list to such initialization logic. We can do this later.
Yes, since the cache has limited use cases, I only expose initialization interfaces with a fixed feature set to avoid misconfiguration. A more flexible initialization function can be added later.
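To illustrate the trade-off being discussed, here is a minimal sketch of a fixed-feature entry point wrapping a hypothetical flag-driven constructor. Apart from InitForGateway, every name (featureSet, initWithFeatures, the fields of Store) and the redis import path are invented for this sketch and are not taken from the PR.

package cache

import (
	"github.com/redis/go-redis/v9" // assumed redis client package; the project may use a different module
	"k8s.io/client-go/rest"
)

// featureSet is a hypothetical bundle of feature flags; the PR hard-wires the
// feature set per entry point instead of exposing it publicly.
type featureSet struct {
	podMetrics   bool
	requestTrace bool
}

// Store is reduced to the single field this sketch needs.
type Store struct {
	features featureSet
}

// InitForGateway keeps the public surface small: gateway callers always get a
// fixed, known-good feature set and cannot misconfigure the cache.
func InitForGateway(config *rest.Config, stopCh <-chan struct{}, redisClient *redis.Client) *Store {
	return initWithFeatures(config, stopCh, redisClient, featureSet{
		podMetrics:   true,
		requestTrace: true,
	})
}

// initWithFeatures is the internal, flag-driven constructor that a more
// flexible public initializer could wrap later.
func initWithFeatures(_ *rest.Config, _ <-chan struct{}, _ *redis.Client, f featureSet) *Store {
	return &Store{features: f}
}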
// Parameters:
//   modelName: Name of the model
// Returns:
//   bool: True if model exists, false otherwise
GetModel(modelName string) bool
Why did you delete ListPods and GetModel from the interface?
There is no use case for ListPods except in unit tests. For this reason, I only provide an inefficient implementation that returns all pods in the cache, and I removed the function from the API.
The name GetModel implies it returns some model object, but the original GetModel API only returns true or false. HasModel would be a more appropriate name for this API.
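For reference, a minimal sketch of the rename being described; the surrounding ModelCache interface is abbreviated to a single method and the final method set in the PR may differ.

// ModelCache is abbreviated here to show only the renamed method.
type ModelCache interface {
	// HasModel reports whether the named model is registered in the cache.
	// It replaces GetModel(modelName string) bool, whose name implied that a
	// model object would be returned.
	HasModel(modelName string) bool
}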
// ListPodsByModel gets pods associated with a model
// Parameters:
//   modelName: Name of the model
// Returns:
//   map[string]*v1.Pod: Pod objects matching the criteria
//   error: Error information if operation fails
ListPodsByModel(modelName string) (map[string]*v1.Pod, error)
Please help me recap the motivation here. Is the core part refactoring the cache store registry with a better, more efficient data structure? Could you remind me of the pain point? Is it just a code refactor, or are there challenges extending it, or a performance issue?
The original API exposes a thread-unsafe map object, and the implementation returns the map stored in the cache without making a copy. This can panic when the map is read while it is being written inside the Lock() context.
Because the map keys are not needed, the new implementation returns a slice with read-through cache acceleration.
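A minimal Go sketch of the difference, assuming a cache guarded by a sync.RWMutex; the type and field names here are illustrative and not taken from the actual implementation.

package cache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

type store struct {
	mu          sync.RWMutex
	podsByModel map[string]map[string]*v1.Pod
}

// Unsafe: returns the internal map; callers may iterate it while a writer
// holds mu.Lock() and mutates it, which panics at runtime.
func (s *store) listPodsByModelUnsafe(modelName string) map[string]*v1.Pod {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.podsByModel[modelName]
}

// Safe: copies the pods into a fresh slice under the read lock, so the caller
// never shares mutable state with the cache.
func (s *store) ListPodsByModel(modelName string) []*v1.Pod {
	s.mu.RLock()
	defer s.mu.RUnlock()
	pods := make([]*v1.Pod, 0, len(s.podsByModel[modelName]))
	for _, pod := range s.podsByModel[modelName] {
		pods = append(pods, pod)
	}
	return pods
}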
@@ -73,7 +70,7 @@ type ModelCache interface {
// Returns:
Will we have other information in the future? In that case, we may need a more complex structure as well?
That is possible. As *v1.Pod suggests, if our model (CRD) definition contains more options, such as labels, we may need to update the return type here to a more complex structure. In that case, I would suggest adding the model definition to github.com/vllm-project/aibrix/api/model/v1alpha1.
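Purely as an illustration of that suggestion, a CRD-style type under v1alpha1 could take a shape like the following; this is a hypothetical sketch, not a type that aibrix necessarily defines.

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Model is a hypothetical CRD shape for the suggestion above.
type Model struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ModelSpec `json:"spec,omitempty"`
}

// ModelSpec is where options such as labels could live once a bare existence
// check is no longer enough for callers.
type ModelSpec struct {
	Labels map[string]string `json:"labels,omitempty"`
}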
// requestID: Unique request identifier
// modelName: Name of the model
// Returns:
//   int64: Trace term identifier
AddRequestCount(requestID string, modelName string) (traceTerm int64)
AddRequestCount(ctx *types.RoutingContext, requestID string, modelName string) (traceTerm int64)
Here's an interface change: AddRequestCount and DoneRequestCount are moved under TraceCache.
In fact, AddRequestCount is paired with DoneRequestCount or DoneRequestTrace, e.g. AddRequestCount + DoneRequestCount or AddRequestCount + DoneRequestTrace. While these functions were designed for tracing only, they also serve as the entry point for real-time statistics. In contrast, the metric cache offers an interface for accessing cached metrics and real-time statistics.
In summary:
- MetricCache is used for reading metrics.
- TraceCache is used to generate real-time statistics in addition to traces. I think MetricCollector may be a better name for this interface; I welcome other options.
Related changes:
enableGPUOptimizerTracing is moved from the gateway to the cache so that:
- When enableGPUOptimizerTracing = false, the trace serving routine (which writes to Redis) does not start at all.
- The trace collection operations in AddRequestCount, DoneRequestCount, and DoneRequestTrace can be disabled individually.
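To make the split concrete, here is a hedged sketch of how the two interfaces could be separated. Only the AddRequestCount signature matches one quoted in this PR; RoutingContext is a local stand-in for types.RoutingContext, and the other method signatures are guesses for illustration.

package cache

// RoutingContext stands in for the PR's types.RoutingContext; only the shape
// of the interfaces below matters for this sketch.
type RoutingContext struct{}

// MetricCache is the read side: access to cached metrics and real-time
// statistics. The method shown is illustrative, not the actual API.
type MetricCache interface {
	GetMetricValueByPod(podName, metricName string) (float64, error)
}

// TraceCache (possibly better named MetricCollector) is the write side:
// request lifecycle hooks that feed both the GPU optimizer trace and the
// real-time statistics.
type TraceCache interface {
	AddRequestCount(ctx *RoutingContext, requestID string, modelName string) (traceTerm int64)
	DoneRequestCount(ctx *RoutingContext, requestID string, modelName string, traceTerm int64)
	DoneRequestTrace(ctx *RoutingContext, requestID string, modelName string, traceTerm int64)
}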
defer c.mu.RUnlock()

pod, ok := c.Pods[podName]
metaPod, ok := c.metaPods.Load(podName)
I want to get more context here (sorry, I haven't reviewed the issue closely yet). The original way is very common in Go; what's the problem?
An RLock is fine, but since we use sync.Map to replace the thread-unsafe maps, the RLock is no longer needed.
Using sync.Map lowers the lock granularity from cache-wide to map-wide.
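A hedged sketch of that granularity difference; metaPod and the cache types are stand-ins, not the actual types in the PR.

package cache

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

type metaPod struct {
	pod *v1.Pod
}

// Before (illustrative): one RWMutex guards the whole cache, so every reader
// contends with every writer regardless of which key is touched.
type lockedCache struct {
	mu   sync.RWMutex
	pods map[string]*metaPod
}

func (c *lockedCache) GetPod(podName string) (*metaPod, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.pods[podName]
	return p, ok
}

// After (illustrative): sync.Map provides per-entry synchronization, so
// lookups no longer take a cache-wide lock.
type syncMapCache struct {
	metaPods sync.Map // podName -> *metaPod
}

func (c *syncMapCache) GetPod(podName string) (*metaPod, bool) {
	v, ok := c.metaPods.Load(podName)
	if !ok {
		return nil, false
	}
	return v.(*metaPod), true
}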
)

const (
	DeploymentIdentifier string = "app.kubernetes.io/name"
Some scenarios, like distributed inference, use other orchestrators, and we may not have this label.
It would be better to have a group identifier that can be configured.
I am thinking the client-go indexer may be another option if we think map[]xxx is not efficient.
Let me define a new interface to replace PodArray. We can replace PodArray with an Indexer later if that's more efficient.
For DeploymentIdentifier, I can also add something like PodGrouper to allow alternative implementations, as sketched below.
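A rough sketch of what a configurable grouping hook might look like. PodGrouper and DeploymentIdentifier are names mentioned in the comments above, but this interface and its default implementation are hypothetical.

package utils

import v1 "k8s.io/api/core/v1"

// DeploymentIdentifier is the default label used to group pods today.
const DeploymentIdentifier string = "app.kubernetes.io/name"

// PodGrouper abstracts how pods are assigned to a group, so orchestrators that
// do not set app.kubernetes.io/name can plug in their own logic.
type PodGrouper interface {
	// GroupKey returns the group a pod belongs to, or false if the pod carries
	// no recognizable group information.
	GroupKey(pod *v1.Pod) (string, bool)
}

// labelGrouper is the default implementation backed by a single label key.
type labelGrouper struct {
	labelKey string
}

// NewLabelGrouper falls back to DeploymentIdentifier when no key is configured.
func NewLabelGrouper(labelKey string) PodGrouper {
	if labelKey == "" {
		labelKey = DeploymentIdentifier
	}
	return &labelGrouper{labelKey: labelKey}
}

func (g *labelGrouper) GroupKey(pod *v1.Pod) (string, bool) {
	val, ok := pod.Labels[g.labelKey]
	return val, ok
}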
@@ -140,33 +140,59 @@ func CountReadyPods(podList *v1.PodList) (int64, error) {
	return int64(readyPodCount), nil
}

// FilterReadyPods filters and returns a list of pods that have a valid PodIP.
func FilterReadyPods(pods map[string]*v1.Pod) []*v1.Pod {
Seems FilterReadyPods has been deleted? Do we have any other references to it?
It has been renamed to FilterRoutablePods to avoid confusion.
OK. If we do not have any other callers, that's fine. If we do, we may need to keep a copy, but it seems we do not.
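For reference, a hedged sketch of the renamed helper discussed above, assuming it keeps the old FilterReadyPods criterion (a valid PodIP) and operates on the slice-based types introduced by this refactor; the actual FilterRoutablePods may apply different conditions.

package utils

import v1 "k8s.io/api/core/v1"

// FilterRoutablePods returns the pods that can receive traffic. This sketch
// keeps the old criterion (a non-empty PodIP) plus a running-phase check; the
// real function may differ.
func FilterRoutablePods(pods []*v1.Pod) []*v1.Pod {
	routable := make([]*v1.Pod, 0, len(pods))
	for _, pod := range pods {
		if pod.Status.PodIP != "" && pod.Status.Phase == v1.PodRunning {
			routable = append(routable, pod)
		}
	}
	return routable
}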
/cc @Xunzhuo @varungup90 Please help review it. @zhangjyr this is still a huge PR; if we have some agreement on the data structures and helper utilities, I suggest adding those.
I think we need to get the e2e tests passing before reviews.
Pull Request Description
Refactoring for cache:
Refactoring for router
Related Issues
Resolves: #868
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]: Corrections to existing functionality
[CI]: Changes to build process or CI pipeline
[Docs]: Updates or additions to documentation
[API]: Modifications to aibrix's API or interface
[CLI]: Changes or additions to the Command Line Interface
[Misc]: For changes not covered above (use sparingly)
Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.