[KEP] Introduce MultiKueue Dispatcher API #5410
Conversation
Skipping CI for Draft Pull Request.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
We also need to discuss the granularity of the timeout, as mentioned by @mimowo.
In my opinion this is not a question of if, but of how we deliver those levels of timeout, because we already see at least two scenarios that require different levels. One could be more general: a single timeout for a large number of clusters of a similar type. The other should be more granular, probably at the worker level: different clusters, but not many of them.
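For illustration only, here is a minimal Go sketch of what two timeout levels could look like; the type and field names below are hypothetical and not part of the KEP:

```go
package config

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DispatcherTimeouts is a hypothetical illustration of the two timeout
// levels discussed above; none of these names are agreed API.
type DispatcherTimeouts struct {
	// Default is the coarse-grained timeout shared by a large set of
	// similar worker clusters.
	Default *metav1.Duration `json:"default,omitempty"`

	// PerCluster holds finer-grained overrides keyed by worker cluster
	// name, for the case of a few heterogeneous clusters.
	PerCluster map[string]metav1.Duration `json:"perCluster,omitempty"`
}
```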
Let's start with a KEP update for this. /release-note-edit
Force-pushed from 33b16c8 to a042b4f.
LGTM. I'm not tagging yet, to give @tenzen-y a chance for more comments and to think more about the spec vs. status thread.
Co-authored-by: Michał Woźniak <[email protected]>
/lgtm
LGTM label has been added. Git tree hash: 7144aa04b32bba71948a58634c6b96be2f39395e
/assign
@vladikkuzn @mszadkow please address the remaining comment: #5410 (comment)
/lgtm
LGTM label has been added. Git tree hash: ba769afe78d955aebd28fd202bdb634bc6d27a2f
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mimowo, mszadkow The full list of commands accepted by this bot can be found here. The pull request process is described here
Thank you for proceeding.
and 3 additional clusters are nominated, until the workload is admitted or all eligible clusters have been considered.
This strategy allows for a controlled and gradual expansion of candidate clusters, rather than dispatching the workload to all clusters at once.
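For illustration, a minimal sketch of the incremental nomination logic described above; the helper name and its shape are assumptions for this example, and only the batch size of 3 comes from the KEP text:

```go
package main

import "fmt"

// nominateNextBatch sketches one round of incremental nomination: it appends
// up to batchSize eligible clusters that are not yet nominated.
func nominateNextBatch(nominated, eligible []string, batchSize int) []string {
	seen := make(map[string]bool, len(nominated))
	for _, c := range nominated {
		seen[c] = true
	}
	added := 0
	for _, c := range eligible {
		if added == batchSize {
			break
		}
		if !seen[c] {
			nominated = append(nominated, c)
			added++
		}
	}
	return nominated
}

func main() {
	eligible := []string{"cluster-a", "cluster-b", "cluster-c", "cluster-d", "cluster-e"}
	nominated := nominateNextBatch(nil, eligible, 3)
	fmt.Println(nominated) // [cluster-a cluster-b cluster-c]
	nominated = nominateNextBatch(nominated, eligible, 3)
	fmt.Println(nominated) // all five clusters
}
```

A real dispatcher would stop nominating once the workload is admitted and would have to decide, as discussed below, what to do once all eligible clusters have been considered.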
What happens to .spec.nominatedClusterNames after all eligible clusters have been considered?
Is the field reset to empty?
If yes, it would be better to mention it here.
I would give ownership of the field to the dispatcher, so it decides when to reset it, etc.
Here, the dispatcher is implemented by upstream Kueue, right?
IIUC, AllAtOnce and Incremental are implemented by upstream Kueue.
We can have both external and built-in dispatchers.
This description is for Incremental, so the dispatcher is the upstream one, IIUC.
My question is about the upstream Incremental dispatcher.
As I discussed with Marcin, in the future, hopefully in 0.14, we will add support for parametrizing the dispatcher. Then we could have a boolean flag indicating whether the dispatcher should auto-reject or not.
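To make the idea concrete, a purely hypothetical sketch of such a parameter; nothing below is an agreed API:

```go
package config

// DispatcherParameters sketches the "parametrizing the dispatcher" idea;
// the type and field names are assumptions for illustration only.
type DispatcherParameters struct {
	// AutoReject indicates whether the dispatcher should reject a workload
	// once all eligible clusters have been considered without admission.
	AutoReject bool `json:"autoReject,omitempty"`
}
```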
IIUC they do not get rejected; they stay nominated for admission from the point of view of this feature.
Do you want to set the rejection state after another round has expired?
Oh, I see. Thank you for the good call. Could you mention in this proposal what happens if the workload cannot be scheduled to any of the clusters with the Incremental dispatcher? Does it just record a cluster assignment error in the controller-manager logs?
Good question. I think this would be a sensible extension, but I wouldn't say it is necessary in the first iteration. Note that we don't reject the workload until 0.12 with the built-in dispatcher.
wdyt @tenzen-y?
I'm ok without Reject state for now.
Moved to status; I also checked that CEL validation is possible.
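For context, a hypothetical fragment of how a CEL rule could be attached to the field via kubebuilder markers; the rule and message are illustrative only, not the KEP's actual validation:

```go
package v1beta1

// WorkloadStatus is a trimmed, hypothetical fragment showing a CEL rule on
// the nominated cluster names; the rule below is an example, not the KEP's.
type WorkloadStatus struct {
	// +kubebuilder:validation:XValidation:rule="self.all(c, size(c) > 0 && size(c) <= 253)",message="cluster names must be 1-253 characters"
	// +listType=set
	// +optional
	NominatedClusterNames []string `json:"nominatedClusterNames,omitempty"`
}
```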
As I discussed with Marcin, in the future, hopefully in 0.14, we will add support for parametrizing the dispatcher. Then we could have a boolean flag indicating whether the dispatcher should auto-reject or not.
Could you describe what "parametrizing the dispatcher" means? What is the parameter here?
Moved to status; I also checked that CEL validation is possible.
I still think that nominatedClusterNames should be in spec. Please follow #5410 (comment).
* .status.clusterName; .spec.nominatedClusterNames
Force-pushed from f302ecf to d4b1a3f.
New changes are detected. LGTM label has been removed.
What type of PR is this?
/kind feature
What this PR does / why we need it:
The feature aims to improve performance and practicality by reducing the overhead of distributing workloads to all clusters simultaneously, minimizing the risk of duplicate admissions and unnecessary preemptions.
It should prevent triggering autoscaling across multiple worker clusters at the same time.
Which issue(s) this PR fixes:
Fixes #5141
Special notes for your reviewer:
Does this PR introduce a user-facing change?