[KEP] Introduce MultiKueue Dispatcher API #5410

Open
wants to merge 4 commits into
base: main
from

Conversation

mszadkow
Contributor

@mszadkow mszadkow commented May 29, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

The feature aims to improve performance and practicality by reducing the overhead of distributing workloads to all clusters simultaneously, minimizing the risk of duplicate admissions and unnecessary preemptions.
It should prevent triggering autoscaling across multiple worker clusters at the same time.

Which issue(s) this PR fixes:

Fixes #5141

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels May 29, 2025

netlify bot commented May 29, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit bee1412
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68714a515eecfa00084faf7b

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 29, 2025
@k8s-ci-robot k8s-ci-robot requested review from gabesaba and PBundyra May 29, 2025 07:37
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 29, 2025
@mszadkow
Contributor Author

/cc @mwielgus @mimowo @tenzen-y

@mszadkow
Contributor Author

We also need to discuss the granularity of the timeout, as mentioned by @mimowo:

should the timeout be global, per manager, or per worker

@mszadkow
Contributor Author

In my opinion the question is not whether, but how we deliver those levels for the timeout, because we already see at least 2 scenarios that require different levels.
One was mentioned in #5141 and the other in #3757.

This one could be more general: a single timeout for a large number of similar clusters.

Both performance (distributing and keeping 40 copies of workload in cluster informers can be expensive) and practical (trying all 40 clusters at the very same time can lead to lots of unnecessary preemptions).

This one should be more granular, probably at the worker level: the clusters differ, but there are not many of them.

To prioritize the use of some clusters over others. For example a user may have one cluster with reservations, and one auto-scaled. The user prefers to first try the reservation cluster, and only as a fallback try autoscaling.

@mimowo
Contributor

mimowo commented May 29, 2025

Let's start with KEP update for this.
/retitle MultiKueue KEP update to introduce MultiKueue Dispatcher API

/release-note-edit

NONE

@k8s-ci-robot k8s-ci-robot changed the title [Feature] Introduce MultiKueue Dispatcher API MultiKueue KEP update to introduce MultiKueue Dispatcher API May 29, 2025
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 29, 2025
@mimowo mimowo mentioned this pull request Jun 2, 2025
3 tasks
@mszadkow mszadkow force-pushed the feat/5141-mk-dsipatcher-api branch from 33b16c8 to a042b4f Compare June 10, 2025 10:55
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 10, 2025
@mimowo
Contributor

mimowo commented Jun 27, 2025

LGTM. I'm not tagging yet, to give @tenzen-y a chance for more comments and to think more about the spec vs status thread.

@mimowo
Contributor

mimowo commented Jul 1, 2025

/lgtm
/assign @tenzen-y
for an extra pair of eyes

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 1, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7144aa04b32bba71948a58634c6b96be2f39395e

@vladikkuzn
Contributor

/assign

@mimowo
Contributor

mimowo commented Jul 9, 2025

@vladikkuzn @mszadkow please address the remaining comment: #5410 (comment)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2025
@mimowo
Contributor

mimowo commented Jul 10, 2025

/lgtm
/approve
Leaving final approval to @tenzen-y
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 10, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ba769afe78d955aebd28fd202bdb634bc6d27a2f

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mszadkow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2025
Member

@tenzen-y tenzen-y left a comment

Thank you for proceeding.

Comment on lines +349 to +350
and 3 additional clusters are nominated, until the workload is admitted or all eligible clusters have been considered.
This strategy allows for a controlled and gradual expansion of candidate clusters, rather than dispatching the workload to all clusters at once.
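
A minimal sketch of the incremental nomination described in the quoted lines, assuming the nominated set grows by up to 3 clusters per round and follows the order of the eligible list. The function `nominateNext` and the cluster names are hypothetical and only illustrate the idea; this is not the upstream dispatcher implementation.

```go
package main

import "fmt"

// nominateNext returns a copy of the nominated set extended by up to
// step clusters from eligible that have not been nominated yet,
// preserving the order of eligible. Hypothetical helper, not Kueue code.
func nominateNext(eligible, nominated []string, step int) []string {
	seen := make(map[string]bool, len(nominated))
	for _, c := range nominated {
		seen[c] = true
	}
	out := append([]string(nil), nominated...)
	for _, c := range eligible {
		if step <= 0 {
			break
		}
		if !seen[c] {
			out = append(out, c)
			step--
		}
	}
	return out
}

func main() {
	eligible := []string{"w1", "w2", "w3", "w4", "w5", "w6", "w7"}
	var nominated []string
	// Each iteration stands in for one round that ended without admission.
	for round := 1; len(nominated) < len(eligible); round++ {
		nominated = nominateNext(eligible, nominated, 3)
		fmt.Printf("round %d: %v\n", round, nominated)
	}
}
```

The loop stops once every eligible cluster has been nominated; what should happen to the field after that point is exactly the open question raised in the comments on these lines.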
Member

What happens to .spec.nominatedClusterNames after all eligible clusters have been considered?
Is the field reset to empty?

If yes, it would be better to mention it here.

Contributor

I would give ownership of the field to the dispatcher, so that it decides when to reset it, etc.

Member

@tenzen-y tenzen-y Jul 10, 2025

Here, the dispatcher is implemented by upstream Kueue, right?
IIUC, AllAtOnce and Incremental are implemented by upstream Kueue.

Contributor

We can have both external and built-in dispatchers.

Member

This description is for Incremental. So the dispatcher is the upstream one, IIUC.
My question is about the upstream Incremental dispatcher.

Contributor

As I discussed with Marcin, in the future, hopefully in 0.14, we will add parametrizing dispatcher. Then we could have a boolean flag indicating whether the dispatcher should auto-reject or not.

Member

IIUC they do not get rejected; from the point of view of this feature, they stay nominated for admission.
Do you want to set the rejection state after another round has expired?

Oh, I see. Thank you for the good call. Could you mention in this proposal what happens if the workload could not be scheduled to any of the clusters in the Incremental dispatcher? Does it just record a cluster assignment error in the controller-manager logs?

Good question. I think this would be a sensible extension, but I wouldn't say it is necessary in the first iteration. Note that we don't reject the workload until 0.12 with the built-in dispatcher.

wdyt @tenzen-y?

I'm OK without the Reject state for now.

Contributor Author

moved to status, also I checked CEL validation was possible

Member

As I discussed with Marcin, in the future, hopefully in 0.14, we will add parametrizing dispatcher. Then we could have a boolean flag indicating whether the dispatcher should auto-reject or not.

Could you describe what a "parametrizing dispatcher" is? What is the parameter here?

Member

moved to status, also I checked CEL validation was possible

I still think that nominatedClusterNames should be in spec. Please follow #5410 (comment).

* .status.clusterName ; .spec.nominatedClusterNames
@vladikkuzn vladikkuzn force-pushed the feat/5141-mk-dsipatcher-api branch from f302ecf to d4b1a3f Compare July 10, 2025 19:16
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2025
@k8s-ci-robot k8s-ci-robot requested a review from tenzen-y July 10, 2025 19:16
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

MultiKueue dispatcher API
7 participants