Priority set by localqueue and immutable #7129
Replies: 3 comments 5 replies
-
I think what you are asking for is a common request, which may be modeled in a couple of ways depending on the details. As a starting point for the discussion, I would suggest that the baseline solution is "classical preemption" in a cohort. Here, you have LQ "BestEffort" pointing to CQ "BestEffort", and LQ "Guaranteed" pointing to CQ "Guaranteed". Then, the preemption rules could be:
For more complex setups you may also consider FairSharing, where the "weight" allows you to control the relative priority between ClusterQueues. However, before exploring FairSharing it would be good to know if this is the direction you would like to go. On that occasion, I would love to have more examples of such "go-to setups" in the docs https://kueue.sigs.k8s.io/docs/tasks/manage/administer_cluster_quotas/, so hopefully we could use your setup to enrich the documentation, which currently focuses more on the semantics of individual building blocks than on modelling an entire system.
-
At a higher level, you're describing what is effectively a global priority within a global queue for ordering. This causes a lot of issues for larger organizations with complicated team structures that share the same cluster under resource contention. I'd be curious whether, for your example, there's a large organization backing it, or if it's literally 2 teams / 2 categories of workloads sharing 1 HPC environment.
-
Thanks for the prompt response. For some more background: our users often draw their understanding of queues from their experience of them in academia. What I have observed is that academic queuing systems are described in terms of multiple queues that provide different TTLs, job scales, etc. Queues may also be used to segment jobs by cost or urgency. Examples include:
Where we have given our users the freedom to set priority, what I have observed is:
After experiencing large job queuing systems that prohibit users from setting priority, and observing these issues, I have become biased against users setting priority. Hence my discussion point: can I take away user control of priority and make it a function of the localqueue?
-
I am interested in using queues designed to share surplus resource. For example, I may have bursty jobs from one namespace. A second namespace has less important work that is OK to be interrupted. Owners of the first namespace are happy to share their surplus resource when they are not using it. A queue system can achieve this with two localqueues both pointing at the same resource pool. Jobs entering localqueue-A have priority 1000, jobs entering localqueue-B have priority 500. If priority can be set by the localqueue (and is immutable), I can build this simple and easy-to-reason-about system.
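To make the intended behaviour concrete, here is a minimal Python sketch of the model, not of Kueue itself. The queue names and priority values come from the example above; `Job`, `QUEUE_PRIORITY`, `admission_order` and `preemption_candidates` are hypothetical names made up for illustration, and the rule "a job may only interrupt strictly lower-priority jobs" is my assumption about how the shared pool would behave.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Hypothetical admin-owned mapping: priority is a property of the queue,
# not of the job. Values are taken from the example in this post.
QUEUE_PRIORITY = MappingProxyType({
    "localqueue-A": 1000,  # bursty, important work
    "localqueue-B": 500,   # interruptible work using surplus resource
})


@dataclass(frozen=True)
class Job:
    """A job that only records which localqueue it was submitted to."""
    name: str
    local_queue: str

    @property
    def priority(self) -> int:
        # Derived from the queue; there is no field a user could set.
        return QUEUE_PRIORITY[self.local_queue]


def admission_order(jobs):
    """Order pending jobs by queue priority, highest first.

    sorted() is stable, so submission order is kept within a queue.
    """
    return sorted(jobs, key=lambda job: -job.priority)


def preemption_candidates(running, incoming):
    """Running jobs that the incoming job would be allowed to interrupt.

    Assumption: only strictly lower-priority jobs can be interrupted.
    """
    return [job for job in running if job.priority < incoming.priority]
```

With this model, a job from localqueue-A can reclaim resource from running localqueue-B jobs, but never the reverse, and no user can change a job's priority after (or before) submission.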
From my experience, this queue design pattern is well proven and fairly common in existing HPC environments. I am interested in people's thoughts on achieving this with Kueue today - or changes that might make it simple in future.