Set root capability only when user not set it #4354
@@ -598,7 +598,9 @@ func (cp *capacityPlugin) buildHierarchicalQueueAttrs(ssn *framework.Session) bo

// init root queue realCapability/capability/deserved as cp.totalResource
rootQueueAttr := cp.queueOpts[api.QueueID(cp.rootQueue)]
rootQueueAttr.capability = cp.totalResource
if rootQueueAttr.capability == nil || rootQueueAttr.capability.IsEmpty() {
I think rootQueueAttr.capability.IsEmpty() is enough, because the root queue's capability is initialized in newQueueAttr and won't be nil.
Thank you for your suggestion, done.
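The guard agreed on in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Volcano code: Resource, its IsEmpty method, the trimmed queueAttr, and initRootCapability are hypothetical stand-ins, and the sketch assumes (as the reviewer notes) that capability is always non-nil by the time it is checked.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for the scheduler's resource type.
type Resource struct {
	MilliCPU float64
	Memory   float64
}

// IsEmpty reports whether no resource dimension has been set.
func (r *Resource) IsEmpty() bool {
	return r.MilliCPU == 0 && r.Memory == 0
}

// queueAttr mirrors only the two fields discussed in this thread.
type queueAttr struct {
	capability     *Resource
	realCapability *Resource
}

// initRootCapability keeps a user-provided capability and falls back to
// the cluster total only when the capability is empty. realCapability
// always tracks the cluster total.
// Assumption: root.capability is never nil, so no nil check is needed.
func initRootCapability(root *queueAttr, total *Resource) {
	if root.capability.IsEmpty() {
		root.capability = total
	}
	root.realCapability = total
}

func main() {
	total := &Resource{MilliCPU: 32000, Memory: 64 << 30}

	userSet := &queueAttr{capability: &Resource{MilliCPU: 16000, Memory: 16 << 30}}
	unset := &queueAttr{capability: &Resource{}}

	initRootCapability(userSet, total)
	initRootCapability(unset, total)

	fmt.Println("user-set capability CPU:", userSet.capability.MilliCPU)
	fmt.Println("defaulted capability CPU:", unset.capability.MilliCPU)
}
```

Dropping the nil check is only safe under the stated assumption; if the attribute could be built without going through the constructor, the nil check would still be needed.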
Please squash into one commit, thanks :)
/ok-to-test
/lgtm
rootQueueAttr.realCapability = cp.totalResource
rootQueueAttr.deserved = cp.totalResource
Should we handle realCapability and deserved too? If realCapability and deserved are not empty and are less than cp.totalResource, could an error occur?
I think setting realCapability and deserved to cp.totalResource is suitable, because we need at least one field that identifies how many resources are actually available in the cluster, and I haven't thought of any scenario where these two fields need to be customized. Do you have any ideas? We can discuss it together.
> I think setting realCapability and deserved to cp.totalResource is suitable, because we need at least one field that identifies how many resources are actually available in the cluster, and I haven't thought of any scenario where these two fields need to be customized. Do you have any ideas? We can discuss it together.
Yes, I think you are right. realCapability and deserved will be calculated in checkHierarchicalQueue.
It seems that if users previously set the deserved field of each sub-queue, the deserved check may also fail when cluster resources are reduced.
Yes, you are right, done, please review again, thanks.
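The failure mode raised above can be illustrated with a small sketch. It assumes, for illustration only, that the hierarchy check requires the sum of the children's deserved resources to fit within the parent's deserved resources; checkDeserved is a hypothetical single-dimension helper, not the plugin's checkHierarchicalQueue.

```go
package main

import (
	"errors"
	"fmt"
)

// checkDeserved is a hypothetical, CPU-only version of a hierarchy check:
// the children's deserved milli-CPU must fit within the parent's.
func checkDeserved(parentMilliCPU int64, childrenMilliCPU []int64) error {
	var sum int64
	for _, c := range childrenMilliCPU {
		sum += c
	}
	if sum > parentMilliCPU {
		return errors.New("children deserved exceeds parent deserved")
	}
	return nil
}

func main() {
	// Two sub-queues whose deserved values were set by the user earlier.
	children := []int64{8000, 8000}

	// Root deserved kept at a user-provided value: the check passes.
	fmt.Println("user-set root:", checkDeserved(16000, children))

	// Root deserved overwritten with a shrunken cluster total: the check fails.
	fmt.Println("shrunken cluster total:", checkDeserved(12000, children))
}
```

Under that assumption, overwriting the root's deserved with the current cluster total turns a node loss into a validation error for sub-queue values that were previously valid, which is why the root's deserved is treated like its capability.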
That's a good catch, and @Xu-Wentao has the same requirement. I think we can add the root queue to the Helm chart so that users can set the root queue's spec when installing Volcano, and then remove the root queue creation from the Volcano scheduler.
@Monokaix Good idea. We can open a separate feature issue to track it. If you have time, please approve this PR, thanks.
OK. @houyuting, would you like to raise this issue? Otherwise we may miss it :)
Please sign off your commit with |
The root queue is also updated when the session closes, so modifying only the plugin is not enough.
Got it.
93f4a7b to e9e096a
/gemini review
Code Review
This pull request correctly implements the intended change to only set the root queue's capability and deserved resources if they are not already defined by the user. The changes are logical and the new test cases are a good addition.
I've pointed out a couple of areas for improvement. One is a minor comment update for maintainability. The other is a more significant concern regarding the removal of logic that caps a queue's deserved resources by its realCapability in a hierarchical setup. This could lead to inconsistencies and scheduling issues, so it's worth a second look. Overall, great work on addressing the issue.
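To make the second concern concrete, capping deserved by realCapability amounts to a per-resource minimum. The sketch below is illustrative only: capByRealCapability and the map-based resource representation are hypothetical, and resources missing from realCapability are left uncapped purely to keep the example short.

```go
package main

import "fmt"

// capByRealCapability returns a copy of deserved in which each resource
// is clamped to the matching entry in realCapability.
// Assumption (hypothetical helper): a resource absent from realCapability
// is left uncapped.
func capByRealCapability(deserved, realCapability map[string]int64) map[string]int64 {
	capped := make(map[string]int64, len(deserved))
	for name, want := range deserved {
		limit, ok := realCapability[name]
		if ok && want > limit {
			capped[name] = limit
		} else {
			capped[name] = want
		}
	}
	return capped
}

func main() {
	deserved := map[string]int64{"cpu": 10000, "memory": 4096}
	realCapability := map[string]int64{"cpu": 8000, "memory": 16384}

	capped := capByRealCapability(deserved, realCapability)
	fmt.Println("cpu:", capped["cpu"], "memory:", capped["memory"])
}
```

Without such a clamp, a queue could be considered to deserve more than it can actually use, which is the inconsistency the review asks to double-check.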
@@ -598,9 +599,13 @@ func (cp *capacityPlugin) buildHierarchicalQueueAttrs(ssn *framework.Session) bo

// init root queue realCapability/capability/deserved as cp.totalResource
The comment on this line is now outdated due to the changes below. It states that capability and deserved are initialized as cp.totalResource, but this is now done conditionally. Please update the comment to reflect the new logic for better code maintainability.
- // init root queue realCapability/capability/deserved as cp.totalResource
+ // init root queue: realCapability is set to total resource, and capability/deserved are also set if empty.
@@ -370,6 +370,7 @@ func Test_capacityPlugin_OnSessionOpenWithHierarchy(t *testing.T) {
pg1 := util.BuildPodGroup("pg1", "ns1", "q11", 1, nil, schedulingv1beta1.PodGroupInqueue)
// queue
root := buildQueueWithParents("root", "", nil, nil)
root1 := buildQueueWithParents("root", "", nil, api.BuildResourceList("16", "16Gi"))
@@ -468,6 +469,8 @@
pg13 := util.BuildPodGroup("pg13", "ns1", "q91", 1, nil, schedulingv1beta1.PodGroupInqueue)
// pod
p13 := util.BuildPod("ns1", "p13", "", corev1.PodPending, api.BuildResourceList("2", "2Gi", []api.ScalarResource{{Name: "nvidia.com/gpu", Value: "1"}}...), "pg13", make(map[string]string), make(map[string]string))
// queue
queue10 := buildQueueWithParents("q10", "root", nil, api.BuildResourceList("10", "4Gi", []api.ScalarResource{}...))
Please check my latest review comments and gemini's comments, thanks
go.sum
Outdated
I think go.mod and go.sum should not be changed in your PR.
done
Signed-off-by: leona.hou <[email protected]>
@@ -774,8 +780,14 @@ func (cp *capacityPlugin) checkHierarchicalQueue(attr *queueAttr) error {
}

if attr.name == cp.rootQueue {
attr.guarantee = totalGuarantee
cp.totalGuarantee = totalGuarantee
if attr.guarantee.IsEmpty() {
Was the root queue created by you, or was it auto-created?
auto created
Then its guarantee and capability fields will be updated once it is created, so it seems this cannot take effect.
/lgtm
Fixes #4350
Result:
Unit test output with the old code:
Unit test output with the new code: