This repository has been archived by the owner on Sep 12, 2023. It is now read-only.
When `enable-gang-scheduler=true`, tf-operator creates a `podgroup` custom resource so that the gang scheduler volcano can allocate the pods. However, when the podgroup is created in func `SyncPodGroup`:
Only the pod information and `minMember` are set in the podgroup, which leaves functionality missing and leads to unpredictable bugs during allocation.
For example, since the `minResources` field is not filled in the podgroup, volcano cannot distinguish tfjobs from bestEffort jobs, because both have nil `minResources`. As a result, every tfjob can become `inqueue`, and the `enqueue` and `reserve` actions lose their effect.
So in my opinion, we need to supply more information about the tfjob to the podgroup, such as `minMember`, `queue`, and other fields, to make sure the gang scheduler works correctly.
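To make the proposal concrete, here is a minimal sketch of how the extra fields could be derived from a job's replica specs. Everything in it is an assumption for illustration: `ReplicaSpec`, `PodGroupSpec`, and `buildPodGroupSpec` are hypothetical simplified types and helpers, not the operator's or volcano's real API, resources are plain integers rather than Kubernetes quantities, and it assumes the gang needs every replica to start.

```go
package main

import "fmt"

// ReplicaSpec is a hypothetical, simplified stand-in for one replica type
// of a tfjob (for example PS or Worker). It is not the operator's real type.
type ReplicaSpec struct {
	Replicas int64
	// Requests holds per-pod resource requests,
	// e.g. "cpu" in millicores and "memory" in bytes.
	Requests map[string]int64
}

// PodGroupSpec is a hypothetical struct that mirrors only the podgroup
// fields discussed in this issue.
type PodGroupSpec struct {
	MinMember    int64
	Queue        string
	MinResources map[string]int64
}

// buildPodGroupSpec sums the replica counts into MinMember and the
// per-pod requests (times the replica count) into MinResources.
// Assumption: the whole job must be scheduled together.
func buildPodGroupSpec(queue string, replicas []ReplicaSpec) PodGroupSpec {
	spec := PodGroupSpec{Queue: queue, MinResources: map[string]int64{}}
	for _, r := range replicas {
		spec.MinMember += r.Replicas
		for name, qty := range r.Requests {
			spec.MinResources[name] += qty * r.Replicas
		}
	}
	return spec
}

func main() {
	spec := buildPodGroupSpec("default", []ReplicaSpec{
		{Replicas: 1, Requests: map[string]int64{"cpu": 1000, "memory": 2 << 30}},
		{Replicas: 4, Requests: map[string]int64{"cpu": 2000, "memory": 4 << 30}},
	})
	fmt.Printf("minMember=%d queue=%s cpu=%dm memory=%d\n",
		spec.MinMember, spec.Queue, spec.MinResources["cpu"], spec.MinResources["memory"])
}
```

With a non-nil `MinResources` computed this way, a scheduler would have a basis for telling such a job apart from a bestEffort one, which is the gap described above.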
Code reference for `SyncPodGroup`: common/pkg/controller.v1/common/job_controller.go, line 211 in 3fbe0ce