feat: move gang scheduling metadata to typed protobuf fields #4547
What type of PR is this?
Enhancement / Bug fix
What this PR does / why we need it
Moves gang scheduling metadata from string annotations to typed protobuf fields, improving type safety and clarifying the scheduler<>executor contract.
New types
Added three new types to replace annotation-based gang scheduling:
- GangInfo represents the user's gang configuration (gang ID, cardinality, and node uniformity label name). Users submit this in the new SubmitJob.gang field.
- GangPlacement extends GangInfo with the scheduler's placement decision. After the scheduler decides where to place the gang, it adds the node_uniformity_label_value field. This complete placement info gets sent to executors.
- SchedulingMetadata is a container message in JobRunLease that holds the GangPlacement. This gives us room to add other scheduling decisions in the future.
Server
The server now handles gang metadata in two phases:
- [...] SchedulingMetadata. The buildSchedulingMetadata() function handles both cases, so old and new clients both work.
- [...] node_uniformity_label_value based on where it decides to place the gang. It then sends the complete SchedulingMetadata to the executor via JobRunLease.
Executor
The executor now receives a fully populated SchedulingMetadata from the scheduler and uses it directly to build environment variables for the Armada jobs (pods).
Example
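To make the flow described above concrete, here is a minimal, self-contained Go sketch of the three types and the path from submission to executor. It is not the code from this PR: the struct fields only approximate the protobuf messages, and the annotation keys, the environment variable names, and the helpers gangFromSubmission, place, and gangEnvVars are hypothetical names chosen for illustration. The rule that the typed field takes precedence over annotations is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// GangInfo approximates the user's gang configuration (field names assumed).
type GangInfo struct {
	ID                      string
	Cardinality             uint32
	NodeUniformityLabelName string
}

// GangPlacement extends GangInfo with the scheduler's placement decision.
type GangPlacement struct {
	GangInfo
	NodeUniformityLabelValue string
}

// SchedulingMetadata is the container carried on the job run lease.
type SchedulingMetadata struct {
	Gang *GangPlacement
}

// Hypothetical annotation keys standing in for the legacy string annotations.
const (
	annGangID          = "example.io/gangId"
	annGangCardinality = "example.io/gangCardinality"
	annGangLabelName   = "example.io/gangNodeUniformityLabel"
)

// gangFromSubmission returns the typed gang field when present and otherwise
// parses the legacy annotations. It returns nil for a non-gang job.
func gangFromSubmission(typed *GangInfo, annotations map[string]string) (*GangInfo, error) {
	if typed != nil {
		return typed, nil
	}
	id, ok := annotations[annGangID]
	if !ok {
		return nil, nil
	}
	n, err := strconv.ParseUint(annotations[annGangCardinality], 10, 32)
	if err != nil {
		return nil, fmt.Errorf("invalid gang cardinality: %w", err)
	}
	return &GangInfo{
		ID:                      id,
		Cardinality:             uint32(n),
		NodeUniformityLabelName: annotations[annGangLabelName],
	}, nil
}

// place records the label value chosen by the scheduler and wraps the result
// in SchedulingMetadata, ready to be sent to the executor.
func place(info *GangInfo, labelValue string) *SchedulingMetadata {
	if info == nil {
		return &SchedulingMetadata{}
	}
	return &SchedulingMetadata{
		Gang: &GangPlacement{GangInfo: *info, NodeUniformityLabelValue: labelValue},
	}
}

// gangEnvVars builds pod environment variables on the executor side.
// The variable names are placeholders, not the ones Armada uses.
func gangEnvVars(md *SchedulingMetadata) map[string]string {
	env := map[string]string{}
	if md == nil || md.Gang == nil {
		return env
	}
	env["GANG_ID"] = md.Gang.ID
	env["GANG_CARDINALITY"] = strconv.FormatUint(uint64(md.Gang.Cardinality), 10)
	env["GANG_NODE_UNIFORMITY_LABEL_NAME"] = md.Gang.NodeUniformityLabelName
	env["GANG_NODE_UNIFORMITY_LABEL_VALUE"] = md.Gang.NodeUniformityLabelValue
	return env
}

func main() {
	// An old client that still submits gang info as annotations.
	legacy := map[string]string{
		annGangID:          "gang-a",
		annGangCardinality: "4",
		annGangLabelName:   "rack",
	}
	info, err := gangFromSubmission(nil, legacy)
	if err != nil {
		panic(err)
	}
	env := gangEnvVars(place(info, "rack-7"))
	fmt.Println(env["GANG_ID"], env["GANG_CARDINALITY"], env["GANG_NODE_UNIFORMITY_LABEL_VALUE"])
}
```

Because the executor only ever reads the fully populated SchedulingMetadata, it needs no knowledge of whether the job was submitted with annotations or with the typed field.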