feat: vllm data parallelism multi-node support in operator #3595
Conversation
Force-pushed from 076f85f to fa5f9ee
Walkthrough

Centralizes multinode flag injection into a shared utility; SGLang delegates to it. VLLM adds conditional Ray vs. data-parallel injection, world-size/GPU parsing, and a new UpdateContainer parameter for leader resolution. CRDs remove dynamoComponent/dynamoTag and dynamoGraph. Tests and docs updated for the new flows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Op as Operator
    participant V as VLLMBackend
    participant M as MultinodeDeployer
    participant U as Utils (injection)
    Op->>V: UpdateContainer(container, nodes, role, comp, svcName, multinodeDeployer)
    V->>V: getWorldSize(), getContainerGPUs(), getFlagValue()
    alt Ray-based distributed launch
        V->>M: ResolveLeaderHostname()
        M-->>V: headAddress
        V->>U: injectFlagsIntoContainerCommand(... Ray flags ..., needsShell=true)
        U-->>V: container updated
    else Data-parallel launch
        V->>V: compute start-rank, dp size/local, RPC port
        V->>U: injectFlagsIntoContainerCommand(... data-parallel flags ..., needsShell=maybe)
        U-->>V: container updated
    else No injection needed
        V-->>V: leave args unchanged
    end
    V-->>Op: updated container
```

```mermaid
sequenceDiagram
    autonumber
    participant Op as Operator
    participant S as SGLangBackend
    participant U as Utils (injection)
    Op->>S: UpdateContainer(container, ...)
    S->>U: injectFlagsIntoContainerCommand(container, flags, needsShell, "sglang")
    U-->>S: container updated
    S-->>Op: updated container
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks: ❌ 2 warnings, ✅ 1 passed
Walkthrough

CRD schemas were updated to remove certain public fields and adjust descriptions. Operator logic refactors introduce shared flag-injection utilities, simplify SGLang handling, and add VLLM multinode launch paths (Ray-based and data-parallel), with corresponding tests. Documentation was updated to explain multinode modes, probe behavior, and ports.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User as Deployment Spec
    participant Operator as Operator (VLLM Backend)
    participant Node as Pod/Container
    participant Utils as Flag Injection Utils
    Note over Operator: UpdateContainer for VLLM
    User->>Operator: Provide args, role, resources
    Operator->>Operator: Compute worldSize, GPUsPerNode
    alt Ray-based multinode
        Operator->>Node: Leader: prepend "ray start --head" to command
        Operator->>Node: Worker: prepend "ray start --address=<leader>"
    else Data-parallel multinode
        Operator->>Utils: injectFlagsIntoContainerCommand(..., "--dp-address ... --dp-size-local ... --rpc-port 13445 --start-rank <n>", needsShell)
        Utils-->>Operator: Container updated
    else Single-node / no-op
        Operator->>Operator: Log no injected flags
    end
    Note over Node: Probes adjusted per role/mode (as documented)
```

```mermaid
sequenceDiagram
    autonumber
    participant Backend as Backend (SGLang/VLLM)
    participant Utils as injectFlagsIntoContainerCommand
    participant K8s as corev1.Container
    Backend->>Utils: Inject flags into container (flags, needsShell)
    alt Python entrypoint
        alt needsShell
            Utils->>K8s: Wrap with ["sh","-c"] and exec-preserving string
        else direct
            Utils->>K8s: Append split flags to Args
        end
    else Non-Python command
        Utils->>K8s: Regex-inject flags into first python ... sglang occurrence within Args
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks: ❌ 2 warnings, ✅ 1 passed
Actionable comments posted: 3
🧹 Nitpick comments (9)

deploy/cloud/operator/internal/dynamo/backend_sglang.go (1)

Lines 18-27: Remove unused helper (and import) to avoid dead code. `isPythonCommand` is no longer referenced; drop it and the `regexp` import to keep the file lean.

```diff
-import (
-	"fmt"
-	"regexp"
+import (
+	"fmt"
 	"github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/v1alpha1"
 	corev1 "k8s.io/api/core/v1"
 	"sigs.k8s.io/controller-runtime/pkg/log"
 )
@@
-// isPythonCommand checks if the command is a Python interpreter
-func isPythonCommand(cmd string) bool {
-	if cmd == "python" || cmd == "python3" {
-		return true
-	}
-	// Match python with version numbers like python3.11, python2.7, etc.
-	// Also support absolute paths like /usr/bin/python3.8, /opt/python/bin/python3.11
-	matched, _ := regexp.MatchString(`^(.*/)?(python\d*(\.\d+)*)$`, cmd)
-	return matched
-}
```

docs/kubernetes/multinode-deployment.md (2)
Lines 191-202: Use headings instead of emphasis for section titles (MD036). Replace bolded mode titles with proper subheadings for readability and lint compliance.

```diff
-**1. Ray-Based Mode (Tensor/Pipeline Parallelism across nodes)**
+#### Ray-Based Mode (Tensor/Pipeline Parallelism across nodes)
```
Lines 203-216: Same here: convert the emphasized title to a heading (MD036). Switch to a subheading for Data Parallel Mode.

```diff
-**2. Data Parallel Mode (Multiple model instances across nodes)**
+#### Data Parallel Mode (Multiple model instances across nodes)
```

deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (3)
Lines 93-99: Fix tautological assertion; snapshot args before calling UpdateContainer. The current check compares the mutated slice to itself and will always pass.

```diff
-	// Call UpdateContainer
-	backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, tt.component, "test-service", tt.multinodeDeployer)
-
-	if tt.expectNotModified {
-		// Args should not have changed
-		g.Expect(tt.initialContainer.Args).To(gomega.Equal(tt.initialContainer.Args))
+	// Snapshot args before mutation
+	priorArgs := append([]string{}, tt.initialContainer.Args...)
+	backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, tt.component, "test-service", tt.multinodeDeployer)
+	if tt.expectNotModified {
+		g.Expect(tt.initialContainer.Args).To(gomega.Equal(priorArgs))
```
Lines 37-45: Leader probes should not be asserted as removed; initialize probes to verify behavior. Leader probes remain active per design, so the test should not mark expectProbesRemoved for the leader. Also set non-nil probes to actually test removal for workers.

```diff
-	name: "multinode leader prepends ray start --head",
+	name: "multinode leader prepends ray start --head (probes unchanged)",
@@
-	expectProbesRemoved: true,
+	expectProbesRemoved: false,
```

Follow-up: initialize Liveness/Readiness/Startup probes in worker cases to validate removal.
Lines 252-264: Add a test for non-divisible GPUs/world size (rounding semantics). When GPUs_per_node is not divisible by TP×PP, dataParallelSizeLocal truncates. Add a case to document the expected behavior (or enforce divisibility).

deploy/cloud/operator/internal/dynamo/backend_vllm.go (3)
Lines 31-36: Fix misleading comment (leader probes aren't removed). The code removes probes only for workers; update the comment to avoid confusion.

```diff
-	// Remove probes for multinode worker and leader
+	// Remove probes for multinode workers
```
Lines 90-99: Validate GPU/parallelism divisibility before DP injection. dataParallelSizeLocal uses integer division; if GPUs_per_node is not divisible by TP×PP, the local size truncates and can underutilize GPUs. Warn or fail fast.

```diff
-	dataParallelSizeLocal := getContainerGPUs(container) / getWorldSize(container)
+	containerGPUs := getContainerGPUs(container)
+	worldSize := getWorldSize(container)
+	if worldSize == 0 {
+		worldSize = 1
+	}
+	if containerGPUs%worldSize != 0 {
+		log.Log.WithName("vllm-backend").Info("GPUs_per_node not divisible by TP×PP; truncating data-parallel workers per node",
+			"gpusPerNode", containerGPUs, "worldSize", worldSize)
+	}
+	dataParallelSizeLocal := containerGPUs / worldSize
```
Lines 131-143: Consider supporting the `--flag=value` style. getFlagValue only handles `--flag value`; many CLIs also accept `--flag=value`, and supporting both improves robustness.

```diff
 	for i, arg := range container.Args {
 		if arg == flag && (i+1 < len(container.Args)) {
 			flagValue, err := strconv.ParseInt(container.Args[i+1], 10, 64)
 			if err != nil {
 				continue
 			}
 			return flagValue
+		}
+		// Support --flag=value form
+		if strings.HasPrefix(arg, flag+"=") {
+			val := strings.TrimPrefix(arg, flag+"=")
+			if v, err := strconv.ParseInt(val, 10, 64); err == nil {
+				return v
+			}
 		}
 	}
```
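A standalone helper along these lines illustrates the dual-form parsing the review suggests. This is a sketch, not the repository's actual code; the name `getFlagValue` and the `int64` return type are taken from the review context, and the zero-value-on-absence behavior is an assumption:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// getFlagValue scans args for an integer-valued flag, accepting both
// "--flag value" and "--flag=value" forms. It returns 0 when the flag
// is absent or its value does not parse as an integer.
func getFlagValue(args []string, flag string) int64 {
	for i, arg := range args {
		// "--flag value" form: the value is the next element.
		if arg == flag && i+1 < len(args) {
			if v, err := strconv.ParseInt(args[i+1], 10, 64); err == nil {
				return v
			}
			continue
		}
		// "--flag=value" form: the value follows '=' in the same element.
		if strings.HasPrefix(arg, flag+"=") {
			if v, err := strconv.ParseInt(strings.TrimPrefix(arg, flag+"="), 10, 64); err == nil {
				return v
			}
		}
	}
	return 0
}

func main() {
	args := []string{"--tensor-parallel-size", "4", "--pipeline-parallel-size=2"}
	fmt.Println(getFlagValue(args, "--tensor-parallel-size"))   // 4 (space form)
	fmt.Println(getFlagValue(args, "--pipeline-parallel-size")) // 2 (equals form)
	fmt.Println(getFlagValue(args, "--data-parallel-size"))     // 0 (absent)
}
```

Handling both forms in one loop keeps the caller-facing behavior identical regardless of how the user formatted their vLLM CLI arguments.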
📜 Review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (10)

- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2 hunks)
- deploy/cloud/operator/docs/footer.md (2 hunks)
- deploy/cloud/operator/internal/dynamo/backend_sglang.go (1 hunk)
- deploy/cloud/operator/internal/dynamo/backend_vllm.go (2 hunks)
- deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (9 hunks)
- deploy/cloud/operator/internal/dynamo/utils.go (1 hunk)
- docs/kubernetes/multinode-deployment.md (1 hunk)
💤 Files with no reviewable changes (2)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Applied to files:
deploy/cloud/operator/docs/footer.md
🧬 Code graph analysis (3)

deploy/cloud/operator/internal/dynamo/backend_vllm.go (2)

- deploy/cloud/operator/internal/dynamo/graph.go (4): Role (559-559), MultinodeDeployer (615-619), RoleLeader (562-562), RoleWorker (563-563)
- deploy/cloud/operator/internal/consts/consts.go (1): KubeResourceGPUNvidia (47-47)

deploy/cloud/operator/internal/dynamo/utils.go (1)

- benchmarks/profiler/utils/config.py (1): Container (49-54)

deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (3)

- deploy/cloud/operator/internal/dynamo/backend_vllm.go (1): VLLMPort (15-15)
- deploy/cloud/operator/internal/dynamo/graph.go (5): Role (559-559), MultinodeDeployer (615-619), RoleMain (564-564), RoleLeader (562-562), RoleWorker (563-563)
- deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1): DynamoComponentDeploymentSharedSpec (48-106)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/multinode-deployment.md
191-191: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
203-203: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Validate PR title and add label
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)

deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2)

- Lines 69-70: Deployment-scoped env description reads correctly. The revised wording matches the deployment-level semantics and clarifies override scope.
- Lines 10386-10387: Concise services description LGTM. The new phrasing is clear without altering schema semantics.

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2)

- Lines 69-70: Documentation wording is consistent. The updated description correctly reflects deployment-wide env scope, matching the rest of the spec.
- Lines 10386-10387: Services description looks good. Wording is clearer without altering behavior.

deploy/cloud/operator/docs/footer.md (2)

- Lines 98-107: Mode selection conditions look correct (world_size vs GPUs). Docs align with code paths (Ray when TP×PP > GPUs; DP when TP×PP×DP > GPUs). LGTM.
- Lines 204-206: Confirm whether port 13445 needs explicit containerPort/service exposure. If DP coordination requires inbound connections to the leader, ensure:
  - container.ports includes 13445 (optional for Pod IP access, required if exposed via a Service).
  - Any Service definitions include this port when needed.

deploy/cloud/operator/internal/dynamo/backend_sglang.go (1)

- Line 62: Good refactor: centralizing flag injection. Using injectFlagsIntoContainerCommand removes duplication and keeps behavior consistent across backends.
Actionable comments posted: 1

♻️ Duplicate comments (1)

deploy/cloud/operator/internal/dynamo/utils.go (1)

Lines 31-61: Data-parallel injection still misses `/usr/bin/env python` launches. When the image uses an env wrapper (`command: ["/usr/bin/env", "python3"]`, `args: ["-m", "vllm.entrypoints.api_server", …]`), the first branch rejects it (`isPythonCommand("/usr/bin/env")` → false) and the fallback scans each arg in isolation. Because the interpreter and module tokens live in different slice entries, the regex never fires and all distributed flags are dropped, breaking the multinode DP flow for that common launch pattern, despite the earlier review note on the same gap. Please treat env/python pairs as Python (or join command+args before attempting injection) so `/usr/bin/env python* …` receives the flags.
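One way to close the env-wrapper gap described above is to check the token after `env` as well. This is an illustrative sketch, not the repository's code; `isPythonLaunch` is a hypothetical name, and the regex mirrors the pattern quoted in the earlier review comment:

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches "python", "python3", "python3.11", and absolute paths such as
// "/usr/bin/python3" (same pattern the review quotes from the repo).
var pythonRe = regexp.MustCompile(`^(.*/)?(python\d*(\.\d+)*)$`)

// isPythonCommand reports whether a single command token looks like a
// Python interpreter.
func isPythonCommand(cmd string) bool {
	return pythonRe.MatchString(cmd)
}

// isPythonLaunch additionally treats the common env-wrapper form
// ["/usr/bin/env", "python3", ...] as a Python launch by inspecting the
// token that follows "env".
func isPythonLaunch(command []string) bool {
	if len(command) == 0 {
		return false
	}
	if isPythonCommand(command[0]) {
		return true
	}
	if (command[0] == "env" || command[0] == "/usr/bin/env") && len(command) > 1 {
		return isPythonCommand(command[1])
	}
	return false
}

func main() {
	fmt.Println(isPythonLaunch([]string{"/usr/bin/env", "python3"}))  // true
	fmt.Println(isPythonLaunch([]string{"python3.11", "-m", "vllm"})) // true
	fmt.Println(isPythonLaunch([]string{"sh", "-c"}))                 // false
}
```

The alternative fix the comment mentions, joining command and args into one string before regex injection, would also work, at the cost of re-splitting afterwards.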
🧹 Nitpick comments (1)

deploy/cloud/operator/docs/footer.md (1)

Lines 97-106: Simplify redundant probe behavior documentation. Both Ray-Based Mode and Data Parallel Mode specify identical probe behavior (worker nodes have all probes removed, leader nodes keep all probes active). Consider consolidating this into a single statement to reduce redundancy and improve readability.

Apply this diff to simplify the documentation:

```diff
 #### VLLM Backend
-The operator automatically selects between two deployment modes based on parallelism configuration:
-
-**Ray-Based Mode** (when `world_size > GPUs_per_node`):
-- **Worker nodes**: All probes (liveness, readiness, startup) are removed
-- **Leader nodes**: All probes remain active
-
-**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
+The operator automatically selects between Ray-Based Mode (when `world_size > GPUs_per_node`) and Data Parallel Mode (when `world_size × data_parallel_size > GPUs_per_node`) based on parallelism configuration. For both modes:
+
 - **Worker nodes**: All probes (liveness, readiness, startup) are removed
 - **Leader nodes**: All probes remain active
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)

- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2 hunks)
- deploy/cloud/operator/docs/footer.md (2 hunks)
- deploy/cloud/operator/internal/dynamo/backend_sglang.go (1 hunk)
- deploy/cloud/operator/internal/dynamo/backend_vllm.go (2 hunks)
- deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (9 hunks)
- deploy/cloud/operator/internal/dynamo/utils.go (1 hunk)
- docs/kubernetes/multinode-deployment.md (1 hunk)
💤 Files with no reviewable changes (2)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Applied to files:
deploy/cloud/operator/docs/footer.md
🧬 Code graph analysis (3)

deploy/cloud/operator/internal/dynamo/backend_vllm.go (2)

- deploy/cloud/operator/internal/dynamo/graph.go (4): Role (559-559), MultinodeDeployer (615-619), RoleLeader (562-562), RoleWorker (563-563)
- deploy/cloud/operator/internal/consts/consts.go (1): KubeResourceGPUNvidia (47-47)

deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (5)

- deploy/cloud/operator/internal/dynamo/backend_vllm.go (1): VLLMPort (15-15)
- deploy/cloud/operator/internal/dynamo/graph.go (6): Role (559-559), MultinodeDeployer (615-619), RoleMain (564-564), Resources (82-87), RoleLeader (562-562), RoleWorker (563-563)
- deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1): DynamoComponentDeploymentSharedSpec (48-106)
- deploy/cloud/operator/internal/dynamo/grove.go (1): GroveMultinodeDeployer (23-25)
- deploy/cloud/operator/internal/dynamo/lws.go (1): LWSMultinodeDeployer (5-7)

deploy/cloud/operator/internal/dynamo/utils.go (1)

- benchmarks/profiler/utils/config.py (1): Container (49-54)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/multinode-deployment.md
191-191: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
203-203: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2)

- Lines 69-70: Env description clarification looks good. The updated wording cleanly scopes shared env vars versus service overrides. Thanks for tightening the docs.
- Lines 10386-10387: Service description update is clear. The refreshed text plainly states the field's role in the deployment spec. No concerns here.

deploy/cloud/operator/docs/footer.md (1)

- Lines 204-205: Documentation aligns with PR objectives. The addition of the Data Parallel RPC Port (13445) and clarification of the Ray Head Port usage correctly reflect the new multinode data-parallel support for VLLM.

deploy/cloud/operator/internal/dynamo/backend_sglang.go (2)

- Line 62: Refactoring improves maintainability. Delegating flag injection to the shared injectFlagsIntoContainerCommand utility aligns with the PR objective of centralizing multinode shell interpolation logic and reduces code duplication.
- Lines 19-27: Retain isPythonCommand; it's still in use: calls appear in backend_trtllm.go (line 112) and utils.go (line 31), and it is exercised by backend_sglang_test.go. Likely an incorrect or invalid review comment.
```go
backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)

if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
	t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
}

expectedCommand := tt.initialContainer.Command
if !reflect.DeepEqual(tt.initialContainer.Command, expectedCommand) {
	t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, expectedCommand)
}
```
Shell-command equality check is a no-op

You snapshot `expectedCommand` after calling `UpdateContainer`, so it aliases the mutated slice; the subsequent comparison can never fail. Copy the slice before invoking the backend to actually assert command preservation.
```diff
-    backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
-
-    if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
+    originalCommand := append([]string{}, tt.initialContainer.Command...)
+
+    backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
+
+    if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
         t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
     }
-    expectedCommand := tt.initialContainer.Command
-    if !reflect.DeepEqual(tt.initialContainer.Command, expectedCommand) {
-        t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, expectedCommand)
+    if !reflect.DeepEqual(tt.initialContainer.Command, originalCommand) {
+        t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, originalCommand)
     }
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Snapshot the original command before calling UpdateContainer
originalCommand := append([]string{}, tt.initialContainer.Command...)

backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)

if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
	t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
}

// Verify that the command slice was preserved
if !reflect.DeepEqual(tt.initialContainer.Command, originalCommand) {
	t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, originalCommand)
}
```
🤖 Prompt for AI Agents

```text
In deploy/cloud/operator/internal/dynamo/backend_vllm_test.go around lines 175
to 184, the test captures expectedCommand after calling UpdateContainer which
aliases the mutated slice, making the comparison a no-op; copy the initial
container.Command slice (e.g., make a new slice and copy contents) before
invoking backend.UpdateContainer, then call UpdateContainer and compare the
post-call container.Command to the copied expected slice to properly assert
command preservation.
```
Overview:
Previously, only support for vllm TP multi-node existed within the operator using Ray. This PR adds support for the case where data parallelism ranks span across multiple nodes.
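The resulting selection logic, Ray when a single replica's world size (TP × PP) exceeds one container's GPUs, data-parallel flag injection when the DP ranks collectively (TP × PP × DP) do, can be sketched roughly as follows. All names here are illustrative, not the operator's actual identifiers:

```go
package main

import "fmt"

type launchMode int

const (
	singleNode launchMode = iota
	rayMultinode
	dataParallelMultinode
)

// selectLaunchMode sketches the decision rule: if one model replica's world
// size (TP × PP) exceeds the GPUs available in a single container, the
// replica itself must span nodes, so Ray is used; otherwise, if the total
// GPU demand across data-parallel ranks (TP × PP × DP) exceeds the
// container's GPUs, DP coordination flags are injected; else no multinode
// handling is needed.
func selectLaunchMode(tp, pp, dp, gpusPerNode int64) launchMode {
	worldSize := tp * pp
	switch {
	case worldSize > gpusPerNode:
		return rayMultinode
	case worldSize*dp > gpusPerNode:
		return dataParallelMultinode
	default:
		return singleNode
	}
}

func main() {
	fmt.Println(selectLaunchMode(16, 1, 1, 8)) // TP spans nodes -> Ray
	fmt.Println(selectLaunchMode(4, 1, 4, 8))  // DP ranks span nodes -> DP flags
	fmt.Println(selectLaunchMode(4, 1, 2, 8))  // fits on one node -> no-op
}
```

Note the ordering matters: the Ray check must come first, since a replica that cannot fit on one node needs a cross-node runtime regardless of its DP degree.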
Details:

- deploy/cloud/operator/internal/dynamo/backend_sglang.go: moves the shell-interpolation logic into utils.go so that it can also be leveraged by backend_vllm.go.
- deploy/cloud/operator/internal/dynamo/backend_vllm.go: if world size (TP × PP) > container GPU count, use the Ray launch strategy; if world size across DP ranks (TP × PP × DP) > container GPU count, inject data-parallel flags for multi-node coordination.

Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit

- New Features
- Documentation
- Refactor
- Tests