
Conversation

Contributor

@tmonty12 tmonty12 commented Oct 13, 2025

Overview:

Previously, the operator only supported vLLM tensor-parallel (TP) multi-node deployments via Ray. This PR adds support for the case where data-parallel (DP) ranks span multiple nodes.

Details:

  • deploy/cloud/operator/internal/dynamo/backend_sglang.go: moves the shell-interpolation handling logic into utils.go so that it can also be leveraged by backend_vllm.go.
  • deploy/cloud/operator/internal/dynamo/backend_vllm.go: if the world size (TP × PP) exceeds the container GPU count, use the Ray launch strategy; if the world size across DP ranks (TP × PP × DP) exceeds the container GPU count, inject data-parallel flags for multi-node coordination (see the sketch below).
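A minimal, self-contained sketch of that selection rule (hypothetical helper; the real implementation in backend_vllm.go reads these values from the container spec and flags):

package main

import "fmt"

// launchMode sketches the selection rule described above. It assumes the
// parallelism sizes and the per-node GPU count have already been parsed;
// the real logic lives in backend_vllm.go and operates on the container spec.
func launchMode(tp, pp, dp, gpusPerNode int64) string {
	worldSize := tp * pp
	switch {
	case worldSize > gpusPerNode:
		// A single model replica spans nodes: use the Ray launch strategy.
		return "ray"
	case worldSize*dp > gpusPerNode:
		// DP ranks span nodes: inject data-parallel coordination flags.
		return "data-parallel"
	default:
		// Everything fits on one node: no multinode flags needed.
		return "single-node"
	}
}

func main() {
	fmt.Println(launchMode(16, 1, 1, 8)) // ray
	fmt.Println(launchMode(8, 1, 2, 8))  // data-parallel
	fmt.Println(launchMode(4, 1, 2, 8))  // single-node
}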

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • vLLM multinode support enhanced with automatic mode detection (Ray-based vs data-parallel) and correct leader/worker behavior.
  • Documentation

    • Expanded multinode deployment guidance, mode explanations, probe behavior, and data-parallel RPC port details.
  • Refactor

    • Removed deprecated CRD fields from component and graph deployment schemas.
    • Unified and simplified container flag injection across backends for more consistent multinode launches.
  • Tests

    • Updated tests to reflect new multinode argument handling and shell-command scenarios.

@tmonty12 tmonty12 requested a review from a team as a code owner October 13, 2025 21:46
@github-actions github-actions bot added the feat label Oct 13, 2025
@tmonty12 tmonty12 force-pushed the tmonty12/vllm-multinode-dp-operator-support branch from 076f85f to fa5f9ee on October 13, 2025 21:49
@tmonty12 tmonty12 changed the title from "feat: vllm data parallelism multi-node support" to "feat: vllm data parallelism multi-node support in operator" on Oct 13, 2025
Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

Centralizes multinode flag injection into a shared utility; SGLang delegates to it. VLLM adds conditional Ray vs. data-parallel injection, world-size/GPU parsing, and a new UpdateContainer parameter for leader resolution. CRDs remove dynamoComponent/dynamoTag and dynamoGraph. Tests and docs updated for the new flows.

Changes

Cohort / File(s) Summary
SGLang backend refactor
deploy/cloud/operator/internal/dynamo/backend_sglang.go
Replaced inline multinode flag injection with injectFlagsIntoContainerCommand; removed legacy python-specific injection and commented code; UpdatePodSpec made no-op; trimmed imports.
VLLM backend multinode overhaul
deploy/cloud/operator/internal/dynamo/backend_vllm.go
Added mode-aware injection selecting Ray-based or data-parallel launches; added arg expansion, leader hostname resolution, GPU/world-size and rank helpers; introduced new constants/flags and updated UpdateContainer signature to accept multinodeDeployer.
VLLM tests updated
deploy/cloud/operator/internal/dynamo/backend_vllm_test.go
Refactored tests to use single corev1.Container as input; added shell-command and multinode injection coverage; adjusted imports and resource/GPU test setup; updated expectations to compare container.Args.
Shared injection utilities
deploy/cloud/operator/internal/dynamo/utils.go
Added injectFlagsIntoContainerCommand and injectFlagsIntoPythonCommand (regex-based) to inject flags into Python or shell-wrapped commands; supports shell wrapping with sh -c + exec when needed and per-arg injection attempts for non-direct Python commands.
CRD removals and schemas
deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml, deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
Removed public fields: dynamoComponent and dynamoTag from DynamoComponentDeployment CRD; removed dynamoGraph from DynamoGraphDeployment CRD; updated several field descriptions (envs/services) to clarify deployment semantics.
Docs updated
deploy/cloud/operator/docs/footer.md, docs/kubernetes/multinode-deployment.md
Documented VLLM deployment modes (Ray-Based vs Data Parallel), probe behavior, injected flags (including data-parallel RPC port 13445), and operator injection behavior; revised mode-based guidance.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Op as Operator
  participant V as VLLMBackend
  participant M as MultinodeDeployer
  participant U as Utils (injection)

  Op->>V: UpdateContainer(container, nodes, role, comp, svcName, multinodeDeployer)
  V->>V: getWorldSize(), getContainerGPUs(), getFlagValue()
  alt Ray-based distributed launch
    V->>M: ResolveLeaderHostname()
    M-->>V: headAddress
    V->>U: injectFlagsIntoContainerCommand(... Ray flags ..., needsShell=true)
    U-->>V: container updated
  else Data-parallel launch
    V->>V: compute start-rank, dp size/local, RPC port
    V->>U: injectFlagsIntoContainerCommand(... data-parallel flags ..., needsShell=maybe)
    U-->>V: container updated
  else No injection needed
    V-->>V: leave args unchanged
  end
  V-->>Op: updated container
sequenceDiagram
  autonumber
  participant Op as Operator
  participant S as SGLangBackend
  participant U as Utils (injection)

  Op->>S: UpdateContainer(container, ...)
  S->>U: injectFlagsIntoContainerCommand(container, flags, needsShell, "sglang")
  U-->>S: container updated
  S-->>Op: updated container

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A whisk of flags, a hop through nodes,
I stitch the ranks on training roads.
Ray or data-parallel I choose with care,
I wrap the shell and place flags there.
Small paws, big changes—deployments bloom! 🐇⚙️

Pre-merge checks

❌ Failed checks (2 warnings)
  • Description Check: ⚠️ Warning. The description follows the repository’s template by providing Overview, Details, and Related Issues sections, but leaves the “Where should the reviewer start?” section empty with only the placeholder comment instead of listing the specific files or code areas to examine. This omission may hinder reviewers’ ability to focus on critical parts of the changeset. Resolution: populate the “Where should the reviewer start?” section with the relevant file paths or code areas that should be prioritized for review, replacing the placeholder comment with concrete guidance.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.71%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check: ✅ Passed. The title concisely describes the primary feature addition—vLLM data-parallel multi-node support in the operator—using clear and specific terminology aligned with the changeset. It highlights the new capability without including extraneous details, making it easily understandable at a glance.


Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

CRD schemas were updated to remove certain public fields and adjust descriptions. Operator logic refactors introduce shared flag-injection utilities, simplify SGLang handling, and add VLLM multinode launch paths (Ray-based and data-parallel), with corresponding tests. Documentation was updated to explain multinode modes, probe behavior, and ports.

Changes

Cohort / File(s) Summary of Changes
CRD: DynamoComponentDeployment (templates + bases)
deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
Removed spec fields: dynamoComponent, dynamoTag. No new fields added.
CRD: DynamoGraphDeployment (templates + bases)
deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
Removed spec.backendFramework.dynamoGraph. Updated descriptions: envs scope wording and simplified services description.
Operator Runtime: SGLang refactor + shared utils
deploy/cloud/operator/internal/dynamo/backend_sglang.go, deploy/cloud/operator/internal/dynamo/utils.go
Moved flag-injection logic into new utilities: injectFlagsIntoContainerCommand, injectFlagsIntoPythonCommand. backend_sglang.go now delegates; UpdatePodSpec becomes no-op.
Operator Runtime: VLLM multinode logic + tests
deploy/cloud/operator/internal/dynamo/backend_vllm.go, deploy/cloud/operator/internal/dynamo/backend_vllm_test.go
Added Ray-based and data-parallel multinode handling, GPU/count and world-size helpers, and compilation cache env support. Updated tests to new arg initialization and multinode scenarios.
Docs: Multinode behavior and ports
deploy/cloud/operator/docs/footer.md, docs/kubernetes/multinode-deployment.md
Documented Ray-based vs data-parallel modes, probe behavior, data-parallel RPC port (13445), and clarified Ray head port scope.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant User as Deployment Spec
    participant Operator as Operator (VLLM Backend)
    participant Node as Pod/Container
    participant Utils as Flag Injection Utils

    Note over Operator: UpdateContainer for VLLM
    User->>Operator: Provide args, role, resources
    Operator->>Operator: Compute worldSize, GPUsPerNode
    alt Ray-based multinode
        Operator->>Node: Leader: prepend "ray start --head" to command
        Operator->>Node: Worker: prepend "ray start --address=<leader>"
    else Data-parallel multinode
        Operator->>Utils: injectFlagsIntoContainerCommand(..., "--dp-address ... --dp-size-local ... --rpc-port 13445 --start-rank <n>", needsShell)
        Utils-->>Operator: Container updated
    else Single-node / no-op
        Operator->>Operator: Log no injected flags
    end
    Note over Node: Probes adjusted per role/mode (as documented)
sequenceDiagram
    autonumber
    participant Backend as Backend (SGLang/VLLM)
    participant Utils as injectFlagsIntoContainerCommand
    participant K8s as corev1.Container

    Backend->>Utils: Inject flags into container (flags, needsShell)
    alt Python entrypoint
        alt needsShell
            Utils->>K8s: Wrap with ["sh","-c"] and exec-preserving string
        else direct
            Utils->>K8s: Append split flags to Args
        end
    else Non-Python command
        Utils->>K8s: Regex-inject flags into first python ... sglang occurrence within Args
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paw at dawn’s new art,
Fields trimmed from CRDs, a cleaner start.
Ray or ranks, I hop the lanes,
Flags get tucked in tidy chains.
Probes stand guard, ports align—
Multinode meadows, brisk and fine.
Cache burrows ready—ship on time! 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
  • Description Check: ⚠️ Warning. The description follows the required template by including Overview, Details, and Related Issues sections, but fails to provide any content under "Where should the reviewer start?" and retains a placeholder issue number instead of a real reference, leaving key reviewer guidance and issue linking incomplete. Resolution: populate the "Where should the reviewer start?" section with specific file paths and key areas to inspect, and replace the placeholder issue number with the actual GitHub issue reference to fully satisfy the template requirements.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 40.00%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check: ✅ Passed. The title succinctly describes the primary feature added—enabling vLLM data parallel multi-node support in the operator—which directly matches the main change objectives without extra detail or ambiguity.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (9)
deploy/cloud/operator/internal/dynamo/backend_sglang.go (1)

18-27: Remove unused helper (and import) to avoid dead code.

isPythonCommand is no longer referenced; drop it and the regexp import to keep the file lean.

-import (
-	"fmt"
-	"regexp"
+import (
+	"fmt"
 	"github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/v1alpha1"
 	corev1 "k8s.io/api/core/v1"
 	"sigs.k8s.io/controller-runtime/pkg/log"
 )
@@
-// isPythonCommand checks if the command is a Python interpreter
-func isPythonCommand(cmd string) bool {
-	if cmd == "python" || cmd == "python3" {
-		return true
-	}
-	// Match python with version numbers like python3.11, python2.7, etc.
-	// Also support absolute paths like /usr/bin/python3.8, /opt/python/bin/python3.11
-	matched, _ := regexp.MatchString(`^(.*/)?(python\d*(\.\d+)*)$`, cmd)
-	return matched
-}
docs/kubernetes/multinode-deployment.md (2)

191-202: Use headings instead of emphasis for section titles (MD036).

Replace bolded mode titles with proper subheadings for readability and lint compliance.

-**1. Ray-Based Mode (Tensor/Pipeline Parallelism across nodes)**
+#### Ray-Based Mode (Tensor/Pipeline Parallelism across nodes)

203-216: Same here: convert emphasized title to a heading (MD036).

Switch to a subheading for Data Parallel Mode.

-**2. Data Parallel Mode (Multiple model instances across nodes)**
+#### Data Parallel Mode (Multiple model instances across nodes)
deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (3)

93-99: Fix tautological assertion; snapshot args before calling UpdateContainer.

Current check compares the mutated slice to itself and will always pass.

-			// Call UpdateContainer
-			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, tt.component, "test-service", tt.multinodeDeployer)
-
-			if tt.expectNotModified {
-				// Args should not have changed
-				g.Expect(tt.initialContainer.Args).To(gomega.Equal(tt.initialContainer.Args))
+			// Snapshot args before mutation
+			priorArgs := append([]string{}, tt.initialContainer.Args...)
+			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, tt.component, "test-service", tt.multinodeDeployer)
+			if tt.expectNotModified {
+				g.Expect(tt.initialContainer.Args).To(gomega.Equal(priorArgs))

37-45: Leader probes should not be asserted as removed; initialize probes to verify behavior.

Leader probes remain active per design; the test should not mark expectProbesRemoved for leader. Also set non-nil probes to actually test removal for workers.

-			name:                "multinode leader prepends ray start --head",
+			name:                "multinode leader prepends ray start --head (probes unchanged)",
@@
-			expectProbesRemoved: true,
+			expectProbesRemoved: false,

Follow-up: Initialize Liveness/Readiness/Startup probes in worker cases to validate removal.


252-264: Add a test for non-divisible GPUs/world size (rounding semantics).

When GPUs_per_node is not divisible by TP×PP, dataParallelSizeLocal truncates. Add a case to document expected behavior (or enforce divisibility).
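A plain arithmetic illustration of the truncation in question (hypothetical node shape, not taken from the test fixtures):

package main

import "fmt"

func main() {
	// Hypothetical shape: 8 GPUs per node with TP*PP = 3.
	gpusPerNode, worldSize := int64(8), int64(3)
	dataParallelSizeLocal := gpusPerNode / worldSize // integer division truncates to 2
	idleGPUs := gpusPerNode - dataParallelSizeLocal*worldSize
	fmt.Println(dataParallelSizeLocal, idleGPUs) // 2 local DP ranks, 2 GPUs left idle
}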

deploy/cloud/operator/internal/dynamo/backend_vllm.go (3)

31-36: Fix misleading comment (leader probes aren’t removed).

The code removes probes only for workers; update the comment to avoid confusion.

-		// Remove probes for multinode worker and leader
+		// Remove probes for multinode workers

90-99: Validate GPU/parallelism divisibility before DP injection.

dataParallelSizeLocal uses integer division; if GPUs_per_node is not divisible by TP×PP, local size truncates and can underutilize GPUs. Warn or fail fast.

-		dataParallelSizeLocal := getContainerGPUs(container) / getWorldSize(container)
+		containerGPUs := getContainerGPUs(container)
+		worldSize := getWorldSize(container)
+		if worldSize == 0 {
+			worldSize = 1
+		}
+		if containerGPUs%worldSize != 0 {
+			log.Log.WithName("vllm-backend").Info("GPUs_per_node not divisible by TP×PP; truncating data-parallel workers per node",
+				"gpusPerNode", containerGPUs, "worldSize", worldSize)
+		}
+		dataParallelSizeLocal := containerGPUs / worldSize

131-143: Consider supporting --flag=value style parsing.

getFlagValue only handles '--flag value'. Many CLIs accept '--flag=value'. Supporting both improves robustness.

 	for i, arg := range container.Args {
-		if arg == flag && (i+1 < len(container.Args)) {
+		if arg == flag && (i+1 < len(container.Args)) {
 			flagValue, err := strconv.ParseInt(container.Args[i+1], 10, 64)
 			if err != nil {
 				continue
 			}
 			return flagValue
+		}
+		// Support --flag=value form
+		if strings.HasPrefix(arg, flag+"=") {
+			val := strings.TrimPrefix(arg, flag+"=")
+			if v, err := strconv.ParseInt(val, 10, 64); err == nil {
+				return v
+			}
 		}
 	}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2389b and fa5f9ee.

📒 Files selected for processing (10)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2 hunks)
  • deploy/cloud/operator/docs/footer.md (2 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_sglang.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_vllm.go (2 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (9 hunks)
  • deploy/cloud/operator/internal/dynamo/utils.go (1 hunks)
  • docs/kubernetes/multinode-deployment.md (1 hunks)
💤 Files with no reviewable changes (2)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.

Applied to files:

  • deploy/cloud/operator/docs/footer.md
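As a hedged illustration of the probe split described in the learning above (the numbers come from that learning, not from this PR's manifests):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Startup probe: tolerant, covers the slow model-loading phase (10s * 60 = ~10 min budget).
	startup := &corev1.Probe{PeriodSeconds: 10, FailureThreshold: 60}
	// Liveness probe: aggressive once the worker is operational, for quick failure detection.
	liveness := &corev1.Probe{PeriodSeconds: 5, FailureThreshold: 1}
	fmt.Println(startup.PeriodSeconds*startup.FailureThreshold, "s startup budget")
	fmt.Println(liveness.PeriodSeconds*liveness.FailureThreshold, "s liveness budget")
}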
🧬 Code graph analysis (3)
deploy/cloud/operator/internal/dynamo/backend_vllm.go (2)
deploy/cloud/operator/internal/dynamo/graph.go (4)
  • Role (559-559)
  • MultinodeDeployer (615-619)
  • RoleLeader (562-562)
  • RoleWorker (563-563)
deploy/cloud/operator/internal/consts/consts.go (1)
  • KubeResourceGPUNvidia (47-47)
deploy/cloud/operator/internal/dynamo/utils.go (1)
benchmarks/profiler/utils/config.py (1)
  • Container (49-54)
deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (3)
deploy/cloud/operator/internal/dynamo/backend_vllm.go (1)
  • VLLMPort (15-15)
deploy/cloud/operator/internal/dynamo/graph.go (5)
  • Role (559-559)
  • MultinodeDeployer (615-619)
  • RoleMain (564-564)
  • RoleLeader (562-562)
  • RoleWorker (563-563)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
  • DynamoComponentDeploymentSharedSpec (48-106)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/multinode-deployment.md

191-191: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


203-203: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Validate PR title and add label
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (7)
deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2)

69-70: Deployment-scoped env description reads correctly.
The revised wording matches the deployment-level semantics and clarifies override scope.


10386-10387: Concise services description LGTM.
The new phrasing is clear without altering schema semantics.

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2)

69-70: Documentation wording is consistent.

The updated description correctly reflects deployment-wide env scope, matching the rest of the spec.


10386-10387: Services description looks good.

Wording is clearer without altering behavior.

deploy/cloud/operator/docs/footer.md (2)

98-107: Mode selection conditions look correct (world_size vs GPUs).

Docs align with code paths (Ray when TP×PP > GPUs; DP when TP×PP×DP > GPUs). LGTM.


204-206: Confirm whether port 13445 needs explicit containerPort/service exposure.

If DP coordination requires inbound connections to the leader, ensure:

  • container.ports includes 13445 (optional for Pod IP access, required if exposed via a Service; see the sketch below).
  • Any Service definitions include this port when needed.
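If exposure does turn out to be required, a minimal sketch of what the leader container could carry (hypothetical helper; only the port number 13445 comes from the docs, the name "dp-rpc" is an assumption):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// addDataParallelRPCPort shows one way to expose the DP coordination port on
// the leader container; whether the operator should do this automatically is
// the open question raised above.
func addDataParallelRPCPort(c *corev1.Container) {
	c.Ports = append(c.Ports, corev1.ContainerPort{
		Name:          "dp-rpc",
		ContainerPort: 13445,
		Protocol:      corev1.ProtocolTCP,
	})
}

func main() {
	c := &corev1.Container{Name: "vllm-leader"}
	addDataParallelRPCPort(c)
	fmt.Printf("%+v\n", c.Ports)
}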
deploy/cloud/operator/internal/dynamo/backend_sglang.go (1)

62-62: Good refactor: centralizing flag injection.

Using injectFlagsIntoContainerCommand removes duplication and keeps behavior consistent across backends.
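For readers new to the shared helper, a standalone sketch of the regex-based injection idea (hypothetical function; the real injectFlagsIntoPythonCommand in utils.go operates on a corev1.Container and additionally handles sh -c wrapping):

package main

import (
	"fmt"
	"regexp"
)

// injectFlags appends extra flags right after the first
// "python[3[.x]] -m <module>" occurrence in a shell command string.
// Sketch only; flag and module names below are illustrative.
func injectFlags(shellCmd, module, flags string) string {
	re := regexp.MustCompile(`(python\d*(?:\.\d+)*\s+-m\s+` + regexp.QuoteMeta(module) + `)`)
	return re.ReplaceAllString(shellCmd, "$1 "+flags)
}

func main() {
	cmd := "exec python3 -m dynamo.sglang --model foo"
	fmt.Println(injectFlags(cmd, "dynamo.sglang", "--example-flag 1"))
	// exec python3 -m dynamo.sglang --example-flag 1 --model foo
}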

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
deploy/cloud/operator/internal/dynamo/utils.go (1)

31-61: Data-parallel injection still misses /usr/bin/env python launches

When the image uses an env-wrapper (command: ["/usr/bin/env", "python3"], args: ["-m", "vllm.entrypoints.api_server", …]), the first branch rejects it (isPythonCommand("/usr/bin/env") → false) and the fallback scans each arg in isolation. Because the interpreter and module tokens live in different slice entries, the regex never fires and we drop all distributed flags—which breaks the multinode DP flow for that common launch pattern, despite the earlier review note on the same gap.

Please treat env/python pairs as Python (or join command+args before attempting injection) so /usr/bin/env python* … receives the flags.
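One possible shape for that fix, as a standalone sketch (hypothetical helper name and signature; the regex is the one quoted from backend_sglang.go earlier in this review):

package main

import (
	"fmt"
	"regexp"
)

var pythonRe = regexp.MustCompile(`^(.*/)?(python\d*(\.\d+)*)$`)

// isPythonEntrypoint is a hypothetical variant of isPythonCommand that also
// accepts env-wrapped launches such as ["/usr/bin/env", "python3", ...].
func isPythonEntrypoint(command []string) bool {
	if len(command) == 0 {
		return false
	}
	cmd := command[0]
	if (cmd == "env" || cmd == "/usr/bin/env") && len(command) > 1 {
		cmd = command[1] // skip the env wrapper and inspect the real interpreter
	}
	return pythonRe.MatchString(cmd)
}

func main() {
	fmt.Println(isPythonEntrypoint([]string{"/usr/bin/env", "python3"})) // true
	fmt.Println(isPythonEntrypoint([]string{"python3.11"}))             // true
	fmt.Println(isPythonEntrypoint([]string{"sh", "-c"}))               // false
}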

🧹 Nitpick comments (1)
deploy/cloud/operator/docs/footer.md (1)

97-106: Simplify redundant probe behavior documentation.

Both Ray-Based Mode and Data Parallel Mode specify identical probe behavior (worker nodes have all probes removed, leader nodes keep all probes active). Consider consolidating this into a single statement to reduce redundancy and improve readability.

Apply this diff to simplify the documentation:

 #### VLLM Backend
 
-The operator automatically selects between two deployment modes based on parallelism configuration:
-
-**Ray-Based Mode** (when `world_size > GPUs_per_node`):
-- **Worker nodes**: All probes (liveness, readiness, startup) are removed
-- **Leader nodes**: All probes remain active
-
-**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
+The operator automatically selects between Ray-Based Mode (when `world_size > GPUs_per_node`) and Data Parallel Mode (when `world_size × data_parallel_size > GPUs_per_node`) based on parallelism configuration. For both modes:
+
 - **Worker nodes**: All probes (liveness, readiness, startup) are removed
 - **Leader nodes**: All probes remain active
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2389b and b0c20d0.

📒 Files selected for processing (10)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2 hunks)
  • deploy/cloud/operator/docs/footer.md (2 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_sglang.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_vllm.go (2 hunks)
  • deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (9 hunks)
  • deploy/cloud/operator/internal/dynamo/utils.go (1 hunks)
  • docs/kubernetes/multinode-deployment.md (1 hunks)
💤 Files with no reviewable changes (2)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.

Applied to files:

  • deploy/cloud/operator/docs/footer.md
🧬 Code graph analysis (3)
deploy/cloud/operator/internal/dynamo/backend_vllm.go (2)
deploy/cloud/operator/internal/dynamo/graph.go (4)
  • Role (559-559)
  • MultinodeDeployer (615-619)
  • RoleLeader (562-562)
  • RoleWorker (563-563)
deploy/cloud/operator/internal/consts/consts.go (1)
  • KubeResourceGPUNvidia (47-47)
deploy/cloud/operator/internal/dynamo/backend_vllm_test.go (5)
deploy/cloud/operator/internal/dynamo/backend_vllm.go (1)
  • VLLMPort (15-15)
deploy/cloud/operator/internal/dynamo/graph.go (6)
  • Role (559-559)
  • MultinodeDeployer (615-619)
  • RoleMain (564-564)
  • Resources (82-87)
  • RoleLeader (562-562)
  • RoleWorker (563-563)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
  • DynamoComponentDeploymentSharedSpec (48-106)
deploy/cloud/operator/internal/dynamo/grove.go (1)
  • GroveMultinodeDeployer (23-25)
deploy/cloud/operator/internal/dynamo/lws.go (1)
  • LWSMultinodeDeployer (5-7)
deploy/cloud/operator/internal/dynamo/utils.go (1)
benchmarks/profiler/utils/config.py (1)
  • Container (49-54)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/multinode-deployment.md

191-191: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


203-203: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (2)

69-70: Env description clarification looks good

The updated wording cleanly scopes shared env vars versus service overrides. Thanks for tightening the docs.


10386-10387: Service description update is clear

The refreshed text plainly states the field’s role in the deployment spec. No concerns here.

deploy/cloud/operator/docs/footer.md (1)

204-205: Documentation aligns with PR objectives.

The addition of the Data Parallel RPC Port (13445) and clarification of the Ray Head Port usage correctly reflects the new multinode data-parallel support for VLLM.

deploy/cloud/operator/internal/dynamo/backend_sglang.go (2)

62-62: Refactoring improves maintainability.

Delegating flag injection to the shared injectFlagsIntoContainerCommand utility aligns with the PR objective of centralizing multinode shell interpolation logic and reduces code duplication.


19-27: Retain isPythonCommand; it’s still in use: calls appear in backend_trtllm.go (line 112), utils.go (line 31) and are exercised by backend_sglang_test.go.

Likely an incorrect or invalid review comment.

Comment on lines 175 to 184
			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)

			if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
				t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
			}

			expectedCommand := tt.initialContainer.Command
			if !reflect.DeepEqual(tt.initialContainer.Command, expectedCommand) {
				t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, expectedCommand)
			}

⚠️ Potential issue | 🟠 Major

Shell-command equality check is a no-op

You snapshot expectedCommand after calling UpdateContainer, so it aliases the mutated slice; the subsequent comparison will never fail. Copy the slice before invoking the backend to actually assert command preservation.

-			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
-
-			if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
+			originalCommand := append([]string{}, tt.initialContainer.Command...)
+
+			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
+
+			if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
 				t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
 			}
 
-			expectedCommand := tt.initialContainer.Command
-			if !reflect.DeepEqual(tt.initialContainer.Command, expectedCommand) {
-				t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, expectedCommand)
+			if !reflect.DeepEqual(tt.initialContainer.Command, originalCommand) {
+				t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, originalCommand)
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
-			if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
-				t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
-			}
-			expectedCommand := tt.initialContainer.Command
-			if !reflect.DeepEqual(tt.initialContainer.Command, expectedCommand) {
-				t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, expectedCommand)
-			}
+			// Snapshot the original command before calling UpdateContainer
+			originalCommand := append([]string{}, tt.initialContainer.Command...)
+			backend.UpdateContainer(tt.initialContainer, tt.numberOfNodes, tt.role, &v1alpha1.DynamoComponentDeploymentSharedSpec{}, "test-service", tt.multinodeDeployer)
+			if !reflect.DeepEqual(tt.initialContainer.Args, tt.expectedArgs) {
+				t.Errorf("UpdateContainer() args = %v, want %v", tt.initialContainer.Args, tt.expectedArgs)
+			}
+			// Verify that the command slice was preserved
+			if !reflect.DeepEqual(tt.initialContainer.Command, originalCommand) {
+				t.Errorf("UpdateContainer() should preserve shell command, got: %v, want: %v", tt.initialContainer.Command, originalCommand)
+			}
🤖 Prompt for AI Agents
In deploy/cloud/operator/internal/dynamo/backend_vllm_test.go around lines 175
to 184, the test captures expectedCommand after calling UpdateContainer which
aliases the mutated slice, making the comparison a no-op; copy the initial
container.Command slice (e.g., make a new slice and copy contents) before
invoking backend.UpdateContainer, then call UpdateContainer and compare the
post-call container.Command to the copied expected slice to properly assert
command preservation.
