feat: trtllm conditional disaggregation #3640

shpgy-shpgy · 2025-10-15T08:26:15Z

Overview:

Add support for conditional disaggregation to TensorRT-LLM.
Route each request to a PD worker or a non-PD worker depending on its input length.

Where should the reviewer start?

components/src/dynamo/trtllm/request_handlers/handlers.py
lib/llm/src/kv_router/scheduler.rs

Summary by CodeRabbit

New Features
- Conditional disaggregation with a configurable short-prefill threshold so small requests can be handled locally to reduce latency.
- Engine runtime can store and use a selectable disaggregation mode.
- Optional routing threshold based on input sequence length (ISL) to refine which workers contribute logits.
Bug Fixes
- Decode no longer errors when disaggregated parameters are missing; it logs and falls back to local prefill.

copy-pr-bot · 2025-10-15T08:26:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-10-15T08:26:25Z

👋 Hi shpgy-shpgy! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2025-10-15T08:26:46Z

Walkthrough

Adds an engine-specific disaggregation_mode entry to the runtime config. Introduces environment-driven conditional disaggregation for short prefills in DecodeHandler, with a fallback to local prefill when remote prefill fails. Changes handler_base to log instead of raising when disaggregated params are missing in Decode mode. Gates KV-router worker logit insertion by an optional ISL threshold.

Changes

Cohort / File(s)	Summary of changes
TensorRT-LLM runtime `components/src/dynamo/trtllm/main.py`	Sets engine-specific runtime_config entry `"disaggregation_mode"` by rendering `config.disaggregation_mode.value` as JSON and storing via `runtime_config.set_engine_specific`.
Request handler base `components/src/dynamo/trtllm/request_handlers/handler_base.py`	In Decode mode, replaced a raised `ValueError` for missing `disaggregated_params` with an informational log and continuation to local prefill handling.
Decode handler with conditional disaggregation `components/src/dynamo/trtllm/request_handlers/handlers.py`	Adds env-driven flags: `use_conditional_disaggregation` (from `DYNAMO_USE_CONDITIONAL_DISAGGREGATION`) and `short_prefill_threshold` (from `DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS` with fallback). Computes the ISL token count and chooses local short prefill when the feature is enabled and the count is at or below the threshold; otherwise uses remote prefill. On a remote prefill error, logs and falls back to local handling. Adds `os` and `asyncio` imports. New attributes on `DecodeHandler`: `use_conditional_disaggregation: bool`, `short_prefill_threshold: int`.
KV Router scheduler `lib/llm/src/kv_router/scheduler.rs`	Adds optional ISL-threshold gating (controlled by `KV_ROUTER_USE_ISL_THRESHOLD` and `KV_ROUTER_ISL_THRESHOLD`) to decide per-worker logit insertion based on per-worker `disaggregation_mode` runtime data and ISL token counts; preserves max_logit update and subsequent softmax selection. No exported signature changes reported.
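
To make the `handlers.py` row concrete, here is a minimal sketch of how the two environment variables might be read. The variable names come from the summary above; the accepted truthy strings, the default of 128 and the helper name `read_conditional_disagg_settings` are assumptions for illustration, not the PR's actual code.

```python
import logging
import os

DEFAULT_SHORT_PREFILL_THRESHOLD = 128  # hypothetical default, not taken from the PR


def read_conditional_disagg_settings(env=None, config_threshold=None):
    """Return (use_conditional_disaggregation, short_prefill_threshold)."""
    env = os.environ if env is None else env

    # Assumption: "1", "true", "yes", "on" (any case) enable the feature.
    raw_flag = env.get("DYNAMO_USE_CONDITIONAL_DISAGGREGATION", "")
    enabled = raw_flag.strip().lower() in {"1", "true", "yes", "on"}

    # Fallback order: environment variable, then config value, then default.
    fallback = (
        config_threshold
        if config_threshold is not None
        else DEFAULT_SHORT_PREFILL_THRESHOLD
    )
    raw_threshold = env.get("DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS")
    if raw_threshold is None:
        return enabled, fallback
    try:
        threshold = int(raw_threshold)
        if threshold < 0:
            raise ValueError("threshold must be non-negative")
    except ValueError:
        logging.warning(
            "Invalid DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS=%r; using %d",
            raw_threshold,
            fallback,
        )
        return enabled, fallback
    return enabled, threshold
```

Passing a plain dict as `env` keeps the helper testable without touching the process environment.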

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant DecodeHandler
  participant RemotePrefill as Remote Prefill
  participant LocalPrefill as Local Prefill

  Client->>DecodeHandler: generate(request)
  DecodeHandler->>DecodeHandler: read env/config (conditional disaggregation, threshold)
  DecodeHandler->>DecodeHandler: compute ISL token count
  alt conditional enabled AND ISL <= threshold
    DecodeHandler->>LocalPrefill: prefill locally (short prefill path)
    LocalPrefill-->>DecodeHandler: local prefill result
    DecodeHandler-->>Client: continue decode with local prefill
  else
    DecodeHandler->>RemotePrefill: prefill remotely (single-response path)
    RemotePrefill-->>DecodeHandler: response (or error)
    alt remote returned error
      DecodeHandler->>LocalPrefill: fallback to local prefill
      LocalPrefill-->>DecodeHandler: local prefill result
      DecodeHandler-->>Client: continue decode with local prefill
    else
      DecodeHandler-->>Client: continue decode with remote prefill (propagate disaggregated_params if present)
    end
  end

sequenceDiagram
  autonumber
  participant Router as KV Router
  participant Worker[i] as Worker[i]
  Note over Router: If KV_ROUTER_USE_ISL_THRESHOLD=true<br/>use KV_ROUTER_ISL_THRESHOLD
  loop for each worker i
    Router->>Worker[i]: read runtime data (disaggregation_mode)
    Router->>Router: evaluate ISL tokens vs threshold and worker mode
    alt worker passes gating
      Router->>Router: insert worker logit
    else
      Router->>Router: skip worker logit
    end
  end
  Router->>Router: update max_logit, compute softmax, select
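
A rough Python sketch of the gating step follows (the real scheduler is Rust). The walkthrough does not spell out the exact rule, so the one used here is an assumption: short requests keep only workers whose mode is `"prefill_and_decode"`, long requests keep only `"decode"` workers. The mode strings and function names are likewise hypothetical.

```python
import math


def gate_worker_logits(logits, modes, isl_tokens, threshold=None):
    """Keep only the logits of workers that pass the ISL gate.

    logits: worker_id -> logit; modes: worker_id -> disaggregation_mode.
    threshold=None means gating is disabled and every worker is kept.
    """
    if threshold is None:
        return dict(logits)
    # Assumed rule: short requests go to aggregated workers,
    # long requests go to decode workers backed by remote prefill.
    wanted = "prefill_and_decode" if isl_tokens <= threshold else "decode"
    return {w: v for w, v in logits.items() if modes.get(w) == wanted}


def softmax(logits):
    """Numerically stable softmax over a worker_id -> logit mapping."""
    if not logits:
        return {}
    max_logit = max(logits.values())
    exps = {w: math.exp(v - max_logit) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}
```

If the gate filters out every worker, `softmax` returns an empty mapping, so a caller would probably want to fall back to the ungated logits in that case.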

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at the config light,
Short hops local when the tokens are slight.
Routers tally logits under moon and sun,
Errors now shrugged — the fallback’s begun.
Thump-thump! Disagg set, the job’s neatly done. 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The description provides an Overview and reviewer start sections but omits the required "Details" section describing the specific code changes and the "Related Issues" section linking relevant GitHub issues, making the PR incomplete against the repository template.	Please add a "Details" section that outlines the exact changes made in this PR and include a "Related Issues" section to reference any GitHub issues closed or addressed by the changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "feat: trtllm conditional disaggregation" clearly and concisely summarizes the main change of adding conditional disaggregation support for TensorRT-LLM and follows conventional commit style, making it easily understandable in the project history.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

components/src/dynamo/trtllm/request_handlers/handlers.py (3)
4-7: Remove unused asyncio import.

The asyncio import on line 7 is not used in the visible code changes. Remove it to keep imports clean, unless it's used elsewhere in the file not shown in this diff.

Run this script to verify asyncio usage:
#!/bin/bash
# Check if asyncio is used in handlers.py
rg -n '\basyncio\.' components/src/dynamo/trtllm/request_handlers/handlers.py

263-271: Clarify the short prefill local handling path.

When isl_tokens <= threshold and conditional disaggregation is enabled, the code logs "Short prefill, handled locally" but doesn't explicitly skip the remote prefill call—it's implied by the else block. Consider adding an explicit early continuation or restructuring for clarity.

Consider this more explicit structure:
if isl_tokens <= threshold and use_conditional_disaggregation:
    # Short prefill, handled locally - skip remote prefill
    logging.info("Short prefill (isl_tokens=%d <= threshold=%d), handled locally", isl_tokens, threshold)
    # Continue to local generation below
else:
    # Long prefill, route to remote prefill worker
    async for res in self.remote_prefill(request, context):
        prefill_response = res
        response_count += 1
        if response_count > 1:
            raise ValueError("Prefill response should be generated only once.")

271-271: Use a custom exception class for single-response violations.

The static analysis tool flags this line for specifying a long message directly in the exception. Consider defining a custom exception class to encapsulate this validation logic.

Based on static analysis hints, consider this refactor:
class PrefillResponseError(Exception):
    """Raised when prefill returns multiple responses when only one is expected."""
    def __init__(self):
        super().__init__("Prefill response should be generated only once.")

Then use: raise PrefillResponseError()

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1391e and fca518c.

📒 Files selected for processing (4)

components/src/dynamo/trtllm/main.py (2 hunks)
components/src/dynamo/trtllm/request_handlers/handler_base.py (1 hunks)
components/src/dynamo/trtllm/request_handlers/handlers.py (4 hunks)
lib/llm/src/kv_router/scheduler.rs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/src/dynamo/trtllm/request_handlers/handlers.py (1)

components/src/dynamo/trtllm/request_handlers/handler_base.py (3)

RequestHandlerConfig (59-76)

DisaggregationStrategy (53-55)

generate_locally (151-309)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3640/merge) by shpgy-shpgy.

components/src/dynamo/trtllm/request_handlers/handlers.py

[error] 1-1: isort: files were modified by this hook during pre-commit

[error] 1-1: black: reformatted components/src/dynamo/trtllm/request_handlers/handlers.py

components/src/dynamo/trtllm/request_handlers/handler_base.py

[error] 1-1: black: reformatted components/src/dynamo/trtllm/request_handlers/handler_base.py

components/src/dynamo/trtllm/main.py

[error] 1-1: isort: files were modified by this hook during pre-commit

[error] 1-1: ruff: Found 1 error (1 fixed, 0 remaining) during pre-commit

🪛 Ruff (0.14.0)

components/src/dynamo/trtllm/request_handlers/handlers.py

271-271: Avoid specifying long messages outside the exception class

(TRY003)

components/src/dynamo/trtllm/main.py

10-10: Redefinition of unused json from line 5

Remove definition: json

(F811)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: clippy (.)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (6)

components/src/dynamo/trtllm/main.py (1)

329-331: LGTM! Engine-specific disaggregation mode configuration.

The disaggregation_mode is correctly serialized as JSON and stored in the runtime configuration, enabling per-worker mode detection in the scheduler.
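
For readers unfamiliar with the pattern, a minimal sketch of the serialization step is shown below. `set_engine_specific` is the method named in the change summary; the enum members, their values and the `FakeRuntimeConfig` stub are assumptions used only to make the example self-contained.

```python
import json
from enum import Enum


class DisaggregationMode(Enum):  # stand-in; names and values are assumptions
    AGGREGATED = "prefill_and_decode"
    PREFILL = "prefill"
    DECODE = "decode"


class FakeRuntimeConfig:
    """Hypothetical stub that only records engine-specific entries."""

    def __init__(self):
        self.engine_specific = {}

    def set_engine_specific(self, key, json_value):
        self.engine_specific[key] = json_value


runtime_config = FakeRuntimeConfig()
# Render the enum's value as a JSON string before storing it.
runtime_config.set_engine_specific(
    "disaggregation_mode", json.dumps(DisaggregationMode.DECODE.value)
)
```

Because the value is stored as JSON text, a consumer has to decode it (for example with `json.loads`) before comparing it against a plain mode string.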

components/src/dynamo/trtllm/request_handlers/handler_base.py (1)

207-208: LGTM! Graceful degradation for missing disaggregated params.

The change from raising a ValueError to logging an informational message enables Decode workers to fall back to local prefill when disaggregated params are missing. This aligns with the conditional disaggregation feature, allowing flexible routing based on request characteristics.

components/src/dynamo/trtllm/request_handlers/handlers.py (3)

195-207: LGTM! Clear environment variable handling with fallbacks.

The environment variable reading logic is well-structured with proper error handling and fallback to config or default values. The warning message for invalid threshold values helps with debugging.

255-260: Verify prefill_tokens assignment and token_ids type coverage

prefill_tokens is never set anywhere in the codebase, so request.get("prefill_tokens") always falls back to 0.

The isinstance(token_ids, (list, tuple)) check excludes other sequence types (e.g. numpy.ndarray, torch.Tensor).
Confirm how upstream populates these fields and extend the logic to cover all expected input types.
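
One possible way to widen the type coverage is to rely on `len()` rather than an `isinstance` allow-list. This is a sketch, not the PR's code: `count_isl_tokens` is a hypothetical helper, and for multi-dimensional arrays or tensors `len()` reports only the first dimension, which may or may not be the token count.

```python
def count_isl_tokens(request):
    """Best-effort input sequence length for a request dict."""
    token_ids = request.get("token_ids")
    # Strings and bytes have a length but are not token sequences.
    if token_ids is not None and not isinstance(token_ids, (str, bytes)):
        try:
            return len(token_ids)
        except TypeError:
            pass  # unsized object (e.g. a generator); fall through
    prefill_tokens = request.get("prefill_tokens", 0)
    return prefill_tokens if isinstance(prefill_tokens, int) else 0
```

Any sized container works here, including lists, tuples and ranges, without importing numpy or torch in the handler.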

280-287: Review silent fallback for remote prefill errors

The code now logs errors and always falls back to local prefill instead of propagating failures to clients, masking every error case. Confirm this aligns with intended UX and SLAs. Consider emitting a metric/counter for prefill failures and selectively surfacing critical errors (e.g. auth/authz) while silencing capacity or timeout issues.
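
As a rough illustration of that suggestion, the sketch below counts failures and re-raises one class of error while falling back on transient ones. `PrefillAuthError`, the `prefill_failures` counter and the choice of which exceptions count as transient are all assumptions; a real implementation would use the project's own metrics and error types.

```python
import asyncio
import logging
from collections import Counter

prefill_failures = Counter()  # stand-in for a real metrics counter


class PrefillAuthError(Exception):
    """Hypothetical error type that should reach the client."""


async def remote_prefill_with_fallback(remote_prefill, local_prefill):
    try:
        return await remote_prefill()
    except PrefillAuthError:
        prefill_failures["auth"] += 1
        raise  # surface critical errors instead of masking them
    except (asyncio.TimeoutError, ConnectionError) as exc:
        prefill_failures["transient"] += 1
        logging.warning(
            "Remote prefill failed (%r); falling back to local prefill", exc
        )
        return await local_prefill()
```

Errors outside the two handled groups propagate unchanged, so unexpected failures stay visible rather than being silently absorbed.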

lib/llm/src/kv_router/scheduler.rs (1)

542-542: max_logit update is unused
max_logit is never read after line 542; downstream logic uses worker_logits directly.


Signed-off-by: shpgy-shpgy <[email protected]>


shpgy-shpgy requested review from a team as code owners October 15, 2025 08:26

pull-request-size bot added the size/M label Oct 15, 2025

github-actions bot added the external-contribution Pull request is from an external contributor label Oct 15, 2025

shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 7db771a to 79eac2a Compare October 15, 2025 08:29

shpgy-shpgy changed the title ~~Shpgy/trtllm conditional disaggregation~~ feat: trtllm conditional disaggregation Oct 15, 2025

github-actions bot added the feat label Oct 15, 2025

shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 79eac2a to fca518c Compare October 15, 2025 08:32

coderabbitai bot reviewed Oct 15, 2025


shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch 4 times, most recently from 1c8d03a to 9382b3b Compare October 15, 2025 09:19

pull-request-size bot added size/L and removed size/M labels Oct 15, 2025

shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch 4 times, most recently from 155d865 to 368c0c0 Compare October 15, 2025 12:06

shpgy-shpgy requested a review from a team as a code owner October 15, 2025 12:06

pull-request-size bot added size/XXL and removed size/L labels Oct 15, 2025

shpgy-shpgy and others added 4 commits October 15, 2025 20:11

rout by length

c74128d

Signed-off-by: shpgy-shpgy <[email protected]>

delete remote by length

dc546a5

Signed-off-by: shpgy-shpgy <[email protected]>

short prefill runs in decode engine.

918cd09

Signed-off-by: shpgy-shpgy <[email protected]>

separated pd directly by isl_threshold

411c7b6

Signed-off-by: shpgy-shpgy <[email protected]>

shpgy-shpgy added 2 commits October 15, 2025 20:11

short request prefill runs in decode engine.

18aa0dc

Signed-off-by: shpgy-shpgy <[email protected]>

delete debug log

069a9ee

Signed-off-by: shpgy-shpgy <[email protected]>

shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 368c0c0 to e4fdd1a Compare October 15, 2025 12:11

pull-request-size bot added size/L and removed size/XXL labels Oct 15, 2025

shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from e4fdd1a to a46acac Compare October 15, 2025 12:14
