@shpgy-shpgy shpgy-shpgy commented Oct 15, 2025

Overview:

Add support for Conditional Disaggregation to TensorRT-LLM.
Route to PD worker or non-PD worker depending on input request length.

Where should the reviewer start?

components/src/dynamo/trtllm/request_handlers/handlers.py
lib/llm/src/kv_router/scheduler.rs

Summary by CodeRabbit

  • New Features

    • Conditional disaggregation with a configurable short-prefill threshold so small requests can be handled locally to reduce latency.
    • Engine runtime can store and use a selectable disaggregation mode.
    • Optional ISL-based routing threshold to refine which workers contribute logits.
  • Bug Fixes

    • Decode no longer errors when disaggregated parameters are missing; it logs and falls back to local prefill.

@shpgy-shpgy shpgy-shpgy requested review from a team as code owners October 15, 2025 08:26

copy-pr-bot bot commented Oct 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


👋 Hi shpgy-shpgy! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Oct 15, 2025
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

Adds an engine-specific disaggregation_mode entry to the runtime config. DecodeHandler gains environment-driven conditional disaggregation for short prefills and falls back to local prefill when remote prefill errors. In Decode mode, handler_base now logs instead of raising when disaggregated params are missing. The KV router gates worker logit insertion by an optional ISL threshold.

Changes

| Cohort / File(s) | Summary of changes |
|---|---|
| TensorRT-LLM runtime (components/src/dynamo/trtllm/main.py) | Sets engine-specific runtime_config entry "disaggregation_mode" by rendering config.disaggregation_mode.value as JSON and storing via runtime_config.set_engine_specific. |
| Request handler base (components/src/dynamo/trtllm/request_handlers/handler_base.py) | In Decode mode, replaced a raised ValueError for missing disaggregated_params with an informational log and continuation to local prefill handling. |
| Decode handler with conditional disaggregation (components/src/dynamo/trtllm/request_handlers/handlers.py) | Adds env-driven flags: use_conditional_disaggregation (from DYNAMO_USE_CONDITIONAL_DISAGGREGATION) and short_prefill_threshold (from DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS with fallback). Compute ISL token count and choose local short prefill when enabled and below threshold; otherwise use remote prefill. On remote prefill error, log and fall back to local handling. Adds os and asyncio imports. New attributes on DecodeHandler: use_conditional_disaggregation: bool, short_prefill_threshold: int. |
| KV Router scheduler (lib/llm/src/kv_router/scheduler.rs) | Adds optional ISL-threshold gating (controlled by KV_ROUTER_USE_ISL_THRESHOLD and KV_ROUTER_ISL_THRESHOLD) to decide per-worker logit insertion based on per-worker disaggregation_mode runtime data and ISL token counts; preserves max_logit update and subsequent softmax selection. No exported signature changes reported. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant DecodeHandler
  participant RemotePrefill as Remote Prefill
  participant LocalPrefill as Local Prefill

  Client->>DecodeHandler: generate(request)
  DecodeHandler->>DecodeHandler: read env/config (conditional disaggregation, threshold)
  DecodeHandler->>DecodeHandler: compute ISL token count
  alt conditional enabled AND ISL <= threshold
    DecodeHandler->>LocalPrefill: prefill locally (short prefill path)
    LocalPrefill-->>DecodeHandler: local prefill result
    DecodeHandler-->>Client: continue decode with local prefill
  else
    DecodeHandler->>RemotePrefill: prefill remotely (single-response path)
    RemotePrefill-->>DecodeHandler: response (or error)
    alt remote returned error
      DecodeHandler->>LocalPrefill: fallback to local prefill
      LocalPrefill-->>DecodeHandler: local prefill result
      DecodeHandler-->>Client: continue decode with local prefill
    else
      DecodeHandler-->>Client: continue decode with remote prefill (propagate disaggregated_params if present)
    end
  end
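
A minimal runnable sketch of the decode-side flow in the diagram above. The function name, its parameters, and the path labels are hypothetical and chosen for illustration; the real DecodeHandler may structure this differently.

```python
import asyncio
import logging


async def decode_with_conditional_prefill(
    isl_tokens, threshold, conditional_enabled, remote_prefill, local_prefill
):
    """Return (path, result) following the branches of the sequence diagram.

    remote_prefill and local_prefill are hypothetical zero-argument
    coroutines standing in for the real prefill calls.
    """
    if conditional_enabled and isl_tokens <= threshold:
        # Short prefill: skip the remote call entirely.
        return "local", await local_prefill()
    try:
        return "remote", await remote_prefill()
    except Exception as exc:  # deliberately broad in this sketch
        logging.warning("Remote prefill failed (%s); falling back to local", exc)
        return "local-fallback", await local_prefill()


async def _ok():
    return "kv-ready"


async def _fail():
    raise RuntimeError("prefill worker unavailable")


short = asyncio.run(decode_with_conditional_prefill(32, 128, True, _ok, _ok))
long_ok = asyncio.run(decode_with_conditional_prefill(4096, 128, True, _ok, _ok))
long_fail = asyncio.run(decode_with_conditional_prefill(4096, 128, True, _fail, _ok))
```

Passing the two prefill paths in as coroutines keeps the branching logic testable without a running worker.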
sequenceDiagram
  autonumber
  participant Router as KV Router
  participant Worker[i] as Worker[i]
  Note over Router: If KV_ROUTER_USE_ISL_THRESHOLD=true<br/>use KV_ROUTER_ISL_THRESHOLD
  loop for each worker i
    Router->>Worker[i]: read runtime data (disaggregation_mode)
    Router->>Router: evaluate ISL tokens vs threshold and worker mode
    alt worker passes gating
      Router->>Router: insert worker logit
    else
      Router->>Router: skip worker logit
    end
  end
  Router->>Router: update max_logit, compute softmax, select
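
The router-side gating could look roughly like the Python sketch below. The scheduler itself is written in Rust, and the exact gating rule is not spelled out in this summary, so the rule here is an assumption: requests longer than the threshold keep only non-aggregated workers, and shorter ones keep only aggregated workers. The mode strings and function names are hypothetical, and the softmax treats larger logits as more likely, which may differ from the scheduler's convention.

```python
import math


def gate_workers(worker_logits, worker_modes, isl_tokens, threshold):
    """Keep only the workers whose mode matches the request length.

    Assumed rule (not confirmed by the PR): long requests go to
    non-aggregated workers, short requests go to aggregated workers.
    """
    wants_disagg = isl_tokens > threshold
    return {
        worker: logit
        for worker, logit in worker_logits.items()
        if (worker_modes[worker] != "aggregated") == wants_disagg
    }


def softmax(logits):
    """Convert the gated logits into selection probabilities."""
    if not logits:
        return {}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {w: math.exp(v - peak) for w, v in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


logits = {"w0": 1.0, "w1": 2.0, "w2": 0.5}
modes = {"w0": "aggregated", "w1": "decode", "w2": "aggregated"}
short_request = gate_workers(logits, modes, isl_tokens=64, threshold=512)
long_request = gate_workers(logits, modes, isl_tokens=2048, threshold=512)
probs = softmax(short_request)
```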

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my whiskers at the config light,
Short hops local when the tokens are slight.
Routers tally logits under moon and sun,
Errors now shrugged — the fallback’s begun.
Thump-thump! Disagg set, the job’s neatly done. 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The description provides an Overview and reviewer start sections but omits the required "Details" section describing the specific code changes and the "Related Issues" section linking relevant GitHub issues, making the PR incomplete against the repository template. | Please add a "Details" section that outlines the exact changes made in this PR and include a "Related Issues" section to reference any GitHub issues closed or addressed by the changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title "feat: trtllm conditional disaggregation" clearly and concisely summarizes the main change of adding conditional disaggregation support for TensorRT-LLM and follows conventional commit style, making it easily understandable in the project history. |


@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 7db771a to 79eac2a Compare October 15, 2025 08:29
@shpgy-shpgy shpgy-shpgy changed the title from "Shpgy/trtllm conditional disaggregation" to "feat: trtllm conditional disaggregation" Oct 15, 2025
@github-actions github-actions bot added the feat label Oct 15, 2025
@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 79eac2a to fca518c Compare October 15, 2025 08:32
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
components/src/dynamo/trtllm/request_handlers/handlers.py (3)

4-7: Remove unused asyncio import.

The asyncio import on line 7 is not used in the visible code changes. Remove it to keep imports clean, unless it's used elsewhere in the file not shown in this diff.

Run this script to verify asyncio usage:

#!/bin/bash
# Check if asyncio is used in handlers.py
rg -n '\basyncio\.' components/src/dynamo/trtllm/request_handlers/handlers.py

263-271: Clarify the short prefill local handling path.

When isl_tokens <= threshold and conditional disaggregation is enabled, the code logs "Short prefill, handled locally" but doesn't explicitly skip the remote prefill call—it's implied by the else block. Consider adding an explicit early continuation or restructuring for clarity.

Consider this more explicit structure:

if isl_tokens <= threshold and use_conditional_disaggregation:
    # Short prefill, handled locally - skip remote prefill
    logging.info("Short prefill (isl_tokens=%d <= threshold=%d), handled locally", isl_tokens, threshold)
    # Continue to local generation below
else:
    # Long prefill, route to remote prefill worker
    async for res in self.remote_prefill(request, context):
        prefill_response = res
        response_count += 1
        if response_count > 1:
            raise ValueError("Prefill response should be generated only once.")

271-271: Use a custom exception class for single-response violations.

The static analysis tool flags this line for specifying a long message directly in the exception. Consider defining a custom exception class to encapsulate this validation logic.

Based on static analysis hints, consider this refactor:

class PrefillResponseError(Exception):
    """Raised when prefill returns multiple responses when only one is expected."""
    def __init__(self):
        super().__init__("Prefill response should be generated only once.")

Then use: raise PrefillResponseError()

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1391e and fca518c.

📒 Files selected for processing (4)
  • components/src/dynamo/trtllm/main.py (2 hunks)
  • components/src/dynamo/trtllm/request_handlers/handler_base.py (1 hunks)
  • components/src/dynamo/trtllm/request_handlers/handlers.py (4 hunks)
  • lib/llm/src/kv_router/scheduler.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/src/dynamo/trtllm/request_handlers/handlers.py (1)
components/src/dynamo/trtllm/request_handlers/handler_base.py (3)
  • RequestHandlerConfig (59-76)
  • DisaggregationStrategy (53-55)
  • generate_locally (151-309)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3640/merge) by shpgy-shpgy.
components/src/dynamo/trtllm/request_handlers/handlers.py

[error] 1-1: isort: files were modified by this hook during pre-commit


[error] 1-1: black: reformatted components/src/dynamo/trtllm/request_handlers/handlers.py

components/src/dynamo/trtllm/request_handlers/handler_base.py

[error] 1-1: black: reformatted components/src/dynamo/trtllm/request_handlers/handler_base.py

components/src/dynamo/trtllm/main.py

[error] 1-1: isort: files were modified by this hook during pre-commit


[error] 1-1: ruff: Found 1 error (1 fixed, 0 remaining) during pre-commit

🪛 Ruff (0.14.0)
components/src/dynamo/trtllm/request_handlers/handlers.py

271-271: Avoid specifying long messages outside the exception class

(TRY003)

components/src/dynamo/trtllm/main.py

10-10: Redefinition of unused json from line 5

Remove definition: json

(F811)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: clippy (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
components/src/dynamo/trtllm/main.py (1)

329-331: LGTM! Engine-specific disaggregation mode configuration.

The disaggregation_mode is correctly serialized as JSON and stored in the runtime configuration, enabling per-worker mode detection in the scheduler.
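
For reference, a small sketch of the JSON round trip this comment describes. The enum members and their string values are hypothetical stand-ins, and RuntimeConfigStub only imitates a key/value store; the real set_engine_specific signature is assumed here, not confirmed.

```python
import json
from enum import Enum


class DisaggregationMode(Enum):
    """Hypothetical stand-in for the engine's mode enum."""

    AGGREGATED = "aggregated"
    PREFILL = "prefill"
    DECODE = "decode"


class RuntimeConfigStub:
    """Hypothetical stand-in that stores engine-specific string values."""

    def __init__(self):
        self.engine_specific = {}

    def set_engine_specific(self, key, value):
        self.engine_specific[key] = value


runtime_config = RuntimeConfigStub()
runtime_config.set_engine_specific(
    "disaggregation_mode", json.dumps(DisaggregationMode.DECODE.value)
)

# A reader (for example the scheduler) parses the JSON string back into a mode.
stored = runtime_config.engine_specific["disaggregation_mode"]
decoded = DisaggregationMode(json.loads(stored))
```

Note that json.dumps on a plain string keeps the surrounding quotes, so a reader has to JSON-decode the value rather than compare the raw string.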

components/src/dynamo/trtllm/request_handlers/handler_base.py (1)

207-208: LGTM! Graceful degradation for missing disaggregated params.

The change from raising a ValueError to logging an informational message enables Decode workers to fall back to local prefill when disaggregated params are missing. This aligns with the conditional disaggregation feature, allowing flexible routing based on request characteristics.
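
A minimal sketch of the behavior change, assuming the request is a dict with an optional "disaggregated_params" key. The function name and the strict flag are hypothetical; strict=True reproduces the old raising behavior for comparison.

```python
import logging

logger = logging.getLogger("decode_sketch")


def resolve_prefill_source(request, strict=False):
    """Decide where prefill happens for a decode-mode request."""
    params = request.get("disaggregated_params")
    if params is None:
        if strict:
            # Previous behavior: a missing handoff was a hard error.
            raise ValueError("Disaggregated params are required for decode mode")
        # New behavior: log and continue with local prefill.
        logger.info("No disaggregated params; prefilling locally")
        return "local"
    return "remote"
```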

components/src/dynamo/trtllm/request_handlers/handlers.py (3)

195-207: LGTM! Clear environment variable handling with fallbacks.

The environment variable reading logic is well-structured with proper error handling and fallback to config or default values. The warning message for invalid threshold values helps with debugging.
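
The pattern being approved could be sketched as follows. The two environment variable names come from this PR; the helper name, the accepted truthy strings, and the default of 128 tokens are assumptions made for illustration.

```python
import logging
import os


def read_conditional_disagg_settings(
    env=os.environ, config_threshold=None, default_threshold=128
):
    """Return (enabled, threshold) from the environment with fallbacks."""
    flag = env.get("DYNAMO_USE_CONDITIONAL_DISAGGREGATION", "").strip().lower()
    enabled = flag in {"1", "true", "yes", "on"}

    # Prefer the config value over the built-in default when one is given.
    fallback = config_threshold if config_threshold is not None else default_threshold
    raw = env.get("DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS")
    if raw is None:
        return enabled, fallback
    try:
        return enabled, int(raw)
    except ValueError:
        logging.warning(
            "Invalid DYNAMO_SHORT_PREFILL_THRESHOLD_TOKENS=%r; using %d", raw, fallback
        )
        return enabled, fallback
```

Taking the environment as a parameter makes the parsing easy to unit test without mutating os.environ.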


255-260: Verify prefill_tokens assignment and token_ids type coverage

  • No occurrences of setting prefill_tokens in the codebase—request.get("prefill_tokens") always falls back to 0.
  • isinstance(token_ids, (list, tuple)) excludes other sequence types (e.g. numpy.ndarray, torch.Tensor).
    Confirm how upstream populates these fields and extend the logic to cover all expected input types.
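
One way to widen the type coverage is sketched below, assuming the request is a dict and that array-like inputs expose a shape whose last dimension is the sequence length. The helper name is hypothetical, and neither numpy nor torch is imported, so the check relies on duck typing only.

```python
from collections.abc import Sized


def count_isl_tokens(request):
    """Best-effort input sequence length for a request dict."""
    token_ids = request.get("token_ids")
    if token_ids is not None and not isinstance(token_ids, (str, bytes)):
        shape = getattr(token_ids, "shape", None)
        if shape is not None:
            # Array-likes (numpy arrays, torch tensors) expose .shape;
            # this sketch assumes the last dimension is the token axis.
            return int(shape[-1]) if len(shape) > 0 else 0
        if isinstance(token_ids, Sized):
            # Lists, tuples, ranges and other sized containers.
            return len(token_ids)
    # Fall back to an explicit count if the request carries one.
    return int(request.get("prefill_tokens") or 0)
```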

280-287: Review silent fallback for remote prefill errors

The code now logs errors and always falls back to local prefill instead of propagating failures to clients, masking every error case. Confirm this aligns with intended UX and SLAs. Consider emitting a metric/counter for prefill failures and selectively surfacing critical errors (e.g. auth/authz) while silencing capacity or timeout issues.
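
A sketch of the selective handling suggested here. PrefillAuthError is a hypothetical error class, and the Counter is a stand-in for whatever metrics counter the project actually uses; which errors count as recoverable is also an assumption.

```python
import asyncio
from collections import Counter

# Stand-in for a real metrics counter, keyed by failure kind.
prefill_failures = Counter()


class PrefillAuthError(Exception):
    """Hypothetical error for failures that should reach the client."""


async def prefill_with_fallback(remote_prefill, local_prefill):
    """Fall back only on capacity-style failures; surface critical ones."""
    try:
        return await remote_prefill()
    except PrefillAuthError:
        prefill_failures["auth"] += 1
        raise  # critical: do not mask behind a local prefill
    except (asyncio.TimeoutError, ConnectionError) as exc:
        prefill_failures[type(exc).__name__] += 1
        return await local_prefill()
```

Errors outside the two handled groups propagate unchanged, so unexpected failures stay visible instead of being silently absorbed.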

lib/llm/src/kv_router/scheduler.rs (1)

542-542: max_logit update is unused
max_logit is never read after line 542; downstream logic uses worker_logits directly.

@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch 4 times, most recently from 1c8d03a to 9382b3b Compare October 15, 2025 09:19
@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 15, 2025
@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch 4 times, most recently from 155d865 to 368c0c0 Compare October 15, 2025 12:06
@shpgy-shpgy shpgy-shpgy requested a review from a team as a code owner October 15, 2025 12:06
shpgy-shpgy and others added 4 commits October 15, 2025 20:11
shpgy-shpgy added 2 commits October 15, 2025 20:11
@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from 368c0c0 to e4fdd1a Compare October 15, 2025 12:11
format

format

Signed-off-by: shpgy-shpgy <[email protected]>

@shpgy-shpgy shpgy-shpgy force-pushed the shpgy/trtllm_conditional_disaggregation branch from e4fdd1a to a46acac Compare October 15, 2025 12:14
Labels

external-contribution Pull request is from an external contributor feat size/L
