Skip to content

[DO NOT MERGE] Disable cancellation at vLLM backend#3722

Closed
biswapanda wants to merge 2 commits intomainfrom
disable-cancel-vllm
Closed

[DO NOT MERGE] Disable cancellation at vLLM backend#3722
biswapanda wants to merge 2 commits intomainfrom
disable-cancel-vllm

Conversation

@biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Oct 18, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes
    • Improved cancellation handling in token generation to prevent outputs from being yielded after requests are cancelled.
    • Enhanced efficiency in how generation requests are processed and routed through the system.

@biswapanda biswapanda requested review from a team as code owners October 18, 2025 17:16
@copy-pr-bot
Copy link

copy-pr-bot bot commented Oct 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@biswapanda biswapanda marked this pull request as draft October 18, 2025 17:16
@biswapanda
Copy link
Contributor Author

/ok to test 8a3d34a

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 18, 2025

Walkthrough

The changes refactor cancellation handling in the vllm handlers. The _monitor_abort method no longer explicitly calls abort, and output-yielding operations now include guards to prevent emission after cancellation. Request passing to clients is simplified to use only the prefill request without context.

Changes

Cohort / File(s) Summary
Cancellation and yield guards
components/src/dynamo/vllm/handlers.py
Removed explicit abort call in _monitor_abort; added cancellation status checks before yielding in generate_tokens and PrefillWorkerHandler; updated abort log message to indicate cancellation.
Request passing simplification
components/src/dynamo/vllm/handlers.py
Changed generate method to pass only prefill_request to prefill_router_client.generate and prefill_worker_client.round_robin instead of including context.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

The changes are localized to a single file with straightforward guard logic additions and method signature adjustments. The pattern of adding cancellation checks is consistent throughout, and the removal of context passing is a direct simplification.

Poem

🐰 Hopping through cancellations with grace,
No yields shall race beyond their place—
Guards stand watch at each output gate,
While requests skip context's weight.
Simpler paths make code more bright,
CodeRabbit's edits set things right!

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description consists entirely of empty template sections with no substantive content. All required sections—Overview, Details, Where should the reviewer start?, and Related Issues—are present as headings but contain only template placeholder comments. The Related Issues section includes only the placeholder text "closes GitHub issue: #xxx" with no actual issue reference. This represents a completely unfilled description that fails to provide any meaningful information about the changes, rationale, or implementation details necessary for review. The author must fill out the PR description by providing: a clear overview of the changes and their purpose, detailed descriptions of what was modified and why, specific files to review closely, and any related GitHub issues using the action keywords (Closes, Fixes, Resolves, or Relates to). This information is essential for reviewers to understand the context and impact of the changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "[DO NOT MERGE] Disable cancellation at vLLM backend" accurately reflects the primary changes made in the pull request. The changes focus on modifying cancellation behavior in the vLLM backend handlers—including removing abort calls, updating log messages, and adding guards to prevent yielding after cancellation. While the "[DO NOT MERGE]" prefix signals the PR is not ready for merging, it does not make the title inaccurate or vague. The title is concise, clear, and specific enough that a reviewer scanning history would understand the main purpose of the changeset.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b7a806 and 8a3d34a.

📒 Files selected for processing (1)
  • components/src/dynamo/vllm/handlers.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/src/dynamo/vllm/handlers.py (3)
lib/bindings/python/rust/lib.rs (2)
  • generate (796-808)
  • round_robin (812-845)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • generate (1210-1249)
  • round_robin (208-212)
  • is_stopped (302-309)
  • is_killed (311-318)
lib/bindings/python/rust/llm/kv.rs (2)
  • generate (983-1061)
  • anext (1237-1249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
components/src/dynamo/vllm/handlers.py (1)

1-285: The review comment's technical observations are validated; the requests for context and impact analysis are justified.

Based on verification of the git history and code diff:

  • The commit message title confirms "[DO NOT MERGE] Disable cancellation at vLLM backend"
  • The git diff confirms the changes systematically disable cancellation:
    • Removed await self.engine_client.abort(request_id) call
    • Removed context=context parameter propagation to prefill clients
    • Added conditional yield guards that suppress output on cancellation (but never tell the engine to abort)
  • Timeline shows this immediately follows commit 734d2f8 ("Enable cancellation during or before a stream is established feat: Enable cancellation during or before a stream is established #3635"), suggesting a regression or incompatibility discovered after that feature was enabled
  • Cancellation tests exist (test_request_n_decode_cancel, test_request_n_remote_prefill_cancel) but their current status is unclear
  • No GitHub issues, linked PRs, or rationale are documented in the code or commit message

Your review comment's request for rationale, resource impact analysis, testing verification, and clarity on whether this is temporary or permanent is appropriate and necessary before this can be merged. The developer should provide this context.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 21, 2025
@github-actions
Copy link
Contributor

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants