Skip to content

feat: Enable cancellation during or before a stream is established#3635

Merged
kthui merged 9 commits intomainfrom
jacky-ft-py-context-cancel-before-stream
Oct 16, 2025
Merged

feat: Enable cancellation during or before a stream is established#3635
kthui merged 9 commits intomainfrom
jacky-ft-py-context-cancel-before-stream

Conversation

@kthui
Copy link
Contributor

@kthui kthui commented Oct 15, 2025

Overview:

  • Python binding will cancel the outgoing request if cancellation happens before the stream is established.
  • Migration will abort creating new streams if the request is cancelled.

Details:

  • Python binding update to allow context to be linked before making the outgoing request.
  • New unit test checking the cancellation behavior with Python binding.
  • Migration logic enhancement to support aborting new stream construction if request is cancelled.
  • New unit test checking the Migration new stream construction abort logic.

Where should the reviewer start?

Start with the Python binding unit tests, and then the Python binding changes, and finish off with the Migration layer.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • New Features
    • Unified request context handling across all request paths for more consistent behavior.
  • Bug Fixes
    • Cancellation is now reliably propagated and honored across all request types.
    • Requests no longer start if the context is already stopped or cancelled, returning a clear error.
    • Standardized instrumentation for improved observability.
  • Tests
    • Added scenarios verifying cancellation when a context is already cancelled.
    • Added scenarios verifying cancellation triggered before awaiting a request.

kthui added 5 commits October 14, 2025 21:01
…ding request

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
This reverts commit cc055ae.

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
…celled

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
@kthui kthui self-assigned this Oct 15, 2025
@github-actions github-actions bot added the feat label Oct 15, 2025
@kthui kthui marked this pull request as ready for review October 15, 2025 17:41
@kthui kthui requested review from a team as code owners October 15, 2025 17:41
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

Refactors Python binding to centralize request context creation and span handling via a new helper, unifying downstream calls across routing paths. Adds cancellation-aware checks in RetryManager before starting streams. Introduces two async tests validating client-side context cancellation states and server-side stop behavior.

Changes

Cohort / File(s) Summary
Python binding: request context refactor
lib/bindings/python/rust/lib.rs
Adds create_request_context helper; replaces ad-hoc RsContext construction with request_ctx; standardizes span retrieval/application across round_robin/random/direct/static paths; imports AsyncEngineContextProvider; notes placeholder cancellation propagation.
Cancellation tests
lib/bindings/python/tests/cancellation/test_cancellation.py
Adds two async tests: context already cancelled; context cancelled before awaiting request. Both iterate stream expecting cancellation and assert server context stopped but not killed.
LLM RetryManager pre-stream check
lib/llm/src/migration.rs
In RetryManager::new_stream, aborts if context is stopped/killed with error "Context is stopped or killed"; adds test ensuring failure when context stopped before stream creation.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant PyClient as Python Client
  participant PyBind as Python Binding (Rust)
  participant Ctx as create_request_context
  participant Span as get_span_for_*()
  participant RM as RetryManager::new_stream
  participant Svc as Downstream Client

  User->>PyClient: generate(request, context)
  PyClient->>PyBind: call binding
  PyBind->>Ctx: build request_ctx (link/cancel-aware)
  Ctx-->>PyBind: request_ctx
  PyBind->>Span: compute span for context
  Span-->>PyBind: span
  PyBind->>RM: new_stream(request_ctx, span)

  alt Context stopped/killed
    RM-->>PyBind: Error("Context is stopped or killed")
    PyBind-->>PyClient: propagate cancellation error/ended stream
  else Context active
    RM->>Svc: start stream with request_ctx
    Svc-->>RM: tokens/events
    RM-->>PyBind: stream
    PyBind-->>PyClient: stream items
  end

  note over PyClient,PyBind: Tests also cancel context before/after request await and expect graceful stop (not killed).
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit taps the context tree,
Snips stray threads so calls run free.
If stopped, we bow and end the stream—
No panic, just a gentle beam.
With spans aligned, we hop in sync,
Refactors done—carrots in the link! 🥕

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed The pull request description includes all required sections from the template (Overview, Details, Where should the reviewer start, Related Issues) and all are filled with relevant information detailing the feature, implementation changes, test additions, review guidance, and acknowledges absence of related issues.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check ✅ Passed The pull request title "feat: Enable cancellation during or before a stream is established" is fully aligned with the changeset. The PR implements cancellation handling across three areas: the Python binding introduces a create_request_context helper to manage cancellation before stream establishment, adds test cases verifying cancellation scenarios, and the migration layer adds checks to abort stream creation if the context is already stopped or killed. The title is concise, specific, and clearly communicates the primary feature being added without vagueness or generic phrasing.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
lib/bindings/python/tests/cancellation/test_cancellation.py (2)

296-312: Use deterministic wait and pytest.fail instead of raising AssertionError

Replace sleep with awaiting the context signal and fail via pytest for clarity and lint compliance (TRY003).

Apply this diff:

-    # Give server a moment to process anything
-    await asyncio.sleep(0.2)
+    # Wait deterministically for cancellation to be observable
+    await context.async_killed_or_stopped()
 
-    async for _ in stream:
-        raise AssertionError("Request should be cancelled when generate begins")
+    async for _ in stream:
+        pytest.fail("Request should be cancelled when generate begins")

316-332: Same here: await cancellation signal and use pytest.fail

Mirror the deterministic wait and failure style.

Apply this diff:

-    # Give server a moment to process anything
-    await asyncio.sleep(0.2)
+    # Wait deterministically for cancellation to be observable
+    await context.async_killed_or_stopped()
 
-    async for _ in stream:
-        raise AssertionError("Request should be cancelled when generate begins")
+    async for _ in stream:
+        pytest.fail("Request should be cancelled when generate begins")
lib/bindings/python/rust/lib.rs (4)

819-838: Instrument the None-context branch as well for consistent tracing

Currently only Some(context) is instrumented. Instrument the None branch using request_ctx ID to “always emit spans.”

Apply this diff:

-            let stream = match context {
+            let stream = match context {
                 Some(context) => {
                     // Always instrument with appropriate span (none if no trace context)
                     let span = get_span_for_context(&context, "round_robin");
                     client
                         .round_robin(request_ctx)
                         .instrument(span)
                         .await
                         .map_err(to_pyerr)?
                 }
-                _ => client.round_robin(request_ctx).await.map_err(to_pyerr)?,
+                _ => {
+                    let req_id = request_ctx.context().id().to_string();
+                    let span = create_client_span("round_robin", &req_id, None);
+                    client
+                        .round_robin(request_ctx)
+                        .instrument(span)
+                        .await
+                        .map_err(to_pyerr)?
+                }
             };

856-874: Apply same tracing to random() when context is None

Parity with round_robin().

Apply this diff:

-                _ => client.random(request_ctx).await.map_err(to_pyerr)?,
+                _ => {
+                    let req_id = request_ctx.context().id().to_string();
+                    let span = create_client_span("random", &req_id, None);
+                    client
+                        .random(request_ctx)
+                        .instrument(span)
+                        .await
+                        .map_err(to_pyerr)?
+                }

894-916: Instrument direct() None-context branch and include instance_id

Add a span for the direct path even without a Context.

Apply this diff:

-                _ => client
-                    .direct(request_ctx, instance_id)
-                    .await
-                    .map_err(to_pyerr)?,
+                _ => {
+                    let req_id = request_ctx.context().id().to_string();
+                    let span = info_span!(
+                        "client_request",
+                        operation = "direct",
+                        request_id = req_id.as_str(),
+                        instance_id = instance_id
+                    );
+                    client
+                        .direct(request_ctx, instance_id)
+                        .instrument(span)
+                        .await
+                        .map_err(to_pyerr)?
+                }

937-955: Instrument static() None-context branch

Maintain consistent tracing across routing modes.

Apply this diff:

-                _ => client.r#static(request_ctx).await.map_err(to_pyerr)?,
+                _ => {
+                    let req_id = request_ctx.context().id().to_string();
+                    let span = create_client_span("static", &req_id, None);
+                    client
+                        .r#static(request_ctx)
+                        .instrument(span)
+                        .await
+                        .map_err(to_pyerr)?
+                }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1391e and df95c74.

📒 Files selected for processing (3)
  • lib/bindings/python/rust/lib.rs (10 hunks)
  • lib/bindings/python/tests/cancellation/test_cancellation.py (1 hunks)
  • lib/llm/src/migration.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-27T17:56:14.690Z
Learnt from: kthui
PR: ai-dynamo/dynamo#2500
File: lib/llm/src/migration.rs:58-77
Timestamp: 2025-08-27T17:56:14.690Z
Learning: In lib/llm/src/migration.rs, the cancellation visibility in the Migration operator is intentionally one-way - it checks engine_ctx.is_stopped()/is_killed() to stop pulling from streams but doesn't link newly created streams as child contexts to the parent. This is a conscious architectural decision with plans for future enhancement.

Applied to files:

  • lib/llm/src/migration.rs
🧬 Code graph analysis (2)
lib/bindings/python/tests/cancellation/test_cancellation.py (1)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • Context (287-372)
lib/bindings/python/rust/lib.rs (3)
lib/runtime/src/pipeline/context.rs (3)
  • context (206-208)
  • context (290-292)
  • with_id (49-56)
lib/runtime/src/engine.rs (6)
  • context (163-163)
  • context (239-241)
  • context (256-258)
  • context (262-264)
  • context (437-440)
  • context (452-454)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • Context (287-372)
🪛 Ruff (0.14.0)
lib/bindings/python/tests/cancellation/test_cancellation.py

307-307: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: vllm (arm64)
🔇 Additional comments (2)
lib/llm/src/migration.rs (1)

722-753: Good coverage for pre‑stream cancellation

Test clearly validates early abort when context is stopped. Consider adding a killed-variant test if feasible to ensure parity.

lib/bindings/python/rust/lib.rs (1)

126-145: Helper centralizes request context + cancellation propagation — LGTM

Creates child context, links, and propagates stop if already cancelled. Matches new tests’ expectations.

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
@kthui kthui changed the title feat: Enable cancellation during and before a stream is established feat: Enable cancellation during or before a stream is established Oct 15, 2025
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
@kthui kthui merged commit 734d2f8 into main Oct 16, 2025
34 of 37 checks passed
@kthui kthui deleted the jacky-ft-py-context-cancel-before-stream branch October 16, 2025 02:22
athreesh pushed a commit that referenced this pull request Oct 16, 2025
…3635)

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
…3635)

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants