feat: Enable cancellation during or before a stream is established#3635
feat: Enable cancellation during or before a stream is established#3635
Conversation
…ding request Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
This reverts commit cc055ae. Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
…celled Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
WalkthroughRefactors Python binding to centralize request context creation and span handling via a new helper, unifying downstream calls across routing paths. Adds cancellation-aware checks in RetryManager before starting streams. Introduces two async tests validating client-side context cancellation states and server-side stop behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant PyClient as Python Client
participant PyBind as Python Binding (Rust)
participant Ctx as create_request_context
participant Span as get_span_for_*()
participant RM as RetryManager::new_stream
participant Svc as Downstream Client
User->>PyClient: generate(request, context)
PyClient->>PyBind: call binding
PyBind->>Ctx: build request_ctx (link/cancel-aware)
Ctx-->>PyBind: request_ctx
PyBind->>Span: compute span for context
Span-->>PyBind: span
PyBind->>RM: new_stream(request_ctx, span)
alt Context stopped/killed
RM-->>PyBind: Error("Context is stopped or killed")
PyBind-->>PyClient: propagate cancellation error/ended stream
else Context active
RM->>Svc: start stream with request_ctx
Svc-->>RM: tokens/events
RM-->>PyBind: stream
PyBind-->>PyClient: stream items
end
note over PyClient,PyBind: Tests also cancel context before/after request await and expect graceful stop (not killed).
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
lib/bindings/python/tests/cancellation/test_cancellation.py (2)
296-312: Use deterministic wait and pytest.fail instead of raising AssertionErrorReplace sleep with awaiting the context signal and fail via pytest for clarity and lint compliance (TRY003).
Apply this diff:
- # Give server a moment to process anything - await asyncio.sleep(0.2) + # Wait deterministically for cancellation to be observable + await context.async_killed_or_stopped() - async for _ in stream: - raise AssertionError("Request should be cancelled when generate begins") + async for _ in stream: + pytest.fail("Request should be cancelled when generate begins")
316-332: Same here: await cancellation signal and use pytest.failMirror the deterministic wait and failure style.
Apply this diff:
- # Give server a moment to process anything - await asyncio.sleep(0.2) + # Wait deterministically for cancellation to be observable + await context.async_killed_or_stopped() - async for _ in stream: - raise AssertionError("Request should be cancelled when generate begins") + async for _ in stream: + pytest.fail("Request should be cancelled when generate begins")lib/bindings/python/rust/lib.rs (4)
819-838: Instrument the None-context branch as well for consistent tracingCurrently only Some(context) is instrumented. Instrument the None branch using request_ctx ID to “always emit spans.”
Apply this diff:
- let stream = match context { + let stream = match context { Some(context) => { // Always instrument with appropriate span (none if no trace context) let span = get_span_for_context(&context, "round_robin"); client .round_robin(request_ctx) .instrument(span) .await .map_err(to_pyerr)? } - _ => client.round_robin(request_ctx).await.map_err(to_pyerr)?, + _ => { + let req_id = request_ctx.context().id().to_string(); + let span = create_client_span("round_robin", &req_id, None); + client + .round_robin(request_ctx) + .instrument(span) + .await + .map_err(to_pyerr)? + } };
856-874: Apply same tracing to random() when context is NoneParity with round_robin().
Apply this diff:
- _ => client.random(request_ctx).await.map_err(to_pyerr)?, + _ => { + let req_id = request_ctx.context().id().to_string(); + let span = create_client_span("random", &req_id, None); + client + .random(request_ctx) + .instrument(span) + .await + .map_err(to_pyerr)? + }
894-916: Instrument direct() None-context branch and include instance_idAdd a span for the direct path even without a Context.
Apply this diff:
- _ => client - .direct(request_ctx, instance_id) - .await - .map_err(to_pyerr)?, + _ => { + let req_id = request_ctx.context().id().to_string(); + let span = info_span!( + "client_request", + operation = "direct", + request_id = req_id.as_str(), + instance_id = instance_id + ); + client + .direct(request_ctx, instance_id) + .instrument(span) + .await + .map_err(to_pyerr)? + }
937-955: Instrument static() None-context branchMaintain consistent tracing across routing modes.
Apply this diff:
- _ => client.r#static(request_ctx).await.map_err(to_pyerr)?, + _ => { + let req_id = request_ctx.context().id().to_string(); + let span = create_client_span("static", &req_id, None); + client + .r#static(request_ctx) + .instrument(span) + .await + .map_err(to_pyerr)? + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/bindings/python/rust/lib.rs(10 hunks)lib/bindings/python/tests/cancellation/test_cancellation.py(1 hunks)lib/llm/src/migration.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-27T17:56:14.690Z
Learnt from: kthui
PR: ai-dynamo/dynamo#2500
File: lib/llm/src/migration.rs:58-77
Timestamp: 2025-08-27T17:56:14.690Z
Learning: In lib/llm/src/migration.rs, the cancellation visibility in the Migration operator is intentionally one-way - it checks engine_ctx.is_stopped()/is_killed() to stop pulling from streams but doesn't link newly created streams as child contexts to the parent. This is a conscious architectural decision with plans for future enhancement.
Applied to files:
lib/llm/src/migration.rs
🧬 Code graph analysis (2)
lib/bindings/python/tests/cancellation/test_cancellation.py (1)
lib/bindings/python/src/dynamo/_core.pyi (1)
Context(287-372)
lib/bindings/python/rust/lib.rs (3)
lib/runtime/src/pipeline/context.rs (3)
context(206-208)context(290-292)with_id(49-56)lib/runtime/src/engine.rs (6)
context(163-163)context(239-241)context(256-258)context(262-264)context(437-440)context(452-454)lib/bindings/python/src/dynamo/_core.pyi (1)
Context(287-372)
🪛 Ruff (0.14.0)
lib/bindings/python/tests/cancellation/test_cancellation.py
307-307: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: vllm (arm64)
🔇 Additional comments (2)
lib/llm/src/migration.rs (1)
722-753: Good coverage for pre‑stream cancellationTest clearly validates early abort when context is stopped. Consider adding a killed-variant test if feasible to ensure parity.
lib/bindings/python/rust/lib.rs (1)
126-145: Helper centralizes request context + cancellation propagation — LGTMCreates child context, links, and propagates stop if already cancelled. Matches new tests’ expectations.
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
…3635) Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
…3635) Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Overview:
Details:
Where should the reviewer start?
Start with the Python binding unit tests, and then the Python binding changes, and finish off with the Migration layer.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
Summary by CodeRabbit