Fail in-flight invocations when worker channel restarts #11159

Draft
wants to merge 4 commits into base: dev

@satvu (Member) commented Jul 2, 2025

Issue describing the changes in this PR

resolves #10936

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR
  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • I have added all required tests (Unit tests, E2E tests)

Additional information

Additional PR information

@satvu changed the title from "Worker channel crash" to "Fail in-flight invocations when worker channel restarts" Jul 2, 2025

{
public override async ValueTask TransformRequestAsync(HttpContext httpContext, HttpRequestMessage proxyRequest, string destinationPrefix, CancellationToken cancellationToken)
{
await base.TransformRequestAsync(httpContext, proxyRequest, destinationPrefix, cancellationToken);
Member

Discussed this offline, but I'm concerned this will not maintain the default behavior we previously had.

Your best course of action here would likely be to call HttpTransformer.Default.TransformRequestAsync instead, then add our custom logic on top.

{
public override async ValueTask TransformRequestAsync(HttpContext httpContext, HttpRequestMessage proxyRequest, string destinationPrefix, CancellationToken cancellationToken)
{
var defaultTransformer = HttpTransformer.Default;
Member

nit: remove this local, use HttpTransformer.Default directly.

@fabiocav requested a review from Copilot July 2, 2025 23:35

@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR ensures that when a worker channel restarts, any in-flight invocations are explicitly failed, and it wires through a new HTTP proxy context to halt retries when a shutdown occurs.

  • Replace TryFailExecutions with a new Shutdown method on RPC worker channels to uniformly fail in-flight invocations.
  • Introduce ScriptInvocationRequestTransformer, update proxy services/handlers to carry ScriptInvocationContext, and respect WorkerShutdownException to prevent retries.
  • Add WorkerShutdownException type and a new HttpProxyScriptInvocationContext constant for correlating proxy requests.
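
The first two bullets can be illustrated with a minimal sketch that uses only the base class library. It is not the code from this PR: `SketchWorkerChannel`, `RetrySketch`, and the simplified `WorkerShutdownException` below are hypothetical stand-ins, and the real channel and retry handler carry far more state. The sketch only shows the shape of the idea: `Shutdown` faults every pending invocation, and the retry loop treats that fault as final.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for the exception type this PR adds.
public sealed class WorkerShutdownException : Exception
{
    public WorkerShutdownException(string message) : base(message) { }
}

// Hypothetical channel that tracks each in-flight invocation by ID.
public sealed class SketchWorkerChannel
{
    private readonly ConcurrentDictionary<string, TaskCompletionSource<string>> _inFlight = new();

    // Registers an invocation; the returned task completes when the worker replies.
    public Task<string> InvokeAsync(string invocationId)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        _inFlight[invocationId] = tcs;
        return tcs.Task;
    }

    // Faults every pending invocation so callers stop waiting on a dead worker.
    public void Shutdown()
    {
        var exception = new WorkerShutdownException("Worker is shutting down.");
        foreach (var id in _inFlight.Keys)
        {
            if (_inFlight.TryRemove(id, out var tcs))
            {
                tcs.TrySetException(exception);
            }
        }
    }
}

public static class RetrySketch
{
    // Retries other failures, but gives up immediately on a worker shutdown.
    public static async Task<(bool Succeeded, int Attempts)> SendWithRetryAsync(Func<Task> send, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await send();
                return (true, attempt);
            }
            catch (WorkerShutdownException)
            {
                // The worker is gone; another attempt cannot succeed.
                return (false, attempt);
            }
            catch (Exception)
            {
                // Treated as transient in this sketch: try again.
            }
        }

        return (false, maxAttempts);
    }
}
```

With this sketch, calling `Shutdown()` on a channel that has one pending `InvokeAsync` task leaves that task faulted with `WorkerShutdownException`, and passing the task to `SendWithRetryAsync` ends after a single attempt rather than exhausting `maxAttempts`.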

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/WebJobs.Script/Workers/Rpc/WebHostRpcWorkerChannelManager.cs | Swap TryFailExecutions → Shutdown when disposing channels. |
| src/WebJobs.Script/Workers/Rpc/JobHostRpcWorkerChannelManager.cs | Same replacement in JobHost RPC channel manager. |
| src/WebJobs.Script/Workers/Rpc/IRpcWorkerChannel.cs | Update interface: remove TryFailExecutions, add Shutdown. |
| src/WebJobs.Script/ScriptConstants.cs | Add HttpProxyScriptInvocationContext key. |
| src/WebJobs.Script/Http/ScriptInvocationRequestTransformer.cs | New transformer to copy invocation context into proxy request. |
| src/WebJobs.Script/Http/RetryProxyHandler.cs | Respect ScriptInvocationContext.ResultSource faults and stop retry on WorkerShutdownException. |
| src/WebJobs.Script/Http/DefaultHttpProxyService.cs | Store invocation context in HttpContext.Items and use new transformer. |
| src/WebJobs.Script/Exceptions/WorkerShutdownException.cs | Introduce WorkerShutdownException to signal channel shutdowns. |
| src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs | Implement Shutdown to wrap pending invocations in WorkerShutdownException. |
Comments suppressed due to low confidence (4)

src/WebJobs.Script/Http/ScriptInvocationRequestTransformer.cs:20

  • Add unit tests for ScriptInvocationRequestTransformer.TransformRequestAsync to verify the HttpProxyScriptInvocationContext is correctly copied into the proxy request Options.
        await defaultTransformer.TransformRequestAsync(httpContext, proxyRequest, destinationPrefix, cancellationToken);

src/WebJobs.Script/Http/RetryProxyHandler.cs:67

  • Add tests to ensure RetryProxyHandler stops retrying when a WorkerShutdownException is thrown, and that the correct log message is emitted.
                catch (WorkerShutdownException)

src/WebJobs.Script/Http/DefaultHttpProxyService.cs:108

  • Consider adding integration or unit tests to validate that the ScriptInvocationRequestTransformer is passed into SendAsync and that the context item is present on the forwarded request.
            var forwardingTask = _httpForwarder.SendAsync(httpContext, httpUri.ToString(), _messageInvoker, _forwarderRequestConfig, _httpTransformer).AsTask();

src/WebJobs.Script/Http/ScriptInvocationRequestTransformer.cs:13

  • [nitpick] This class is declared in the global namespace; consider placing it under Microsoft.Azure.WebJobs.Script.Http to match the existing project structure and improve discoverability.
public class ScriptInvocationRequestTransformer : HttpTransformer
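
The first suppressed comment asks for a test that the invocation context ends up in the proxy request's Options. As a rough sketch of what such a check exercises, the helper below copies a value from an item bag into `HttpRequestMessage.Options` and reads it back. Everything here is an assumption for illustration: `InvocationContextOptions` and the key name `SketchInvocationContext` are hypothetical, a plain dictionary stands in for `HttpContext.Items`, and the PR's actual transformer is not reproduced.

```csharp
using System.Collections.Generic;
using System.Net.Http;

public static class InvocationContextOptions
{
    // Hypothetical key name; the PR defines its own constant in ScriptConstants.
    public const string ContextKey = "SketchInvocationContext";

    private static readonly HttpRequestOptionsKey<object> OptionsKey = new(ContextKey);

    // Copies the context from the incoming request's item bag (a stand-in for
    // HttpContext.Items) into the outgoing proxy request's Options.
    public static bool CopyContext(IDictionary<object, object?> items, HttpRequestMessage proxyRequest)
    {
        if (items.TryGetValue(ContextKey, out var context) && context is not null)
        {
            proxyRequest.Options.Set(OptionsKey, context);
            return true;
        }

        return false;
    }

    // What a downstream handler could do to read the context back.
    public static bool TryGetContext(HttpRequestMessage proxyRequest, out object? context)
        => proxyRequest.Options.TryGetValue(OptionsKey, out context);
}
```

A unit test along the lines Copilot suggests would populate the item bag, run the copy step, and assert that `TryGetContext` returns the same object instance.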

{
if (workerException == null)
WorkerShutdownException shutdownException = new WorkerShutdownException("Worker encountered a fatal error and is shutting down.");
Copilot AI Jul 2, 2025

The original workerException details are not preserved as an inner exception; consider passing workerException into the WorkerShutdownException constructor or using it as the inner exception to retain stack trace and context.

Suggested change
- WorkerShutdownException shutdownException = new WorkerShutdownException("Worker encountered a fatal error and is shutting down.");
+ WorkerShutdownException shutdownException = workerException is not null
+ ? new WorkerShutdownException("Worker encountered a fatal error and is shutting down.", workerException)
+ : new WorkerShutdownException("Worker encountered a fatal error and is shutting down.");
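
The suggestion assumes `WorkerShutdownException` exposes a constructor that accepts an inner exception. A minimal sketch of what that could look like follows; the class shape and the `ShutdownExceptionFactory` helper are hypothetical and may not match the type added in this PR.

```csharp
using System;

// Hypothetical shape of the exception; the PR's actual type may differ.
public class WorkerShutdownException : Exception
{
    public WorkerShutdownException(string message)
        : base(message) { }

    // Keeps the original worker failure available through InnerException.
    public WorkerShutdownException(string message, Exception innerException)
        : base(message, innerException) { }
}

public static class ShutdownExceptionFactory
{
    // Mirrors the suggested change: attach the worker exception when there is one.
    public static WorkerShutdownException Create(Exception? workerException)
    {
        const string message = "Worker encountered a fatal error and is shutting down.";
        return workerException is not null
            ? new WorkerShutdownException(message, workerException)
            : new WorkerShutdownException(message);
    }
}
```

Because `Exception`'s `(message, innerException)` constructor accepts a null inner exception, a single `new WorkerShutdownException(message, workerException)` call would likely behave the same as the conditional, provided the parameter is declared nullable.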

Development

Successfully merging this pull request may close these issues.

Worker restart due to function timeout does not fail current executions
2 participants