Fail in-flight invocations when worker channel shuts down #11159
base: dev
[InlineData(3, true)]
[InlineData(1, false)]
[InlineData(3, false)]
public async Task Shutdown_FailsInFlightInvocations(int numberOfInvocations, bool hasFailureException)
Prior to this change, calling `TryFailExecutions` (replaced by `Shutdown`) would not fail executions without an exception being passed in. This test makes sure that in-flight invocations are failed whether or not an exception is passed in.
Pull Request Overview
This PR modifies the worker channel shutdown behavior to better handle in-flight invocations when a worker channel encounters a fatal error. The key change is replacing the `TryFailExecutions` method with a more descriptive `Shutdown` method that properly fails in-flight invocations with a specific exception type.
Key changes:
- Replace `TryFailExecutions` method with `Shutdown` method across all worker channel implementations
- Introduce `WorkerShutdownException` to provide more specific error information for failed invocations
- Update HTTP proxy service to properly handle worker shutdown scenarios during request retries
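
As a rough illustration of the behavior described above, the following is a minimal sketch of a channel that fails every in-flight invocation on shutdown, whether or not a causing exception is supplied. `SketchWorkerChannel`, `SketchWorkerShutdownException`, and `StartInvocation` are hypothetical names invented for this example; they are not the types or signatures from this PR, and the real `GrpcWorkerChannel` implementation may differ substantially.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in; the PR's real WorkerShutdownException may differ.
public class SketchWorkerShutdownException : Exception
{
    public SketchWorkerShutdownException(string message, Exception inner = null)
        : base(message, inner) { }
}

// Hypothetical channel that tracks in-flight invocations by id.
public class SketchWorkerChannel
{
    private readonly ConcurrentDictionary<string, TaskCompletionSource<object>> _inFlight = new();

    // Registers an invocation and returns a task that completes when it finishes or fails.
    public Task<object> StartInvocation(string invocationId)
    {
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        _inFlight[invocationId] = tcs;
        return tcs.Task;
    }

    // Fails every pending invocation. The cause is optional: invocations are
    // failed even when no exception is passed in.
    public void Shutdown(Exception cause = null)
    {
        var failure = new SketchWorkerShutdownException("Worker channel is shutting down.", cause);
        foreach (var id in _inFlight.Keys)
        {
            if (_inFlight.TryRemove(id, out var tcs))
            {
                tcs.TrySetException(failure);
            }
        }
    }
}
```

In this sketch, a task returned by `StartInvocation` is faulted as soon as `Shutdown()` returns, and its inner exception is the shutdown exception, with the optional cause attached as `InnerException` when one was provided.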
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/WebJobs.Script/Workers/Rpc/IRpcWorkerChannel.cs` | Updates interface to replace `TryFailExecutions` with `Shutdown` method |
| `src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs` | Implements new `Shutdown` method with `WorkerShutdownException` |
| `src/WebJobs.Script/Exceptions/WorkerShutdownException.cs` | Adds new exception type for worker shutdown scenarios |
| `src/WebJobs.Script/Http/RetryProxyHandler.cs` | Updates HTTP retry logic to handle worker shutdown exceptions |
| `src/WebJobs.Script/Http/ScriptInvocationRequestTransformer.cs` | Adds transformer to pass script invocation context through HTTP proxy |
| `src/WebJobs.Script/Http/DefaultHttpProxyService.cs` | Integrates new transformer and adds invocation context to HTTP requests |
test/WebJobs.Script.Tests/Handlers/WebScriptHostExceptionHandlerTests.cs
@@ -29,6 +29,13 @@ public async Task OnTimeoutExceptionAsync(ExceptionDispatchInfo exceptionInfo, T
{
    FunctionTimeoutException timeoutException = exceptionInfo.SourceException as FunctionTimeoutException;

    // this seems to happen when the worker channel is already shutting down. Ex. One timeout is being handled and another comes in during shutdown.
    if (timeoutException is null)
If it's not a `FunctionTimeoutException`, what exception is the caller sending to this method?
Yeah, this seems odd: the method is `OnTimeoutExceptionAsync`, but we are now checking whether it was called for a non-timeout? Additionally, the log here just assumes that if `timeoutException is null`, then the channel must be shutting down. Will that always be true? How future-proof is that assumption?
I've only reproduced this twice, but `timeoutException` has ended up `null` in this method. I'm still digging into why that is, but what I observed in both cases matches the code comment. Let me continue to investigate this.
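
For context on how the null can arise at all: the `as` operator never throws and yields `null` whenever the captured exception's runtime type does not match the target type. The sketch below shows that with `System.TimeoutException` standing in for `FunctionTimeoutException`; `AsCastSketch` and `IsTimeout` are hypothetical names for this example only, and it does not explain which exception actually reached the handler in the two reproductions.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class AsCastSketch
{
    // Returns true only when the captured exception really is a TimeoutException.
    public static bool IsTimeout(ExceptionDispatchInfo info)
    {
        // `as` does not throw on a type mismatch; it evaluates to null instead.
        TimeoutException timeout = info.SourceException as TimeoutException;
        return timeout is not null;
    }
}
```

Passing `ExceptionDispatchInfo.Capture(new InvalidOperationException())` to `IsTimeout` takes the null path, which mirrors the `timeoutException is null` branch in the diff.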
My main concern is the potential breaking change from switching to a `WorkerShutdownException`. But maybe my assumption that this exception makes it to extensions is wrong?
The new approach: To avoid a breaking change with Durable, we will introduce an exception that derives from the existing `FunctionTimeoutException`, and looks like this:

In these cases, in the

Since Durable already has handling for

Edit: The new exception type is also being added to the WebJobs SDK (Azure/azure-webjobs-sdk#3139) so extensions can handle this type of failure with more granularity.
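
To illustrate why deriving from an existing exception type can avoid a breaking change, here is a minimal sketch in which a handler written against the base type still catches the new derived type. `SketchTimeoutException`, `SketchShutdownException`, and `ExistingHandler` are hypothetical stand-ins for this example; they are not the exception from this PR, the WebJobs SDK types, or Durable's actual handling code.

```csharp
using System;

// Hypothetical stand-in for an existing exception type that extensions already handle.
public class SketchTimeoutException : Exception
{
    public SketchTimeoutException(string message) : base(message) { }
}

// Hypothetical new, more specific type that derives from the existing one.
public class SketchShutdownException : SketchTimeoutException
{
    public SketchShutdownException(string message) : base(message) { }
}

public static class ExistingHandler
{
    // Models older extension code that only knows about the base type.
    public static string Handle(Action invocation)
    {
        try
        {
            invocation();
            return "completed";
        }
        catch (SketchTimeoutException)
        {
            // A derived SketchShutdownException is caught here as well.
            return "handled as timeout";
        }
    }
}
```

Code that wants the finer distinction can add a `catch (SketchShutdownException)` clause ahead of the base-type clause, while callers that are unaware of the new type keep their current behavior.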
Issue describing the changes in this PR
resolves #10936
Pull request checklist
IMPORTANT: Currently, changes must be backported to the `in-proc` branch to be included in Core Tools and non-Flex deployments.
- `in-proc` branch is not required
- `release_notes.md`
Additional information
Additional PR information