Description
This issue is specific to the Azure Storage backend.
If an orchestration completes and its host process recycles after the History table has been updated but before the Instances table has been updated, the orchestration gets permanently stuck in the "Running" state.
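The following is a deliberately simplified model of that race, not the DurableTask-AzureStorage implementation. The `Store` class, `checkpoint_completion`, and `process_work_item` are hypothetical names, and the two dicts stand in for the History and Instances tables. It shows why the failure is permanent: once history records completion, a retried work item is discarded, so nothing ever writes the terminal status to the Instances table.

```python
from dataclasses import dataclass, field


class SimulatedCrash(Exception):
    """Models the host process recycling in the middle of a checkpoint."""


@dataclass
class Store:
    # Hypothetical in-memory stand-ins for the two Azure Storage tables.
    history: dict = field(default_factory=dict)    # instance id -> last history event
    instances: dict = field(default_factory=dict)  # instance id -> runtime status


def checkpoint_completion(store: Store, instance_id: str, crash_between_writes: bool) -> None:
    # Write 1: the completion event lands in the History table.
    store.history[instance_id] = "ExecutionCompleted"
    if crash_between_writes:
        # The process recycles here, so write 2 never happens.
        raise SimulatedCrash()
    # Write 2: the Instances row is updated to the terminal status.
    store.instances[instance_id] = "Completed"


def process_work_item(store: Store, instance_id: str) -> str:
    # On retry, history already shows completion, so the work item is
    # discarded and the Instances row is never touched again.
    if store.history.get(instance_id) == "ExecutionCompleted":
        return "Discarding work item: Instance is Completed"
    checkpoint_completion(store, instance_id, crash_between_writes=False)
    return "Processed"


store = Store(instances={"abc": "Running"})
try:
    checkpoint_completion(store, "abc", crash_between_writes=True)
except SimulatedCrash:
    pass  # the host restarts and the work item is redelivered

print(process_work_item(store, "abc"))  # the retry is discarded
print(store.instances["abc"])           # the status is still "Running"
```

In this model, any fix has to either make the two writes effectively atomic or have the discard path repair the Instances row when it disagrees with history.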
You can detect this issue by looking for a "Discarding work item" warning in the DurableTask-AzureStorage logs with a reason of "Instance is Completed" (or Terminated, or Failed) for an orchestration whose status still reports "Running".
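A minimal sketch of that check, assuming the logs have already been exported as a list of dicts and the current statuses fetched separately. The keys `message`, `reason`, and `instance_id`, the exact reason strings, and the function name `find_stuck_instances` are all hypothetical; adapt them to the schema of whatever log sink you use.

```python
from typing import Iterable, Mapping

# Hypothetical mapping from the logged discard reason to the status it implies.
TERMINAL_REASONS = {
    "Instance is Completed": "Completed",
    "Instance is Terminated": "Terminated",
    "Instance is Failed": "Failed",
}


def find_stuck_instances(
    log_records: Iterable[Mapping[str, str]],
    reported_status: Mapping[str, str],
) -> dict:
    """Return {instance_id: implied_status} for orchestrations that look stuck.

    An instance is flagged when a "Discarding work item" warning says it is
    in a terminal state while its reported status is still "Running".
    """
    stuck = {}
    for record in log_records:
        if "Discarding work item" not in record.get("message", ""):
            continue
        implied = TERMINAL_REASONS.get(record.get("reason", ""))
        instance_id = record.get("instance_id")
        if implied and instance_id and reported_status.get(instance_id) == "Running":
            stuck[instance_id] = implied
    return stuck


logs = [
    {"message": "Discarding work item", "reason": "Instance is Completed", "instance_id": "abc"},
    {"message": "Discarding work item", "reason": "Instance is Completed", "instance_id": "def"},
]
print(find_stuck_instances(logs, {"abc": "Running", "def": "Completed"}))
```

Only `abc` is flagged here, because `def` already reports a terminal status and is therefore consistent.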
This problem is rare, but we have seen several recent cases, including a support case (ICM 676943847). It has also been observed in #2364 and #3079. Currently there is no good workaround other than manually editing the data in Azure Storage.
As a result, app workflows that depend on orchestration status changes also get stuck, waiting for the orchestration to transition to the Completed (or Terminated/Failed) state.
Expected behavior
Orchestrations should be able to run to completion in the face of any transient failures and have that completion be reflected in status queries.
Actual behavior
The orchestration gets stuck in an inconsistent state: its history shows that it has finished, but its status remains "Running".
Known workarounds
Manually update the status of the orchestration in the <TaskHubName>Instances table to the correct status ("Completed", "Failed", or "Terminated"). Status queries will then return the correct status.
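A sketch of that manual fix using the `azure-data-tables` package (`TableClient` and `UpdateMode` are part of that SDK). The row layout is an assumption: it supposes each Instances row is keyed by PartitionKey = instance ID and an empty RowKey, with the status in a `RuntimeStatus` column, so inspect your own table before writing to it. `build_status_patch` and `apply_status_patch` are hypothetical helper names.

```python
VALID_TERMINAL_STATUSES = {"Completed", "Failed", "Terminated"}


def build_status_patch(instance_id: str, status: str) -> dict:
    """Build a merge patch for one row of the <TaskHubName>Instances table.

    Assumed row layout: PartitionKey = instance ID, RowKey = "", and the
    status stored in a RuntimeStatus column.
    """
    if status not in VALID_TERMINAL_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_TERMINAL_STATUSES)}")
    return {"PartitionKey": instance_id, "RowKey": "", "RuntimeStatus": status}


def apply_status_patch(connection_string: str, task_hub: str, patch: dict) -> None:
    # Imported here so build_status_patch works without the SDK installed.
    from azure.data.tables import TableClient, UpdateMode

    with TableClient.from_connection_string(
        connection_string, table_name=f"{task_hub}Instances"
    ) as table:
        # MERGE changes only RuntimeStatus and leaves the other columns intact.
        table.update_entity(entity=patch, mode=UpdateMode.MERGE)
```

Usage would look like `apply_status_patch(conn_str, "MyTaskHub", build_status_patch("abc", "Completed"))`, where the connection string and task hub name come from your app's configuration. Using MERGE rather than REPLACE avoids wiping out other columns that the status query reads, such as output and timestamps.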