
Completed orchestrations sometimes get permanently stuck in the Running status #3223

@cgillum

Description

This issue is specific to the Azure Storage backend.

If an orchestration completes, but the process hosting it recycles after the History table is updated and before the Instances table is updated, the orchestration gets permanently stuck in the "Running" state.

This issue can be detected by checking for a "Discarding work item" warning in the DurableTask-AzureStorage logs with a reason of "Instance is Completed" (or Terminated, or Failed), even though the orchestration reports as being in the "Running" state.
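Besides checking the logs, one way to flag affected instances is to cross-check the two tables the Azure Storage provider keeps. The sketch below is a hypothetical Python script using the azure-data-tables package; it assumes the provider's default schema (Instances rows keyed by PartitionKey = instance ID with a RuntimeStatus column, History rows keyed by PartitionKey = instance ID with an EventType column) and the <TaskHubName>Instances / <TaskHubName>History table names. It is a diagnostic sketch, not part of the Durable Task libraries.

```python
# Hypothetical detection script: flag orchestrations that the Instances table
# still reports as "Running" even though the History table already contains a
# terminal ExecutionCompleted/ExecutionTerminated/ExecutionFailed event.
# Assumes the default Azure Storage provider schema described above.
from azure.data.tables import TableServiceClient

CONNECTION_STRING = "<storage-connection-string>"   # placeholder
TASK_HUB = "<TaskHubName>"                          # placeholder

service = TableServiceClient.from_connection_string(CONNECTION_STRING)
instances = service.get_table_client(f"{TASK_HUB}Instances")
history = service.get_table_client(f"{TASK_HUB}History")

TERMINAL_EVENTS = {"ExecutionCompleted", "ExecutionTerminated", "ExecutionFailed"}

# Find instances that still claim to be Running (full table scan; fine for a
# one-off diagnostic, slow on very large task hubs).
for row in instances.query_entities("RuntimeStatus eq 'Running'"):
    instance_id = row["PartitionKey"]
    # Check the instance's history partition for a terminal event.
    events = history.query_entities(
        "PartitionKey eq @pk",
        parameters={"pk": instance_id},
        select=["EventType"],
    )
    if any(e.get("EventType") in TERMINAL_EVENTS for e in events):
        print(f"Instance {instance_id} looks stuck: history is terminal "
              "but the Instances table still says Running.")
```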

This problem happens rarely, but we have seen recent cases of it, including in a recent support case (ICM 676943847). It's also been observed in #2364 and #3079. There currently isn't a great workaround beyond manually manipulating the data in Azure Storage.

The impact of this issue is that app workflows that depend on status changes for orchestrations will also get stuck waiting for the orchestration to transition to the Completed (or Terminated/Failed) state.

Expected behavior

Orchestrations should be able to run to completion in the face of any transient failures and have that completion be reflected in status queries.

Actual behavior

This is one case where the orchestration gets stuck in an inconsistent state.

Known workarounds

Manually update the status of the orchestration in the <TaskHubName>Instances table to the correct status ("Completed", "Failed", or "Terminated"). This will cause status queries to return the proper status.
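The sketch below shows one way to apply that workaround programmatically with the azure-data-tables Python package instead of editing the row by hand. It assumes the same schema as above (PartitionKey = instance ID, RowKey = empty string, RuntimeStatus column); the LastUpdatedTime column name is an assumption, so compare against a healthy completed instance's row before patching.

```python
# Hypothetical fix-up script: patch a stuck instance's row in the
# <TaskHubName>Instances table so that status queries return the real outcome.
# Assumes PartitionKey = instance ID and RowKey = "" (empty string).
from datetime import datetime, timezone

from azure.data.tables import TableServiceClient, UpdateMode

CONNECTION_STRING = "<storage-connection-string>"   # placeholder
TASK_HUB = "<TaskHubName>"                          # placeholder
INSTANCE_ID = "<stuck-instance-id>"                 # placeholder
TARGET_STATUS = "Completed"                         # or "Failed" / "Terminated"

service = TableServiceClient.from_connection_string(CONNECTION_STRING)
instances = service.get_table_client(f"{TASK_HUB}Instances")

# Merge only the columns we want to change; all other columns stay as-is.
instances.update_entity(
    {
        "PartitionKey": INSTANCE_ID,
        "RowKey": "",
        "RuntimeStatus": TARGET_STATUS,
        # LastUpdatedTime is an assumption about the column name; drop it if
        # your table uses a different property.
        "LastUpdatedTime": datetime.now(timezone.utc),
    },
    mode=UpdateMode.MERGE,
)
print(f"Patched {INSTANCE_ID} to {TARGET_STATUS}.")
```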

Labels

P1 (Priority 1)