Orchestration is stuck in the Running state #3022

Closed
Azure/durabletask
#1189
@andrey-malkov

Description

I've already gone through the troubleshooting guide, but it doesn't provide a solution for our specific case.

Our workflow scans the backlog table and starts sub-orchestrations to process the backlog items. To iterate, we use the eternal orchestration pattern with ContinueAsNew, which avoids the unbounded history growth and the resulting performance problems of an infinite loop inside a single orchestration. Between iterations, we wait one minute using the IDurableOrchestrationContext.CreateTimer method:

using var pullingJobCts = new CancellationTokenSource();
await context.CreateTimer(context.CurrentUtcDateTime.Add(TimeSpan.FromMinutes(1)), pullingJobCts.Token);

We've been experiencing this issue for quite some time. Over the past two weeks, out of thousands of executions, approximately a dozen workflow instances have become stuck. Using the VS Code extension, I can see that the last recorded operation for these stuck workflows is TimerCreated.

I was finally able to capture the messages for these instances in the control queue:

Instance ID                               Message ID                             Dequeue count
38bf0b8588db420e8c5b992d47a8e735:1705     b7bde336-9125-4859-8dd7-d259dd0d4204   4549
58dbd6970a4e4d8895a03af0a8fa1fad:998      bba78a46-7222-483a-87e1-74f76d0e17df   6944

Their dequeue counts suggest that, since 17 and 22 January, the message that should resume each orchestration after the delay has never been successfully processed. I've attached the message bodies; the event type is TimerFiredEvent.

b7bde336-9125-4859-8dd7-d259dd0d4204.json
bba78a46-7222-483a-87e1-74f76d0e17df.json

In the logs, I found the following errors:

2025-01-31T21:45:32Z [Error] An unexpected failure occurred while processing instance '58dbd6970a4e4d8895a03af0a8fa1fad:998': DurableTask.AzureStorage.Storage.DurableTaskStorageException: An error occurred while communicating with Azure Storage
---> Azure.RequestFailedException: The specified blob does not exist.

error-38bf0b8588db420e8c5b992d47a8e735-1705.txt
error-58dbd6970a4e4d8895a03af0a8fa1fad-998.txt

Thanks,
Andrei

Metadata

Labels

P1 (Priority 1), bug